draft-ietf-nfsv4-minorversion1-19.txt   draft-ietf-nfsv4-minorversion1-20.txt 
NFSv4 S. Shepler NFSv4 S. Shepler
Internet-Draft M. Eisler Internet-Draft M. Eisler
Intended status: Standards Track D. Noveck Intended status: Standards Track D. Noveck
Expires: August 1, 2008 Editors Expires: August 28, 2008 Editors
January 29, 2008 February 25, 2008
NFS Version 4 Minor Version 1 NFS Version 4 Minor Version 1
draft-ietf-nfsv4-minorversion1-19.txt draft-ietf-nfsv4-minorversion1-20.txt
Status of this Memo Status of this Memo
By submitting this Internet-Draft, each author represents that any By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79. aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
skipping to change at page 1, line 35 skipping to change at page 1, line 35
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on August 1, 2008. This Internet-Draft will expire on August 28, 2008.
Copyright Notice Copyright Notice
Copyright (C) The IETF Trust (2008). Copyright (C) The IETF Trust (2008).
Abstract Abstract
This Internet-Draft describes NFS version 4 minor version one, This Internet-Draft describes NFS version 4 minor version one,
including features retained from the base protocol and protocol including features retained from the base protocol and protocol
extensions made subsequently. Major extensions introduced in NFS extensions made subsequently. Major extensions introduced in NFS
skipping to change at page 2, line 31 skipping to change at page 2, line 31
1.6.2. Protocol Structure . . . . . . . . . . . . . . . . . 15 1.6.2. Protocol Structure . . . . . . . . . . . . . . . . . 15
1.6.3. File System Model . . . . . . . . . . . . . . . . . 16 1.6.3. File System Model . . . . . . . . . . . . . . . . . 16
1.6.4. Locking Facilities . . . . . . . . . . . . . . . . . 18 1.6.4. Locking Facilities . . . . . . . . . . . . . . . . . 18
1.7. Differences from NFSv4.0 . . . . . . . . . . . . . . . . 18 1.7. Differences from NFSv4.0 . . . . . . . . . . . . . . . . 18
2. Core Infrastructure . . . . . . . . . . . . . . . . . . . . . 19 2. Core Infrastructure . . . . . . . . . . . . . . . . . . . . . 19
2.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 19 2.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 19
2.2. RPC and XDR . . . . . . . . . . . . . . . . . . . . . . 19 2.2. RPC and XDR . . . . . . . . . . . . . . . . . . . . . . 19
2.2.1. RPC-based Security . . . . . . . . . . . . . . . . . 19 2.2.1. RPC-based Security . . . . . . . . . . . . . . . . . 19
2.3. COMPOUND and CB_COMPOUND . . . . . . . . . . . . . . . . 22 2.3. COMPOUND and CB_COMPOUND . . . . . . . . . . . . . . . . 22
2.4. Client Identifiers and Client Owners . . . . . . . . . . 23 2.4. Client Identifiers and Client Owners . . . . . . . . . . 23
2.4.1. Upgrade from NFSv4.0 to NFSv4.1 . . . . . . . . . . 26 2.4.1. Upgrade from NFSv4.0 to NFSv4.1 . . . . . . . . . . 27
2.4.2. Server Release of Client ID . . . . . . . . . . . . 27 2.4.2. Server Release of Client ID . . . . . . . . . . . . 27
2.4.3. Resolving Client Owner Conflicts . . . . . . . . . . 27 2.4.3. Resolving Client Owner Conflicts . . . . . . . . . . 27
2.5. Server Owners . . . . . . . . . . . . . . . . . . . . . 28 2.5. Server Owners . . . . . . . . . . . . . . . . . . . . . 29
2.6. Security Service Negotiation . . . . . . . . . . . . . . 29 2.6. Security Service Negotiation . . . . . . . . . . . . . . 29
2.6.1. NFSv4.1 Security Tuples . . . . . . . . . . . . . . 29 2.6.1. NFSv4.1 Security Tuples . . . . . . . . . . . . . . 30
2.6.2. SECINFO and SECINFO_NO_NAME . . . . . . . . . . . . 29 2.6.2. SECINFO and SECINFO_NO_NAME . . . . . . . . . . . . 30
2.6.3. Security Error . . . . . . . . . . . . . . . . . . . 30 2.6.3. Security Error . . . . . . . . . . . . . . . . . . . 30
2.7. Minor Versioning . . . . . . . . . . . . . . . . . . . . 33 2.7. Minor Versioning . . . . . . . . . . . . . . . . . . . . 33
2.8. Non-RPC-based Security Services . . . . . . . . . . . . 36 2.8. Non-RPC-based Security Services . . . . . . . . . . . . 36
2.8.1. Authorization . . . . . . . . . . . . . . . . . . . 36 2.8.1. Authorization . . . . . . . . . . . . . . . . . . . 36
2.8.2. Auditing . . . . . . . . . . . . . . . . . . . . . . 36 2.8.2. Auditing . . . . . . . . . . . . . . . . . . . . . . 36
2.8.3. Intrusion Detection . . . . . . . . . . . . . . . . 36 2.8.3. Intrusion Detection . . . . . . . . . . . . . . . . 37
2.9. Transport Layers . . . . . . . . . . . . . . . . . . . . 36 2.9. Transport Layers . . . . . . . . . . . . . . . . . . . . 37
2.9.1. REQUIRED and RECOMMENDED Properties of Transports . 37 2.9.1. REQUIRED and RECOMMENDED Properties of Transports . 37
2.9.2. Client and Server Transport Behavior . . . . . . . . 37 2.9.2. Client and Server Transport Behavior . . . . . . . . 37
2.9.3. Ports . . . . . . . . . . . . . . . . . . . . . . . 39 2.9.3. Ports . . . . . . . . . . . . . . . . . . . . . . . 39
2.10. Session . . . . . . . . . . . . . . . . . . . . . . . . 39 2.10. Session . . . . . . . . . . . . . . . . . . . . . . . . 39
2.10.1. Motivation and Overview . . . . . . . . . . . . . . 39 2.10.1. Motivation and Overview . . . . . . . . . . . . . . 39
2.10.2. NFSv4 Integration . . . . . . . . . . . . . . . . . 40 2.10.2. NFSv4 Integration . . . . . . . . . . . . . . . . . 40
2.10.3. Channels . . . . . . . . . . . . . . . . . . . . . . 42 2.10.3. Channels . . . . . . . . . . . . . . . . . . . . . . 42
2.10.4. Trunking . . . . . . . . . . . . . . . . . . . . . . 43 2.10.4. Trunking . . . . . . . . . . . . . . . . . . . . . . 43
2.10.5. Exactly Once Semantics . . . . . . . . . . . . . . . 46 2.10.5. Exactly Once Semantics . . . . . . . . . . . . . . . 46
2.10.6. RDMA Considerations . . . . . . . . . . . . . . . . 58 2.10.6. RDMA Considerations . . . . . . . . . . . . . . . . 58
2.10.7. Sessions Security . . . . . . . . . . . . . . . . . 61 2.10.7. Sessions Security . . . . . . . . . . . . . . . . . 61
2.10.8. The SSV GSS Mechanism . . . . . . . . . . . . . . . 66 2.10.8. The SSV GSS Mechanism . . . . . . . . . . . . . . . 66
2.10.9. Session Mechanics - Steady State . . . . . . . . . . 70 2.10.9. Session Mechanics - Steady State . . . . . . . . . . 71
2.10.10. Session Mechanics - Recovery . . . . . . . . . . . . 71 2.10.10. Session Mechanics - Recovery . . . . . . . . . . . . 72
2.10.11. Parallel NFS and Sessions . . . . . . . . . . . . . 75 2.10.11. Parallel NFS and Sessions . . . . . . . . . . . . . 76
3. Protocol Constants and Data Types . . . . . . . . . . . . . . 75 3. Protocol Constants and Data Types . . . . . . . . . . . . . . 76
3.1. Basic Constants . . . . . . . . . . . . . . . . . . . . 75 3.1. Basic Constants . . . . . . . . . . . . . . . . . . . . 76
3.2. Basic Data Types . . . . . . . . . . . . . . . . . . . . 76 3.2. Basic Data Types . . . . . . . . . . . . . . . . . . . . 77
3.3. Structured Data Types . . . . . . . . . . . . . . . . . 78 3.3. Structured Data Types . . . . . . . . . . . . . . . . . 79
4. Filehandles . . . . . . . . . . . . . . . . . . . . . . . . . 87 4. Filehandles . . . . . . . . . . . . . . . . . . . . . . . . . 88
4.1. Obtaining the First Filehandle . . . . . . . . . . . . . 87 4.1. Obtaining the First Filehandle . . . . . . . . . . . . . 88
4.1.1. Root Filehandle . . . . . . . . . . . . . . . . . . 87 4.1.1. Root Filehandle . . . . . . . . . . . . . . . . . . 89
4.1.2. Public Filehandle . . . . . . . . . . . . . . . . . 88 4.1.2. Public Filehandle . . . . . . . . . . . . . . . . . 89
4.2. Filehandle Types . . . . . . . . . . . . . . . . . . . . 88 4.2. Filehandle Types . . . . . . . . . . . . . . . . . . . . 89
4.2.1. General Properties of a Filehandle . . . . . . . . . 89 4.2.1. General Properties of a Filehandle . . . . . . . . . 90
4.2.2. Persistent Filehandle . . . . . . . . . . . . . . . 89 4.2.2. Persistent Filehandle . . . . . . . . . . . . . . . 91
4.2.3. Volatile Filehandle . . . . . . . . . . . . . . . . 90 4.2.3. Volatile Filehandle . . . . . . . . . . . . . . . . 91
4.3. One Method of Constructing a Volatile Filehandle . . . . 91 4.3. One Method of Constructing a Volatile Filehandle . . . . 92
4.4. Client Recovery from Filehandle Expiration . . . . . . . 91 4.4. Client Recovery from Filehandle Expiration . . . . . . . 93
5. File Attributes . . . . . . . . . . . . . . . . . . . . . . . 92 5. File Attributes . . . . . . . . . . . . . . . . . . . . . . . 94
5.1. REQUIRED Attributes . . . . . . . . . . . . . . . . . . 94 5.1. REQUIRED Attributes . . . . . . . . . . . . . . . . . . 95
5.2. RECOMMENDED Attributes . . . . . . . . . . . . . . . . . 94 5.2. RECOMMENDED Attributes . . . . . . . . . . . . . . . . . 95
5.3. Named Attributes . . . . . . . . . . . . . . . . . . . . 94 5.3. Named Attributes . . . . . . . . . . . . . . . . . . . . 96
5.4. Classification of Attributes . . . . . . . . . . . . . . 96 5.4. Classification of Attributes . . . . . . . . . . . . . . 97
5.5. REQUIRED Attributes - List and Definition References . . 97 5.5. REQUIRED Attributes - List and Definition References . . 99
5.6. RECOMMENDED Attributes - List and Definition 5.6. RECOMMENDED Attributes - List and Definition
References . . . . . . . . . . . . . . . . . . . . . . . 97 References . . . . . . . . . . . . . . . . . . . . . . . 99
5.7. Attribute Definitions . . . . . . . . . . . . . . . . . 99 5.7. Attribute Definitions . . . . . . . . . . . . . . . . . 101
5.7.1. Definitions of REQUIRED Attributes . . . . . . . . . 99 5.7.1. Definitions of REQUIRED Attributes . . . . . . . . . 101
5.7.2. Definitions of Uncategorized RECOMMENDED 5.7.2. Definitions of Uncategorized RECOMMENDED
Attributes . . . . . . . . . . . . . . . . . . . . . 101 Attributes . . . . . . . . . . . . . . . . . . . . . 103
5.8. Interpreting owner and owner_group . . . . . . . . . . . 107 5.8. Interpreting owner and owner_group . . . . . . . . . . . 109
5.9. Character Case Attributes . . . . . . . . . . . . . . . 109 5.9. Character Case Attributes . . . . . . . . . . . . . . . 111
5.10. Directory Notification Attributes . . . . . . . . . . . 109 5.10. Directory Notification Attributes . . . . . . . . . . . 111
5.11. pNFS Attribute Definitions . . . . . . . . . . . . . . . 110 5.11. pNFS Attribute Definitions . . . . . . . . . . . . . . . 112
5.12. Retention Attributes . . . . . . . . . . . . . . . . . . 112 5.12. Retention Attributes . . . . . . . . . . . . . . . . . . 114
6. Security Related Attributes . . . . . . . . . . . . . . . . . 114 6. Security Related Attributes . . . . . . . . . . . . . . . . . 116
6.1. Goals . . . . . . . . . . . . . . . . . . . . . . . . . 114 6.1. Goals . . . . . . . . . . . . . . . . . . . . . . . . . 116
6.2. File Attributes Discussion . . . . . . . . . . . . . . . 115 6.2. File Attributes Discussion . . . . . . . . . . . . . . . 117
6.2.1. Attribute 12: acl . . . . . . . . . . . . . . . . . 115 6.2.1. Attribute 12: acl . . . . . . . . . . . . . . . . . 117
6.2.2. Attribute 58: dacl . . . . . . . . . . . . . . . . . 130 6.2.2. Attribute 58: dacl . . . . . . . . . . . . . . . . . 132
6.2.3. Attribute 59: sacl . . . . . . . . . . . . . . . . . 130 6.2.3. Attribute 59: sacl . . . . . . . . . . . . . . . . . 132
6.2.4. Attribute 33: mode . . . . . . . . . . . . . . . . . 131 6.2.4. Attribute 33: mode . . . . . . . . . . . . . . . . . 132
6.2.5. Attribute 74: mode_set_masked . . . . . . . . . . . 131 6.2.5. Attribute 74: mode_set_masked . . . . . . . . . . . 133
6.3. Common Methods . . . . . . . . . . . . . . . . . . . . . 132 6.3. Common Methods . . . . . . . . . . . . . . . . . . . . . 134
6.3.1. Interpreting an ACL . . . . . . . . . . . . . . . . 132 6.3.1. Interpreting an ACL . . . . . . . . . . . . . . . . 134
6.3.2. Computing a Mode Attribute from an ACL . . . . . . . 133 6.3.2. Computing a Mode Attribute from an ACL . . . . . . . 135
6.4. Requirements . . . . . . . . . . . . . . . . . . . . . . 134 6.4. Requirements . . . . . . . . . . . . . . . . . . . . . . 136
6.4.1. Setting the mode and/or ACL Attributes . . . . . . . 134 6.4.1. Setting the mode and/or ACL Attributes . . . . . . . 136
6.4.2. Retrieving the mode and/or ACL Attributes . . . . . 136 6.4.2. Retrieving the mode and/or ACL Attributes . . . . . 138
6.4.3. Creating New Objects . . . . . . . . . . . . . . . . 136 6.4.3. Creating New Objects . . . . . . . . . . . . . . . . 138
7. Single-server Namespace . . . . . . . . . . . . . . . . . . . 140 7. Single-server Namespace . . . . . . . . . . . . . . . . . . . 142
7.1. Server Exports . . . . . . . . . . . . . . . . . . . . . 140 7.1. Server Exports . . . . . . . . . . . . . . . . . . . . . 142
7.2. Browsing Exports . . . . . . . . . . . . . . . . . . . . 141 7.2. Browsing Exports . . . . . . . . . . . . . . . . . . . . 143
7.3. Server Pseudo File System . . . . . . . . . . . . . . . 141 7.3. Server Pseudo File System . . . . . . . . . . . . . . . 143
7.4. Multiple Roots . . . . . . . . . . . . . . . . . . . . . 142 7.4. Multiple Roots . . . . . . . . . . . . . . . . . . . . . 144
7.5. Filehandle Volatility . . . . . . . . . . . . . . . . . 142 7.5. Filehandle Volatility . . . . . . . . . . . . . . . . . 144
7.6. Exported Root . . . . . . . . . . . . . . . . . . . . . 142 7.6. Exported Root . . . . . . . . . . . . . . . . . . . . . 144
7.7. Mount Point Crossing . . . . . . . . . . . . . . . . . . 143 7.7. Mount Point Crossing . . . . . . . . . . . . . . . . . . 145
7.8. Security Policy and Namespace Presentation . . . . . . . 143 7.8. Security Policy and Namespace Presentation . . . . . . . 145
8. State Management . . . . . . . . . . . . . . . . . . . . . . 144 8. State Management . . . . . . . . . . . . . . . . . . . . . . 146
8.1. Client and Session ID . . . . . . . . . . . . . . . . . 145 8.1. Client and Session ID . . . . . . . . . . . . . . . . . 147
8.2. Stateid Definition . . . . . . . . . . . . . . . . . . . 145 8.2. Stateid Definition . . . . . . . . . . . . . . . . . . . 147
8.2.1. Stateid Types . . . . . . . . . . . . . . . . . . . 146 8.2.1. Stateid Types . . . . . . . . . . . . . . . . . . . 148
8.2.2. Stateid Structure . . . . . . . . . . . . . . . . . 147 8.2.2. Stateid Structure . . . . . . . . . . . . . . . . . 149
8.2.3. Special Stateids . . . . . . . . . . . . . . . . . . 148 8.2.3. Special Stateids . . . . . . . . . . . . . . . . . . 150
8.2.4. Stateid Lifetime and Validation . . . . . . . . . . 150 8.2.4. Stateid Lifetime and Validation . . . . . . . . . . 152
8.2.5. Stateid Use for I/O Operations . . . . . . . . . . . 153 8.2.5. Stateid Use for I/O Operations . . . . . . . . . . . 155
8.3. Lease Renewal . . . . . . . . . . . . . . . . . . . . . 153 8.3. Lease Renewal . . . . . . . . . . . . . . . . . . . . . 155
8.4. Crash Recovery . . . . . . . . . . . . . . . . . . . . . 155 8.4. Crash Recovery . . . . . . . . . . . . . . . . . . . . . 157
8.4.1. Client Failure and Recovery . . . . . . . . . . . . 155 8.4.1. Client Failure and Recovery . . . . . . . . . . . . 158
8.4.2. Server Failure and Recovery . . . . . . . . . . . . 156 8.4.2. Server Failure and Recovery . . . . . . . . . . . . 158
8.4.3. Network Partitions and Recovery . . . . . . . . . . 159 8.4.3. Network Partitions and Recovery . . . . . . . . . . 162
8.5. Server Revocation of Locks . . . . . . . . . . . . . . . 164 8.5. Server Revocation of Locks . . . . . . . . . . . . . . . 166
8.6. Short and Long Leases . . . . . . . . . . . . . . . . . 165 8.6. Short and Long Leases . . . . . . . . . . . . . . . . . 167
8.7. Clocks, Propagation Delay, and Calculating Lease 8.7. Clocks, Propagation Delay, and Calculating Lease
Expiration . . . . . . . . . . . . . . . . . . . . . . . 165 Expiration . . . . . . . . . . . . . . . . . . . . . . . 168
8.8. Vestigial Locking Infrastructure From V4.0 . . . . . . . 166 8.8. Vestigial Locking Infrastructure From V4.0 . . . . . . . 168
9. File Locking and Share Reservations . . . . . . . . . . . . . 167 9. File Locking and Share Reservations . . . . . . . . . . . . . 169
9.1. Opens and Byte-range Locks . . . . . . . . . . . . . . . 167 9.1. Opens and Byte-range Locks . . . . . . . . . . . . . . . 170
9.1.1. State-owner Definition . . . . . . . . . . . . . . . 167 9.1.1. State-owner Definition . . . . . . . . . . . . . . . 170
9.1.2. Use of the Stateid and Locking . . . . . . . . . . . 168 9.1.2. Use of the Stateid and Locking . . . . . . . . . . . 170
9.2. Lock Ranges . . . . . . . . . . . . . . . . . . . . . . 171 9.2. Lock Ranges . . . . . . . . . . . . . . . . . . . . . . 173
9.3. Upgrading and Downgrading Locks . . . . . . . . . . . . 171 9.3. Upgrading and Downgrading Locks . . . . . . . . . . . . 174
9.4. Blocking Locks . . . . . . . . . . . . . . . . . . . . . 172 9.4. Stateid Seqid Values and Byte-range Locks . . . . . . . 174
9.5. Share Reservations . . . . . . . . . . . . . . . . . . . 173 9.5. Issues with Multiple Open-owners . . . . . . . . . . . . 175
9.6. OPEN/CLOSE Operations . . . . . . . . . . . . . . . . . 174 9.6. Blocking Locks . . . . . . . . . . . . . . . . . . . . . 175
9.7. Open Upgrade and Downgrade . . . . . . . . . . . . . . . 174 9.7. Share Reservations . . . . . . . . . . . . . . . . . . . 176
9.8. Parallel OPENs . . . . . . . . . . . . . . . . . . . . . 175 9.8. OPEN/CLOSE Operations . . . . . . . . . . . . . . . . . 177
9.9. Reclaim of Open and Byte-range Locks . . . . . . . . . . 176 9.9. Open Upgrade and Downgrade . . . . . . . . . . . . . . . 178
10. Client-Side Caching . . . . . . . . . . . . . . . . . . . . . 176 9.10. Parallel OPENs . . . . . . . . . . . . . . . . . . . . . 179
10.1. Performance Challenges for Client-Side Caching . . . . . 177 9.11. Reclaim of Open and Byte-range Locks . . . . . . . . . . 179
10.2. Delegation and Callbacks . . . . . . . . . . . . . . . . 178 10. Client-Side Caching . . . . . . . . . . . . . . . . . . . . . 180
10.2.1. Delegation Recovery . . . . . . . . . . . . . . . . 180 10.1. Performance Challenges for Client-Side Caching . . . . . 180
10.3. Data Caching . . . . . . . . . . . . . . . . . . . . . . 182 10.2. Delegation and Callbacks . . . . . . . . . . . . . . . . 181
10.3.1. Data Caching and OPENs . . . . . . . . . . . . . . . 182 10.2.1. Delegation Recovery . . . . . . . . . . . . . . . . 183
10.3.2. Data Caching and File Locking . . . . . . . . . . . 183 10.3. Data Caching . . . . . . . . . . . . . . . . . . . . . . 186
10.3.3. Data Caching and Mandatory File Locking . . . . . . 185 10.3.1. Data Caching and OPENs . . . . . . . . . . . . . . . 186
10.3.4. Data Caching and File Identity . . . . . . . . . . . 185 10.3.2. Data Caching and File Locking . . . . . . . . . . . 187
10.4. Open Delegation . . . . . . . . . . . . . . . . . . . . 187 10.3.3. Data Caching and Mandatory File Locking . . . . . . 189
10.4.1. Open Delegation and Data Caching . . . . . . . . . . 189 10.3.4. Data Caching and File Identity . . . . . . . . . . . 189
10.4.2. Open Delegation and File Locks . . . . . . . . . . . 190 10.4. Open Delegation . . . . . . . . . . . . . . . . . . . . 190
10.4.3. Handling of CB_GETATTR . . . . . . . . . . . . . . . 191 10.4.1. Open Delegation and Data Caching . . . . . . . . . . 192
10.4.4. Recall of Open Delegation . . . . . . . . . . . . . 194 10.4.2. Open Delegation and File Locks . . . . . . . . . . . 194
10.4.5. Clients that Fail to Honor Delegation Recalls . . . 195 10.4.3. Handling of CB_GETATTR . . . . . . . . . . . . . . . 194
10.4.6. Delegation Revocation . . . . . . . . . . . . . . . 196 10.4.4. Recall of Open Delegation . . . . . . . . . . . . . 197
10.4.7. Delegations via WANT_DELEGATION . . . . . . . . . . 197 10.4.5. Clients that Fail to Honor Delegation Recalls . . . 199
10.5. Data Caching and Revocation . . . . . . . . . . . . . . 197 10.4.6. Delegation Revocation . . . . . . . . . . . . . . . 200
10.5.1. Revocation Recovery for Write Open Delegation . . . 198 10.4.7. Delegations via WANT_DELEGATION . . . . . . . . . . 200
10.6. Attribute Caching . . . . . . . . . . . . . . . . . . . 199 10.5. Data Caching and Revocation . . . . . . . . . . . . . . 201
10.7. Data and Metadata Caching and Memory Mapped Files . . . 201 10.5.1. Revocation Recovery for Write Open Delegation . . . 202
10.6. Attribute Caching . . . . . . . . . . . . . . . . . . . 202
10.7. Data and Metadata Caching and Memory Mapped Files . . . 204
10.8. Name and Directory Caching without Directory 10.8. Name and Directory Caching without Directory
Delegations . . . . . . . . . . . . . . . . . . . . . . 203 Delegations . . . . . . . . . . . . . . . . . . . . . . 207
10.8.1. Name Caching . . . . . . . . . . . . . . . . . . . . 203 10.8.1. Name Caching . . . . . . . . . . . . . . . . . . . . 207
10.8.2. Directory Caching . . . . . . . . . . . . . . . . . 205 10.8.2. Directory Caching . . . . . . . . . . . . . . . . . 208
10.9. Directory Delegations . . . . . . . . . . . . . . . . . 205 10.9. Directory Delegations . . . . . . . . . . . . . . . . . 209
10.9.1. Introduction to Directory Delegations . . . . . . . 206 10.9.1. Introduction to Directory Delegations . . . . . . . 209
10.9.2. Directory Delegation Design . . . . . . . . . . . . 207 10.9.2. Directory Delegation Design . . . . . . . . . . . . 210
10.9.3. Attributes in Support of Directory Notifications . . 208 10.9.3. Attributes in Support of Directory Notifications . . 211
10.9.4. Directory Delegation Recall . . . . . . . . . . . . 208 10.9.4. Directory Delegation Recall . . . . . . . . . . . . 211
10.9.5. Directory Delegation Recovery . . . . . . . . . . . 208 10.9.5. Directory Delegation Recovery . . . . . . . . . . . 212
11. Multi-Server Namespace . . . . . . . . . . . . . . . . . . . 209 11. Multi-Server Namespace . . . . . . . . . . . . . . . . . . . 212
11.1. Location Attributes . . . . . . . . . . . . . . . . . . 209 11.1. Location Attributes . . . . . . . . . . . . . . . . . . 213
11.2. File System Presence or Absence . . . . . . . . . . . . 209 11.2. File System Presence or Absence . . . . . . . . . . . . 213
11.3. Getting Attributes for an Absent File System . . . . . . 211 11.3. Getting Attributes for an Absent File System . . . . . . 214
11.3.1. GETATTR Within an Absent File System . . . . . . . . 211 11.3.1. GETATTR Within an Absent File System . . . . . . . . 214
11.3.2. READDIR and Absent File Systems . . . . . . . . . . 212 11.3.2. READDIR and Absent File Systems . . . . . . . . . . 216
11.4. Uses of Location Information . . . . . . . . . . . . . . 213 11.4. Uses of Location Information . . . . . . . . . . . . . . 216
11.4.1. File System Replication . . . . . . . . . . . . . . 213 11.4.1. File System Replication . . . . . . . . . . . . . . 217
11.4.2. File System Migration . . . . . . . . . . . . . . . 214 11.4.2. File System Migration . . . . . . . . . . . . . . . 218
11.4.3. Referrals . . . . . . . . . . . . . . . . . . . . . 215 11.4.3. Referrals . . . . . . . . . . . . . . . . . . . . . 219
11.5. Location Entries and Server Identity . . . . . . . . . . 217 11.5. Location Entries and Server Identity . . . . . . . . . . 220
11.6. Additional Client-side Considerations . . . . . . . . . 217 11.6. Additional Client-side Considerations . . . . . . . . . 221
11.7. Effecting File System Transitions . . . . . . . . . . . 218 11.7. Effecting File System Transitions . . . . . . . . . . . 222
11.7.1. File System Transitions and Simultaneous Access . . 219 11.7.1. File System Transitions and Simultaneous Access . . 223
11.7.2. Simultaneous Use and Transparent Transitions . . . . 220 11.7.2. Simultaneous Use and Transparent Transitions . . . . 224
11.7.3. Filehandles and File System Transitions . . . . . . 223 11.7.3. Filehandles and File System Transitions . . . . . . 226
11.7.4. Fileids and File System Transitions . . . . . . . . 223 11.7.4. Fileids and File System Transitions . . . . . . . . 227
11.7.5. Fsids and File System Transitions . . . . . . . . . 224 11.7.5. Fsids and File System Transitions . . . . . . . . . 228
11.7.6. The Change Attribute and File System Transitions . . 225 11.7.6. The Change Attribute and File System Transitions . . 229
11.7.7. Lock State and File System Transitions . . . . . . . 226 11.7.7. Lock State and File System Transitions . . . . . . . 229
11.7.8. Write Verifiers and File System Transitions . . . . 229 11.7.8. Write Verifiers and File System Transitions . . . . 233
11.7.9. Readdir Cookies and Verifiers and File System 11.7.9. Readdir Cookies and Verifiers and File System
Transitions . . . . . . . . . . . . . . . . . . . . 230 Transitions . . . . . . . . . . . . . . . . . . . . 233
11.7.10. File System Data and File System Transitions . . . . 230 11.7.10. File System Data and File System Transitions . . . . 234
11.8. Effecting File System Referrals . . . . . . . . . . . . 231 11.8. Effecting File System Referrals . . . . . . . . . . . . 235
11.8.1. Referral Example (LOOKUP) . . . . . . . . . . . . . 232 11.8.1. Referral Example (LOOKUP) . . . . . . . . . . . . . 235
11.8.2. Referral Example (READDIR) . . . . . . . . . . . . . 236 11.8.2. Referral Example (READDIR) . . . . . . . . . . . . . 239
11.9. The Attribute fs_locations . . . . . . . . . . . . . . . 238 11.9. The Attribute fs_locations . . . . . . . . . . . . . . . 242
11.10. The Attribute fs_locations_info . . . . . . . . . . . . 240 11.10. The Attribute fs_locations_info . . . . . . . . . . . . 244
11.10.1. The fs_locations_server4 Structure . . . . . . . . . 244 11.10.1. The fs_locations_server4 Structure . . . . . . . . . 247
11.10.2. The fs_locations_info4 Structure . . . . . . . . . . 249 11.10.2. The fs_locations_info4 Structure . . . . . . . . . . 253
11.10.3. The fs_locations_item4 Structure . . . . . . . . . . 250 11.10.3. The fs_locations_item4 Structure . . . . . . . . . . 254
11.11. The Attribute fs_status . . . . . . . . . . . . . . . . 252 11.11. The Attribute fs_status . . . . . . . . . . . . . . . . 256
12. Parallel NFS (pNFS) . . . . . . . . . . . . . . . . . . . . . 256 12. Parallel NFS (pNFS) . . . . . . . . . . . . . . . . . . . . . 259
12.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 256 12.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 259
12.2. pNFS Definitions . . . . . . . . . . . . . . . . . . . . 257 12.2. pNFS Definitions . . . . . . . . . . . . . . . . . . . . 261
12.2.1. Metadata . . . . . . . . . . . . . . . . . . . . . . 258 12.2.1. Metadata . . . . . . . . . . . . . . . . . . . . . . 261
12.2.2. Metadata Server . . . . . . . . . . . . . . . . . . 258 12.2.2. Metadata Server . . . . . . . . . . . . . . . . . . 261
12.2.3. pNFS Client . . . . . . . . . . . . . . . . . . . . 258 12.2.3. pNFS Client . . . . . . . . . . . . . . . . . . . . 262
12.2.4. Storage Device . . . . . . . . . . . . . . . . . . . 258 12.2.4. Storage Device . . . . . . . . . . . . . . . . . . . 262
12.2.5. Storage Protocol . . . . . . . . . . . . . . . . . . 258 12.2.5. Storage Protocol . . . . . . . . . . . . . . . . . . 262
12.2.6. Control Protocol . . . . . . . . . . . . . . . . . . 258 12.2.6. Control Protocol . . . . . . . . . . . . . . . . . . 262
12.2.7. Layout Types . . . . . . . . . . . . . . . . . . . . 259 12.2.7. Layout Types . . . . . . . . . . . . . . . . . . . . 262
12.2.8. Layout . . . . . . . . . . . . . . . . . . . . . . . 259 12.2.8. Layout . . . . . . . . . . . . . . . . . . . . . . . 263
12.2.9. Layout Iomode . . . . . . . . . . . . . . . . . . . 260 12.2.9. Layout Iomode . . . . . . . . . . . . . . . . . . . 263
12.2.10. Device IDs . . . . . . . . . . . . . . . . . . . . . 260 12.2.10. Device IDs . . . . . . . . . . . . . . . . . . . . . 264
12.3. pNFS Operations . . . . . . . . . . . . . . . . . . . . 262 12.3. pNFS Operations . . . . . . . . . . . . . . . . . . . . 265
12.4. pNFS Attributes . . . . . . . . . . . . . . . . . . . . 263 12.4. pNFS Attributes . . . . . . . . . . . . . . . . . . . . 266
12.5. Layout Semantics . . . . . . . . . . . . . . . . . . . . 263 12.5. Layout Semantics . . . . . . . . . . . . . . . . . . . . 266
12.5.1. Guarantees Provided by Layouts . . . . . . . . . . . 263 12.5.1. Guarantees Provided by Layouts . . . . . . . . . . . 266
12.5.2. Getting a Layout . . . . . . . . . . . . . . . . . . 264 12.5.2. Getting a Layout . . . . . . . . . . . . . . . . . . 268
12.5.3. Layout Stateid . . . . . . . . . . . . . . . . . . . 265 12.5.3. Layout Stateid . . . . . . . . . . . . . . . . . . . 269
12.5.4. Committing a Layout . . . . . . . . . . . . . . . . 266 12.5.4. Committing a Layout . . . . . . . . . . . . . . . . 270
12.5.5. Recalling a Layout . . . . . . . . . . . . . . . . . 269 12.5.5. Recalling a Layout . . . . . . . . . . . . . . . . . 272
12.5.6. Revoking Layouts . . . . . . . . . . . . . . . . . . 276 12.5.6. Revoking Layouts . . . . . . . . . . . . . . . . . . 279
12.5.7. Metadata Server Write Propagation . . . . . . . . . 276 12.5.7. Metadata Server Write Propagation . . . . . . . . . 279
12.6. pNFS Mechanics . . . . . . . . . . . . . . . . . . . . . 276 12.6. pNFS Mechanics . . . . . . . . . . . . . . . . . . . . . 280
12.7. Recovery . . . . . . . . . . . . . . . . . . . . . . . . 278 12.7. Recovery . . . . . . . . . . . . . . . . . . . . . . . . 281
12.7.1. Recovery from Client Restart . . . . . . . . . . . . 278 12.7.1. Recovery from Client Restart . . . . . . . . . . . . 282
12.7.2. Dealing with Lease Expiration on the Client . . . . 279 12.7.2. Dealing with Lease Expiration on the Client . . . . 282
12.7.3. Dealing with Loss of Layout State on the Metadata 12.7.3. Dealing with Loss of Layout State on the Metadata
Server . . . . . . . . . . . . . . . . . . . . . . . 280 Server . . . . . . . . . . . . . . . . . . . . . . . 283
12.7.4. Recovery from Metadata Server Restart . . . . . . . 280 12.7.4. Recovery from Metadata Server Restart . . . . . . . 284
12.7.5. Operations During Metadata Server Grace Period . . . 282 12.7.5. Operations During Metadata Server Grace Period . . . 286
12.7.6. Storage Device Recovery . . . . . . . . . . . . . . 283 12.7.6. Storage Device Recovery . . . . . . . . . . . . . . 286
12.8. Metadata and Storage Device Roles . . . . . . . . . . . 283 12.8. Metadata and Storage Device Roles . . . . . . . . . . . 286
12.9. Security Considerations for pNFS . . . . . . . . . . . . 284 12.9. Security Considerations for pNFS . . . . . . . . . . . . 287
13. PNFS: NFSv4.1 File Layout Type . . . . . . . . . . . . . . . 285 13. PNFS: NFSv4.1 File Layout Type . . . . . . . . . . . . . . . 288
13.1. Client ID and Session Considerations . . . . . . . . . . 285 13.1. Client ID and Session Considerations . . . . . . . . . . 288
13.2. File Layout Definitions . . . . . . . . . . . . . . . . 287 13.2. File Layout Definitions . . . . . . . . . . . . . . . . 290
13.3. File Layout Data Types . . . . . . . . . . . . . . . . . 288 13.3. File Layout Data Types . . . . . . . . . . . . . . . . . 291
13.4. Interpreting the File Layout . . . . . . . . . . . . . . 292 13.4. Interpreting the File Layout . . . . . . . . . . . . . . 295
13.4.1. Determining the Stripe Unit Number . . . . . . . . . 292 13.4.1. Determining the Stripe Unit Number . . . . . . . . . 295
13.4.2. Interpreting the File Layout Using Sparse Packing . 292 13.4.2. Interpreting the File Layout Using Sparse Packing . 295
13.4.3. Interpreting the File Layout Using Dense Packing . . 294 13.4.3. Interpreting the File Layout Using Dense Packing . . 298
13.4.4. Sparse and Dense Stripe Unit Packing . . . . . . . . 297 13.4.4. Sparse and Dense Stripe Unit Packing . . . . . . . . 300
13.5. Data Server Multipathing . . . . . . . . . . . . . . . . 298 13.5. Data Server Multipathing . . . . . . . . . . . . . . . . 302
13.6. Operations Sent to NFSv4.1 Data Servers . . . . . . . . 299 13.6. Operations Sent to NFSv4.1 Data Servers . . . . . . . . 303
13.7. COMMIT Through Metadata Server . . . . . . . . . . . . . 302 13.7. COMMIT Through Metadata Server . . . . . . . . . . . . . 305
13.8. The Layout Iomode . . . . . . . . . . . . . . . . . . . 303 13.8. The Layout Iomode . . . . . . . . . . . . . . . . . . . 307
13.9. Metadata and Data Server State Coordination . . . . . . 303 13.9. Metadata and Data Server State Coordination . . . . . . 307
13.9.1. Global Stateid Requirements . . . . . . . . . . . . 303 13.9.1. Global Stateid Requirements . . . . . . . . . . . . 307
13.9.2. Data Server State Propagation . . . . . . . . . . . 304 13.9.2. Data Server State Propagation . . . . . . . . . . . 308
13.10. Data Server Component File Size . . . . . . . . . . . . 306 13.10. Data Server Component File Size . . . . . . . . . . . . 310
13.11. Layout Revocation and Fencing . . . . . . . . . . . . . 307 13.11. Layout Revocation and Fencing . . . . . . . . . . . . . 311
13.12. Security Considerations for the File Layout Type . . . . 308 13.12. Security Considerations for the File Layout Type . . . . 311
14. Internationalization . . . . . . . . . . . . . . . . . . . . 309 14. Internationalization . . . . . . . . . . . . . . . . . . . . 312
14.1. Stringprep profile for the utf8str_cs type . . . . . . . 310 14.1. Stringprep profile for the utf8str_cs type . . . . . . . 313
14.2. Stringprep profile for the utf8str_cis type . . . . . . 311 14.2. Stringprep profile for the utf8str_cis type . . . . . . 315
14.3. Stringprep profile for the utf8str_mixed type . . . . . 313 14.3. Stringprep profile for the utf8str_mixed type . . . . . 316
14.4. UTF-8 Capabilities . . . . . . . . . . . . . . . . . . . 314 14.4. UTF-8 Capabilities . . . . . . . . . . . . . . . . . . . 318
14.5. UTF-8 Related Errors . . . . . . . . . . . . . . . . . . 314 14.5. UTF-8 Related Errors . . . . . . . . . . . . . . . . . . 318
15. Error Values . . . . . . . . . . . . . . . . . . . . . . . . 315 15. Error Values . . . . . . . . . . . . . . . . . . . . . . . . 319
15.1. Error Definitions . . . . . . . . . . . . . . . . . . . 315 15.1. Error Definitions . . . . . . . . . . . . . . . . . . . 319
15.1.1. General Errors . . . . . . . . . . . . . . . . . . . 317 15.1.1. General Errors . . . . . . . . . . . . . . . . . . . 321
15.1.2. Filehandle Errors . . . . . . . . . . . . . . . . . 319 15.1.2. Filehandle Errors . . . . . . . . . . . . . . . . . 323
15.1.3. Compound Structure Errors . . . . . . . . . . . . . 320 15.1.3. Compound Structure Errors . . . . . . . . . . . . . 324
15.1.4. File System Errors . . . . . . . . . . . . . . . . . 322 15.1.4. File System Errors . . . . . . . . . . . . . . . . . 326
15.1.5. State Management Errors . . . . . . . . . . . . . . 324 15.1.5. State Management Errors . . . . . . . . . . . . . . 328
15.1.6. Security Errors . . . . . . . . . . . . . . . . . . 325 15.1.6. Security Errors . . . . . . . . . . . . . . . . . . 329
15.1.7. Name Errors . . . . . . . . . . . . . . . . . . . . 326 15.1.7. Name Errors . . . . . . . . . . . . . . . . . . . . 329
15.1.8. Locking Errors . . . . . . . . . . . . . . . . . . . 326 15.1.8. Locking Errors . . . . . . . . . . . . . . . . . . . 330
15.1.9. Reclaim Errors . . . . . . . . . . . . . . . . . . . 328 15.1.9. Reclaim Errors . . . . . . . . . . . . . . . . . . . 331
15.1.10. pNFS Errors . . . . . . . . . . . . . . . . . . . . 328 15.1.10. pNFS Errors . . . . . . . . . . . . . . . . . . . . 332
15.1.11. Session Use Errors . . . . . . . . . . . . . . . . . 330 15.1.11. Session Use Errors . . . . . . . . . . . . . . . . . 333
15.1.12. Session Management Errors . . . . . . . . . . . . . 331 15.1.12. Session Management Errors . . . . . . . . . . . . . 334
15.1.13. Client Management Errors . . . . . . . . . . . . . . 331 15.1.13. Client Management Errors . . . . . . . . . . . . . . 335
15.1.14. Delegation Errors . . . . . . . . . . . . . . . . . 332 15.1.14. Delegation Errors . . . . . . . . . . . . . . . . . 336
15.1.15. Attribute Handling Errors . . . . . . . . . . . . . 333 15.1.15. Attribute Handling Errors . . . . . . . . . . . . . 336
15.1.16. Obsoleted Errors . . . . . . . . . . . . . . . . . . 333 15.1.16. Obsoleted Errors . . . . . . . . . . . . . . . . . . 337
15.2. Operations and their valid errors . . . . . . . . . . . 334 15.2. Operations and their valid errors . . . . . . . . . . . 338
15.3. Callback operations and their valid errors . . . . . . . 350 15.3. Callback operations and their valid errors . . . . . . . 354
15.4. Errors and the operations that use them . . . . . . . . 352 15.4. Errors and the operations that use them . . . . . . . . 356
16. NFSv4.1 Procedures . . . . . . . . . . . . . . . . . . . . . 366 16. NFSv4.1 Procedures . . . . . . . . . . . . . . . . . . . . . 370
16.1. Procedure 0: NULL - No Operation . . . . . . . . . . . . 366 16.1. Procedure 0: NULL - No Operation . . . . . . . . . . . . 370
16.2. Procedure 1: COMPOUND - Compound Operations . . . . . . 367 16.2. Procedure 1: COMPOUND - Compound Operations . . . . . . 371
17. Operations: REQUIRED, RECOMMENDED, or OPTIONAL . . . . . . . 377 17. Operations: REQUIRED, RECOMMENDED, or OPTIONAL . . . . . . . 381
18. NFSv4.1 Operations . . . . . . . . . . . . . . . . . . . . . 380 18. NFSv4.1 Operations . . . . . . . . . . . . . . . . . . . . . 384
18.1. Operation 3: ACCESS - Check Access Rights . . . . . . . 380 18.1. Operation 3: ACCESS - Check Access Rights . . . . . . . 384
18.2. Operation 4: CLOSE - Close File . . . . . . . . . . . . 383 18.2. Operation 4: CLOSE - Close File . . . . . . . . . . . . 387
18.3. Operation 5: COMMIT - Commit Cached Data . . . . . . . . 384 18.3. Operation 5: COMMIT - Commit Cached Data . . . . . . . . 388
18.4. Operation 6: CREATE - Create a Non-Regular File Object . 387 18.4. Operation 6: CREATE - Create a Non-Regular File Object . 391
18.5. Operation 7: DELEGPURGE - Purge Delegations Awaiting 18.5. Operation 7: DELEGPURGE - Purge Delegations Awaiting
Recovery . . . . . . . . . . . . . . . . . . . . . . . . 390 Recovery . . . . . . . . . . . . . . . . . . . . . . . . 394
18.6. Operation 8: DELEGRETURN - Return Delegation . . . . . . 391 18.6. Operation 8: DELEGRETURN - Return Delegation . . . . . . 395
18.7. Operation 9: GETATTR - Get Attributes . . . . . . . . . 391 18.7. Operation 9: GETATTR - Get Attributes . . . . . . . . . 395
18.8. Operation 10: GETFH - Get Current Filehandle . . . . . . 393 18.8. Operation 10: GETFH - Get Current Filehandle . . . . . . 397
18.9. Operation 11: LINK - Create Link to a File . . . . . . . 394 18.9. Operation 11: LINK - Create Link to a File . . . . . . . 398
18.10. Operation 12: LOCK - Create Lock . . . . . . . . . . . . 396 18.10. Operation 12: LOCK - Create Lock . . . . . . . . . . . . 400
18.11. Operation 13: LOCKT - Test For Lock . . . . . . . . . . 400 18.11. Operation 13: LOCKT - Test For Lock . . . . . . . . . . 404
18.12. Operation 14: LOCKU - Unlock File . . . . . . . . . . . 402 18.12. Operation 14: LOCKU - Unlock File . . . . . . . . . . . 406
18.13. Operation 15: LOOKUP - Lookup Filename . . . . . . . . . 403 18.13. Operation 15: LOOKUP - Lookup Filename . . . . . . . . . 407
18.14. Operation 16: LOOKUPP - Lookup Parent Directory . . . . 405 18.14. Operation 16: LOOKUPP - Lookup Parent Directory . . . . 409
18.15. Operation 17: NVERIFY - Verify Difference in 18.15. Operation 17: NVERIFY - Verify Difference in
Attributes . . . . . . . . . . . . . . . . . . . . . . . 406 Attributes . . . . . . . . . . . . . . . . . . . . . . . 410
18.16. Operation 18: OPEN - Open a Regular File . . . . . . . . 407 18.16. Operation 18: OPEN - Open a Regular File . . . . . . . . 411
18.17. Operation 19: OPENATTR - Open Named Attribute 18.17. Operation 19: OPENATTR - Open Named Attribute
Directory . . . . . . . . . . . . . . . . . . . . . . . 426 Directory . . . . . . . . . . . . . . . . . . . . . . . 430
18.18. Operation 21: OPEN_DOWNGRADE - Reduce Open File Access . 427 18.18. Operation 21: OPEN_DOWNGRADE - Reduce Open File Access . 431
18.19. Operation 22: PUTFH - Set Current Filehandle . . . . . . 428 18.19. Operation 22: PUTFH - Set Current Filehandle . . . . . . 432
18.20. Operation 23: PUTPUBFH - Set Public Filehandle . . . . . 429 18.20. Operation 23: PUTPUBFH - Set Public Filehandle . . . . . 433
18.21. Operation 24: PUTROOTFH - Set Root Filehandle . . . . . 431 18.21. Operation 24: PUTROOTFH - Set Root Filehandle . . . . . 435
18.22. Operation 25: READ - Read from File . . . . . . . . . . 431 18.22. Operation 25: READ - Read from File . . . . . . . . . . 435
18.23. Operation 26: READDIR - Read Directory . . . . . . . . . 434 18.23. Operation 26: READDIR - Read Directory . . . . . . . . . 438
18.24. Operation 27: READLINK - Read Symbolic Link . . . . . . 437 18.24. Operation 27: READLINK - Read Symbolic Link . . . . . . 441
18.25. Operation 28: REMOVE - Remove File System Object . . . . 438 18.25. Operation 28: REMOVE - Remove File System Object . . . . 442
18.26. Operation 29: RENAME - Rename Directory Entry . . . . . 441 18.26. Operation 29: RENAME - Rename Directory Entry . . . . . 445
18.27. Operation 31: RESTOREFH - Restore Saved Filehandle . . . 444 18.27. Operation 31: RESTOREFH - Restore Saved Filehandle . . . 448
18.28. Operation 32: SAVEFH - Save Current Filehandle . . . . . 445 18.28. Operation 32: SAVEFH - Save Current Filehandle . . . . . 449
18.29. Operation 33: SECINFO - Obtain Available Security . . . 446 18.29. Operation 33: SECINFO - Obtain Available Security . . . 450
18.30. Operation 34: SETATTR - Set Attributes . . . . . . . . . 449 18.30. Operation 34: SETATTR - Set Attributes . . . . . . . . . 453
18.31. Operation 37: VERIFY - Verify Same Attributes . . . . . 452 18.31. Operation 37: VERIFY - Verify Same Attributes . . . . . 456
18.32. Operation 38: WRITE - Write to File . . . . . . . . . . 453 18.32. Operation 38: WRITE - Write to File . . . . . . . . . . 457
18.33. Operation 40: BACKCHANNEL_CTL - Backchannel control . . 458 18.33. Operation 40: BACKCHANNEL_CTL - Backchannel control . . 462
18.34. Operation 41: BIND_CONN_TO_SESSION . . . . . . . . . . . 459 18.34. Operation 41: BIND_CONN_TO_SESSION . . . . . . . . . . . 463
18.35. Operation 42: EXCHANGE_ID - Instantiate Client ID . . . 462 18.35. Operation 42: EXCHANGE_ID - Instantiate Client ID . . . 466
18.36. Operation 43: CREATE_SESSION - Create New Session and 18.36. Operation 43: CREATE_SESSION - Create New Session and
Confirm Client ID . . . . . . . . . . . . . . . . . . . 478 Confirm Client ID . . . . . . . . . . . . . . . . . . . 482
18.37. Operation 44: DESTROY_SESSION - Destroy existing 18.37. Operation 44: DESTROY_SESSION - Destroy existing
session . . . . . . . . . . . . . . . . . . . . . . . . 487 session . . . . . . . . . . . . . . . . . . . . . . . . 492
18.38. Operation 45: FREE_STATEID - Free stateid with no 18.38. Operation 45: FREE_STATEID - Free stateid with no
locks . . . . . . . . . . . . . . . . . . . . . . . . . 489 locks . . . . . . . . . . . . . . . . . . . . . . . . . 494
18.39. Operation 46: GET_DIR_DELEGATION - Get a directory 18.39. Operation 46: GET_DIR_DELEGATION - Get a directory
delegation . . . . . . . . . . . . . . . . . . . . . . . 490 delegation . . . . . . . . . . . . . . . . . . . . . . . 495
18.40. Operation 47: GETDEVICEINFO - Get Device Information . . 494 18.40. Operation 47: GETDEVICEINFO - Get Device Information . . 499
18.41. Operation 48: GETDEVICELIST - Get All Device Mappings 18.41. Operation 48: GETDEVICELIST - Get All Device Mappings
for a File System . . . . . . . . . . . . . . . . . . . 496 for a File System . . . . . . . . . . . . . . . . . . . 501
18.42. Operation 49: LAYOUTCOMMIT - Commit writes made using 18.42. Operation 49: LAYOUTCOMMIT - Commit writes made using
a layout . . . . . . . . . . . . . . . . . . . . . . . . 498 a layout . . . . . . . . . . . . . . . . . . . . . . . . 503
18.43. Operation 50: LAYOUTGET - Get Layout Information . . . . 501 18.43. Operation 50: LAYOUTGET - Get Layout Information . . . . 506
18.44. Operation 51: LAYOUTRETURN - Release Layout 18.44. Operation 51: LAYOUTRETURN - Release Layout
Information . . . . . . . . . . . . . . . . . . . . . . 505 Information . . . . . . . . . . . . . . . . . . . . . . 510
18.45. Operation 52: SECINFO_NO_NAME - Get Security on 18.45. Operation 52: SECINFO_NO_NAME - Get Security on
Unnamed Object . . . . . . . . . . . . . . . . . . . . . 510 Unnamed Object . . . . . . . . . . . . . . . . . . . . . 515
18.46. Operation 53: SEQUENCE - Supply per-procedure 18.46. Operation 53: SEQUENCE - Supply per-procedure
sequencing and control . . . . . . . . . . . . . . . . . 511 sequencing and control . . . . . . . . . . . . . . . . . 516
18.47. Operation 54: SET_SSV - Update SSV for a Client ID . . . 517 18.47. Operation 54: SET_SSV - Update SSV for a Client ID . . . 522
18.48. Operation 55: TEST_STATEID - Test stateids for 18.48. Operation 55: TEST_STATEID - Test stateids for
validity . . . . . . . . . . . . . . . . . . . . . . . . 519 validity . . . . . . . . . . . . . . . . . . . . . . . . 524
18.49. Operation 56: WANT_DELEGATION - Request Delegation . . . 521 18.49. Operation 56: WANT_DELEGATION - Request Delegation . . . 526
18.50. Operation 57: DESTROY_CLIENTID - Destroy existing 18.50. Operation 57: DESTROY_CLIENTID - Destroy existing
client ID . . . . . . . . . . . . . . . . . . . . . . . 525 client ID . . . . . . . . . . . . . . . . . . . . . . . 529
18.51. Operation 58: RECLAIM_COMPLETE - Indicates Reclaims 18.51. Operation 58: RECLAIM_COMPLETE - Indicates Reclaims
Finished . . . . . . . . . . . . . . . . . . . . . . . . 525 Finished . . . . . . . . . . . . . . . . . . . . . . . . 530
18.52. Operation 10044: ILLEGAL - Illegal operation . . . . . . 528 18.52. Operation 10044: ILLEGAL - Illegal operation . . . . . . 532
19. NFSv4.1 Callback Procedures . . . . . . . . . . . . . . . . . 528 19. NFSv4.1 Callback Procedures . . . . . . . . . . . . . . . . . 533
19.1. Procedure 0: CB_NULL - No Operation . . . . . . . . . . 529 19.1. Procedure 0: CB_NULL - No Operation . . . . . . . . . . 533
19.2. Procedure 1: CB_COMPOUND - Compound Operations . . . . . 529 19.2. Procedure 1: CB_COMPOUND - Compound Operations . . . . . 533
20. NFSv4.1 Callback Operations . . . . . . . . . . . . . . . . . 533 20. NFSv4.1 Callback Operations . . . . . . . . . . . . . . . . . 538
20.1. Operation 3: CB_GETATTR - Get Attributes . . . . . . . . 533 20.1. Operation 3: CB_GETATTR - Get Attributes . . . . . . . . 538
20.2. Operation 4: CB_RECALL - Recall an Open Delegation . . . 534 20.2. Operation 4: CB_RECALL - Recall an Open Delegation . . . 539
20.3. Operation 5: CB_LAYOUTRECALL - Recall Layout from 20.3. Operation 5: CB_LAYOUTRECALL - Recall Layout from
Client . . . . . . . . . . . . . . . . . . . . . . . . . 535 Client . . . . . . . . . . . . . . . . . . . . . . . . . 540
20.4. Operation 6: CB_NOTIFY - Notify directory changes . . . 539 20.4. Operation 6: CB_NOTIFY - Notify directory changes . . . 544
20.5. Operation 7: CB_PUSH_DELEG - Offer Delegation to 20.5. Operation 7: CB_PUSH_DELEG - Offer Delegation to
Client . . . . . . . . . . . . . . . . . . . . . . . . . 543 Client . . . . . . . . . . . . . . . . . . . . . . . . . 548
20.6. Operation 8: CB_RECALL_ANY - Keep any N delegations . . 544 20.6. Operation 8: CB_RECALL_ANY - Keep any N delegations . . 549
20.7. Operation 9: CB_RECALLABLE_OBJ_AVAIL - Signal 20.7. Operation 9: CB_RECALLABLE_OBJ_AVAIL - Signal
Resources for Recallable Objects . . . . . . . . . . . . 546 Resources for Recallable Objects . . . . . . . . . . . . 551
20.8. Operation 10: CB_RECALL_SLOT - change flow control 20.8. Operation 10: CB_RECALL_SLOT - change flow control
limits . . . . . . . . . . . . . . . . . . . . . . . . . 547 limits . . . . . . . . . . . . . . . . . . . . . . . . . 552
20.9. Operation 11: CB_SEQUENCE - Supply backchannel 20.9. Operation 11: CB_SEQUENCE - Supply backchannel
sequencing and control . . . . . . . . . . . . . . . . . 548 sequencing and control . . . . . . . . . . . . . . . . . 553
20.10. Operation 12: CB_WANTS_CANCELLED - Cancel Pending 20.10. Operation 12: CB_WANTS_CANCELLED - Cancel Pending
Delegation Wants . . . . . . . . . . . . . . . . . . . . 550 Delegation Wants . . . . . . . . . . . . . . . . . . . . 555
20.11. Operation 13: CB_NOTIFY_LOCK - Notify of possible 20.11. Operation 13: CB_NOTIFY_LOCK - Notify of possible
lock availability . . . . . . . . . . . . . . . . . . . 551 lock availability . . . . . . . . . . . . . . . . . . . 556
20.12. Operation 14: CB_NOTIFY_DEVICEID - Notify device ID 20.12. Operation 14: CB_NOTIFY_DEVICEID - Notify device ID
changes . . . . . . . . . . . . . . . . . . . . . . . . 553 changes . . . . . . . . . . . . . . . . . . . . . . . . 558
20.13. Operation 10044: CB_ILLEGAL - Illegal Callback 20.13. Operation 10044: CB_ILLEGAL - Illegal Callback
Operation . . . . . . . . . . . . . . . . . . . . . . . 555 Operation . . . . . . . . . . . . . . . . . . . . . . . 560
21. Security Considerations . . . . . . . . . . . . . . . . . . . 555 21. Security Considerations . . . . . . . . . . . . . . . . . . . 560
22. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 557 22. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 562
22.1. Named Attribute Definitions . . . . . . . . . . . . . . 557 22.1. Named Attribute Definitions . . . . . . . . . . . . . . 562
22.2. ONC RPC Network Identifiers (netids) . . . . . . . . . . 557 22.2. ONC RPC Network Identifiers (netids) . . . . . . . . . . 562
22.3. Defining New Notifications . . . . . . . . . . . . . . . 558 22.3. Defining New Notifications . . . . . . . . . . . . . . . 563
22.4. Defining New Layout Types . . . . . . . . . . . . . . . 559 22.4. Defining New Layout Types . . . . . . . . . . . . . . . 563
22.5. Path Variable Definitions . . . . . . . . . . . . . . . 560 22.5. Path Variable Definitions . . . . . . . . . . . . . . . 565
22.5.1. Path Variable Values . . . . . . . . . . . . . . . . 560 22.5.1. Path Variable Values . . . . . . . . . . . . . . . . 565
22.5.2. Path Variable Names . . . . . . . . . . . . . . . . 561 22.5.2. Path Variable Names . . . . . . . . . . . . . . . . 565
23. References . . . . . . . . . . . . . . . . . . . . . . . . . 561 23. References . . . . . . . . . . . . . . . . . . . . . . . . . 565
23.1. Normative References . . . . . . . . . . . . . . . . . . 561 23.1. Normative References . . . . . . . . . . . . . . . . . . 565
23.2. Informative References . . . . . . . . . . . . . . . . . 562 23.2. Informative References . . . . . . . . . . . . . . . . . 567
Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 564 Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 568
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 566 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 570
Intellectual Property and Copyright Statements . . . . . . . . . 567 Intellectual Property and Copyright Statements . . . . . . . . . 572
1. Introduction 1. Introduction
1.1. The NFS Version 4 Minor Version 1 Protocol 1.1. The NFS Version 4 Minor Version 1 Protocol
The NFS version 4 minor version 1 (NFSv4.1) protocol is the second The NFS version 4 minor version 1 (NFSv4.1) protocol is the second
minor version of the NFS version 4 (NFSv4) protocol. The first minor minor version of the NFS version 4 (NFSv4) protocol. The first minor
version, NFSv4.0 is described in [21]. It generally follows the version, NFSv4.0 is described in [21]. It generally follows the
guidelines for minor versioning model listed in Section 10 of RFC guidelines for minor versioning model listed in Section 10 of RFC
3530. However, it diverges from guidelines 11 ("a client and server 3530. However, it diverges from guidelines 11 ("a client and server
skipping to change at page 14, line 47 skipping to change at page 14, line 47
intermediate storage or uninterruptible power system (UPS). intermediate storage or uninterruptible power system (UPS).
3. Server commit of data with battery-backed intermediate storage 3. Server commit of data with battery-backed intermediate storage
and recovery software. and recovery software.
4. Cache commit with uninterruptible power system (UPS) and 4. Cache commit with uninterruptible power system (UPS) and
recovery software. recovery software.
Stateid A 128-bit quantity returned by a server that uniquely Stateid A 128-bit quantity returned by a server that uniquely
defines the open and locking state provided by the server for a defines the open and locking state provided by the server for a
specific open or lock owner for a specific file and type of lock. specific open-owner or lock-owner/open-owner pair for a specific
file and type of lock.
Verifier A 64-bit quantity generated by the client that the server Verifier A 64-bit quantity generated by the client that the server
can use to determine if the client has restarted and lost all can use to determine if the client has restarted and lost all
previous lock state. previous lock state.
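The sizes given in the Stateid and Verifier definitions above can be illustrated with a short C sketch. This is a reader's aid, not text from either draft. The split of the 128-bit stateid into a 32-bit seqid and a 12-byte opaque field is an assumption based on the protocol's stateid4 XDR type, which is not shown in this excerpt. The helper client_restarted() is hypothetical and only illustrates how a server might compare a recorded verifier with a newly presented one.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NFS4_OTHER_SIZE    12  /* opaque part of a stateid, in bytes */
#define NFS4_VERIFIER_SIZE 8   /* a verifier is a 64-bit quantity    */

/* Assumed layout: 4-byte sequence id + 12 opaque bytes = 128 bits. */
struct stateid4 {
    uint32_t seqid;
    uint8_t  other[NFS4_OTHER_SIZE];
};

/* The verifier is wrapped in a struct so it can be copied by value. */
typedef struct {
    uint8_t data[NFS4_VERIFIER_SIZE];
} verifier4;

_Static_assert(sizeof(struct stateid4) == 16, "stateid must be 128 bits");
_Static_assert(sizeof(verifier4) == 8, "verifier must be 64 bits");

/*
 * Hypothetical helper: returns nonzero when the verifier presented by a
 * client differs from the one the server recorded earlier, which the
 * server can take as a sign that the client has restarted.
 */
static int client_restarted(const verifier4 *recorded,
                            const verifier4 *presented)
{
    return memcmp(recorded->data, presented->data,
                  NFS4_VERIFIER_SIZE) != 0;
}
```

The verifier is compared byte for byte because it is opaque to the server; only equality is meaningful, not ordering.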
1.6. Overview of NFSv4.1 Features 1.6. Overview of NFSv4.1 Features
To provide a reasonable context for the reader, the major features of To provide a reasonable context for the reader, the major features of
the NFSv4.1 protocol will be reviewed in brief. This will be done to the NFSv4.1 protocol will be reviewed in brief. This will be done to
provide an appropriate context for both the reader who is familiar provide an appropriate context for both the reader who is familiar
skipping to change at page 16, line 39 skipping to change at page 16, line 47
systems are reachable from a special per-server global root systems are reachable from a special per-server global root
filehandle. This allows LOOKUP operations to be used to perform filehandle. This allows LOOKUP operations to be used to perform
functions previously provided by the MOUNT protocol. The server functions previously provided by the MOUNT protocol. The server
provides any necessary pseudo file systems to bridge any gaps that provides any necessary pseudo file systems to bridge any gaps that
arise due to unexported gaps between exported file systems. arise due to unexported gaps between exported file systems.
1.6.3.1. Filehandles 1.6.3.1. Filehandles
As in previous versions of the NFS protocol, opaque filehandles are As in previous versions of the NFS protocol, opaque filehandles are
used to identify individual files and directories. Lookup-type and used to identify individual files and directories. Lookup-type and
create operations are used to go from file and directory names to the create operations translate file and directory names to filehandles
filehandle which is then used to identify the object to subsequent which are then used to identify objects in subsequent operations.
operations.
The NFSv4.1 protocol provides support for persistent filehandles, The NFSv4.1 protocol provides support for persistent filehandles,
guaranteed to be valid for the lifetime of the file system object guaranteed to be valid for the lifetime of the file system object
designated. In addition it provides support to servers to provide designated. In addition it provides support to servers to provide
filehandles with more limited validity guarantees, called volatile filehandles with more limited validity guarantees, called volatile
filehandles. filehandles.
1.6.3.2. File Attributes 1.6.3.2. File Attributes
The NFSv4.1 protocol has a rich and extensible attribute structure. The NFSv4.1 protocol has a rich and extensible attribute structure,
Only a small set of the defined attributes are REQUIRED to be which is divided into REQUIRED, RECOMMENDED, and named attributes.
provided by all server implementations. The other attributes are
known as RECOMMENDED attributes.
The acl, sacl, and dacl attributes are a significant set of file The acl, sacl, and dacl attributes compose a set of RECOMMENDED file
attributes that make up the Access Control List (ACL) of a file. attributes that make up the Access Control List (ACL) of a file
These attributes provide for directory and file access control beyond (Section 6). These attributes provide for directory and file access
the model used in NFSv3. The ACL definition allows for specification control beyond the model used in NFSv3. The ACL definition allows
of specific sets of permissions for individual users and groups. In for specification of specific sets of permissions for individual
addition, ACL inheritance allows propagation of access permissions users and groups. In addition, ACL inheritance allows propagation of
and restriction down a directory tree as file system objects are access permissions and restriction down a directory tree as file
created. system objects are created.
One other type of attribute is the named attribute. A named A named attribute is an opaque byte stream that is associated with a
attribute is an opaque byte stream that is associated with a
directory or file and referred to by a string name. Named attributes directory or file and referred to by a string name. Named attributes
are meant to be used by client applications as a method to associate are meant to be used by client applications as a method to associate
application-specific data with a regular file or directory. NFSv4.1 application-specific data with a regular file or directory. NFSv4.1
modifies named attributes relative to NFSv4.0 by tightening the modifies named attributes relative to NFSv4.0 by tightening the
allowed operations in order to prevent the development of non- allowed operations in order to prevent the development of non-
interoperable implementation. See Section 5.3 for details. interoperable implementation. See Section 5.3 for details.
1.6.3.3. Multi-server Namespace 1.6.3.3. Multi-server Namespace
NFSv4.1 contains a number of features to allow implementation of NFSv4.1 contains a number of features to allow implementation of
skipping to change at page 18, line 26 skipping to change at page 18, line 30
The types of locks are: The types of locks are:
o Share reservations as established by OPEN operations. o Share reservations as established by OPEN operations.
o Byte-range locks. o Byte-range locks.
o File delegations, which are recallable locks that assure the o File delegations, which are recallable locks that assure the
holder that inconsistent opens and file changes cannot occur so holder that inconsistent opens and file changes cannot occur so
long as the delegation is held. long as the delegation is held.
o Directory delegations, which are recallable delegations that o Directory delegations, which are recallable locks that assure the
assure the holder that inconsistent directory modifications cannot holder that inconsistent directory modifications cannot occur so
occur so long as the delegation is held. long as the delegation is held.
o Layouts, which are recallable objects that assure the holder that o Layouts, which are recallable objects that assure the holder that
direct access to the file data may be performed directly by the direct access to the file data may be performed directly by the
client and that no change to the data's location inconsistent with client and that no change to the data's location inconsistent with
that access may be made so long as the layout is held. that access may be made so long as the layout is held.
All locks for a given client are tied together under a single client- All locks for a given client are tied together under a single client-
wide lease. All requests made on sessions associated with the client wide lease. All requests made on sessions associated with the client
renew that lease. When leases are not promptly renewed locks are renew that lease. When leases are not promptly renewed locks are
subject to revocation. In the event of server reboot, clients have subject to revocation. In the event of server reboot, clients have
the opportunity to safely reclaim their locks within a special grace the opportunity to safely reclaim their locks within a special grace
period. period.
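The client-wide lease described above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not a description of any particular server: the structure client_lease, its fields, and the functions lease_renew() and lease_expired() are hypothetical names, and time is modeled as whole seconds on the server's clock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-client lease record; one lease covers all locks. */
struct client_lease {
    uint64_t last_renewal; /* server clock, in seconds, at last renewal */
    uint32_t lease_time;   /* lease duration in seconds                 */
};

/* Any request on a session associated with the client renews the lease. */
static void lease_renew(struct client_lease *l, uint64_t now)
{
    l->last_renewal = now;
}

/*
 * Once the lease period has passed without renewal, the client's locks
 * become subject to revocation. The first comparison guards against
 * unsigned wraparound if "now" is earlier than the recorded renewal.
 */
static bool lease_expired(const struct client_lease *l, uint64_t now)
{
    return now > l->last_renewal &&
           now - l->last_renewal > l->lease_time;
}
```

Keeping a single timestamp per client, rather than one per lock, mirrors the statement that all locks for a given client are tied together under one lease.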
1.7. Differences from NFSv4.0 1.7. Differences from NFSv4.0
The following summarizes the differences between minor version one The following summarizes the major differences between minor version
and the base protocol: one and the base protocol:
o Implementation of the sessions model. o Implementation of the sessions model.
o Support for parallel access to data. o Support for parallel access to data.
o Addition of the RECLAIM_COMPLETE operation to better structure the o Addition of the RECLAIM_COMPLETE operation to better structure the
lock reclamation process. lock reclamation process.
o Support for delegations on directories and other file types in o Support for delegations on directories and other file types in
addition to regular files. addition to regular files.
skipping to change at page 19, line 30 skipping to change at page 19, line 35
2.2. RPC and XDR 2.2. RPC and XDR
The NFSv4.1 protocol is a Remote Procedure Call (RPC) application The NFSv4.1 protocol is a Remote Procedure Call (RPC) application
that uses RPC version 2 and the corresponding eXternal Data that uses RPC version 2 and the corresponding eXternal Data
Representation (XDR) as defined in [3] and [2]. Representation (XDR) as defined in [3] and [2].
2.2.1. RPC-based Security 2.2.1. RPC-based Security
Previous NFS versions have been thought of as having a host-based Previous NFS versions have been thought of as having a host-based
authentication model, where the NFS server authenticates the NFS authentication model, where the NFS server authenticates the NFS
client, and trust the client to authenticate all users. Actually, client, and trusts the client to authenticate all users. Actually,
NFS has always depended on RPC for authentication. The first form of NFS has always depended on RPC for authentication. One of the first
RPC authentication which required a host-based authentication forms of RPC authentication, AUTH_SYS, had no strong authentication,
approach. NFSv4.1 also depends on RPC for basic security services, and required a host-based authentication approach. NFSv4.1 also
and mandates RPC support for a user-based authentication model. The depends on RPC for basic security services, and mandates RPC support
user-based authentication model has user principals authenticated by for a user-based authentication model. The user-based authentication
a server, and in turn the server authenticated by user principals. model has user principals authenticated by a server, and in turn the
RPC provides some basic security services which are used by NFSv4.1. server authenticated by user principals. RPC provides some basic
security services which are used by NFSv4.1.
2.2.1.1. RPC Security Flavors 2.2.1.1. RPC Security Flavors
As described in section 7.2 "Authentication" of [3], RPC security is As described in section 7.2 "Authentication" of [3], RPC security is
encapsulated in the RPC header, via a security or authentication encapsulated in the RPC header, via a security or authentication
flavor, and information specific to the specification of the security flavor, and information specific to the specified security flavor.
flavor. Every RPC header conveys information used to identify and Every RPC header conveys information used to identify and
authenticate a client and server. As discussed in Section 2.2.1.1.1, authenticate a client and server. As discussed in Section 2.2.1.1.1,
some security flavors provide additional security services. some security flavors provide additional security services.
NFSv4.1 clients and servers MUST implement RPCSEC_GSS. (This NFSv4.1 clients and servers MUST implement RPCSEC_GSS. (This
requirement to implement is not a requirement to use.) Other requirement to implement is not a requirement to use.) Other
flavors, such as AUTH_NONE, and AUTH_SYS, MAY be implemented as well. flavors, such as AUTH_NONE, and AUTH_SYS, MAY be implemented as well.
2.2.1.1.1. RPCSEC_GSS and Security Services 2.2.1.1.1. RPCSEC_GSS and Security Services
RPCSEC_GSS ([4]) uses the functionality of GSS-API [7]. This allows RPCSEC_GSS ([4]) uses the functionality of GSS-API [7]. This allows
skipping to change at page 24, line 14 skipping to change at page 24, line 21
recovery see Section 12.7.1. recovery see Section 12.7.1.
Releasing such state requires that the server be able to determine Releasing such state requires that the server be able to determine
that one client instance is the successor of another. Where this that one client instance is the successor of another. Where this
cannot be done, for any of a number of reasons, the locking state cannot be done, for any of a number of reasons, the locking state
will remain for a time subject to lease expiration (see Section 8.3) will remain for a time subject to lease expiration (see Section 8.3)
and the new client will need to wait for such state to be removed, if and the new client will need to wait for such state to be removed, if
it makes conflicting lock requests. it makes conflicting lock requests.
Client identification is encapsulated in the following Client Owner Client identification is encapsulated in the following Client Owner
structure: data type:
struct client_owner4 { struct client_owner4 {
verifier4 co_verifier; verifier4 co_verifier;
opaque co_ownerid<NFS4_OPAQUE_LIMIT>; opaque co_ownerid<NFS4_OPAQUE_LIMIT>;
}; };
The first field, co_verifier, is a client incarnation verifier. The The first field, co_verifier, is a client incarnation verifier. The
server will start the process of canceling the client's leased state server will start the process of canceling the client's leased state
if co_verifier is different than what the server has previously if co_verifier is different than what the server has previously
recorded for the identified client (as specified in the co_ownerid recorded for the identified client (as specified in the co_ownerid
skipping to change at page 25, line 39 skipping to change at page 25, line 46
* A MAC address (again, a one way function should be performed). * A MAC address (again, a one way function should be performed).
* The timestamp of when the NFSv4.1 software was first installed * The timestamp of when the NFSv4.1 software was first installed
on the client (though this is subject to the previously on the client (though this is subject to the previously
mentioned caution about using information that is stored in a mentioned caution about using information that is stored in a
file, because the file might only be accessible over NFSv4.1). file, because the file might only be accessible over NFSv4.1).
* A true random number. However since this number ought to be * A true random number. However since this number ought to be
the same between client incarnations, this shares the same the same between client incarnations, this shares the same
problem as that of the using the timestamp of the software problem as that of using the timestamp of the software
installation. installation.
o For a user level NFSv4.1 client, it should contain additional o For a user level NFSv4.1 client, it should contain additional
information to distinguish the client from other user level information to distinguish the client from other user level
clients running on the same host, such as a process identifier or clients running on the same host, such as a process identifier or
other unique sequence. other unique sequence.
The client ID is assigned by the server (the eir_clientid result from The client ID is assigned by the server (the eir_clientid result from
EXCHANGE_ID) and should be chosen so that it will not conflict with a EXCHANGE_ID) and should be chosen so that it will not conflict with a
client ID previously assigned by the server. This applies across client ID previously assigned by the server. This applies across
skipping to change at page 26, line 43 skipping to change at page 26, line 49
an attempt is made to establish this new session with the existing an attempt is made to establish this new session with the existing
client ID, the server will reject the request with client ID, the server will reject the request with
NFS4ERR_STALE_CLIENTID. NFS4ERR_STALE_CLIENTID.
When NFS4ERR_STALE_CLIENTID is received in either of these When NFS4ERR_STALE_CLIENTID is received in either of these
situations, the client must obtain a new client ID by use of the situations, the client must obtain a new client ID by use of the
EXCHANGE_ID operation, then use that client ID as the basis of a new EXCHANGE_ID operation, then use that client ID as the basis of a new
session, and then proceed to any other necessary recovery for the session, and then proceed to any other necessary recovery for the
server restart case (See Section 8.4.2). server restart case (See Section 8.4.2).
See the detailed descriptions of EXCHANGE_ID (Section 18.35 and See the descriptions of EXCHANGE_ID (Section 18.35) and
CREATE_SESSION (Section 18.36) for a complete specification of these CREATE_SESSION (Section 18.36) for a complete specification of these
operations. operations.
2.4.1. Upgrade from NFSv4.0 to NFSv4.1 2.4.1. Upgrade from NFSv4.0 to NFSv4.1
To facilitate upgrade from NFSv4.0 to NFSv4.1, a server may compare a To facilitate upgrade from NFSv4.0 to NFSv4.1, a server may compare a
client_owner4 in an EXCHANGE_ID with an nfs_client_id4 established client_owner4 in an EXCHANGE_ID with an nfs_client_id4 established
using SETCLIENTID using NFSv4.0, so that an NFSv4.1 client is not using the SETCLIENTID operation of NFSv4.0. A server that does so
forced to delay until lease expiration for locking state established will allow an upgraded client to avoid waiting until the lease (i.e.
by the earlier client using minor version 0. This requires the the lease established by the NFSv4.0 instance client) expires. This
client_owner4 be constructed the same way as the nfs_client_id4. If requires the client_owner4 be constructed the same way as the
the latter's contents included the server's network address, and the nfs_client_id4. If the latter's contents included the server's
NFSv4.1 client does not wish to use a client ID that prevents network address (per the recommendations of the NFSv4.0 specification
trunking, it should send two EXCHANGE_ID operations. The first [21]), and the NFSv4.1 client does not wish to use a client ID that
EXCHANGE_ID will have a client_owner4 equal to the nfs_client_id4. prevents trunking, it should send two EXCHANGE_ID operations. The
This will clear the state created by the NFSv4.0 client. The second first EXCHANGE_ID will have a client_owner4 equal to the
EXCHANGE_ID will not have the server's network address. The state nfs_client_id4. This will clear the state created by the NFSv4.0
created for the second EXCHANGE_ID will not have to wait for lease client. The second EXCHANGE_ID will not have the server's network
expiration, because there will be no state to expire. address. The state created for the second EXCHANGE_ID will not have
to wait for lease expiration, because there will be no state to
expire.
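
The two-step upgrade described above can be sketched as follows. This is a minimal illustration, not part of the protocol definition: the names ClientOwner4 and exchange_id_owners, the boolean parameters, and the choice to reuse the same verifier for the second owner are all assumptions made for the example.

```python
from typing import List, NamedTuple


class ClientOwner4(NamedTuple):
    """Python stand-in for the XDR client_owner4 data type."""
    co_verifier: bytes  # client incarnation verifier
    co_ownerid: bytes   # opaque owner string


def exchange_id_owners(legacy: ClientOwner4,
                       legacy_embeds_server_addr: bool,
                       wants_trunking: bool,
                       addr_free_ownerid: bytes) -> List[ClientOwner4]:
    """Return the client owners to send, in order, in EXCHANGE_ID.

    `legacy` is built the same way as the NFSv4.0 nfs_client_id4.
    `addr_free_ownerid` is a hypothetical owner string that omits the
    server's network address.
    """
    owners = [legacy]  # first EXCHANGE_ID: lets the server match NFSv4.0 state
    if legacy_embeds_server_addr and wants_trunking:
        # Second EXCHANGE_ID: an owner that does not prevent trunking.
        # Reusing the verifier is an assumption of this sketch.
        owners.append(ClientOwner4(legacy.co_verifier, addr_free_ownerid))
    return owners
```

A client whose NFSv4.0 identifier never contained the server's address would get a single-element list back and send only one EXCHANGE_ID.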
2.4.2. Server Release of Client ID 2.4.2. Server Release of Client ID
NFSv4.1 introduces a new operation called DESTROY_CLIENTID NFSv4.1 introduces a new operation called DESTROY_CLIENTID
(Section 18.50) which the client SHOULD use to destroy a client ID it (Section 18.50) which the client SHOULD use to destroy a client ID it
no longer needs. This permits graceful, bilateral release of a no longer needs. This permits graceful, bilateral release of a
client ID. The operation cannot be used if there are sessions client ID. The operation cannot be used if there are sessions
associated with the client ID, or state with an unexpired lease. associated with the client ID, or state with an unexpired lease.
If the server determines that the client holds no associated state If the server determines that the client holds no associated state
for its client ID (including sessions, opens, locks, delegations, for its client ID (including sessions, opens, locks, delegations,
layouts, and wants), the server may choose to unilaterally release layouts, and wants), the server may choose to unilaterally release
the client ID. The server may make this choice for an inactive the client ID in order to conserve resources. If the client contacts
client so that resources are not consumed by those intermittently the server after this release, the server must ensure the client
active clients. If the client contacts the server after this receives the appropriate error so that it will use the EXCHANGE_ID/
release, the server must ensure the client receives the appropriate CREATE_SESSION sequence to establish a new client ID. The server
error so that it will use the EXCHANGE_ID/CREATE_SESSION sequence to ought to be very hesitant to release a client ID since the resulting
establish a new identity. It should be clear that the server must be work on the client to recover from such an event will be the same
very hesitant to release a client ID since the resulting work on the burden as if the server had failed and restarted. Typically a server
client to recover from such an event will be the same burden as if would not release a client ID unless there had been no activity from
the server had failed and restarted. Typically a server would not that client for many minutes. As long as there are sessions, opens,
release a client ID unless there had been no activity from that
client for many minutes. As long as there are sessions, opens,
locks, delegations, layouts, or wants, the server MUST NOT release locks, delegations, layouts, or wants, the server MUST NOT release
the client ID. See Section 2.10.10.1.4 for discussion on releasing the client ID. See Section 2.10.10.1.4 for discussion on releasing
inactive sessions. inactive sessions.
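
The release rule above might be expressed as a simple predicate. This is a sketch under stated assumptions: the function name, the dictionary of per-kind state counts, and the ten-minute idle threshold (standing in for "many minutes") are hypothetical and are not taken from the specification.

```python
STATE_KINDS = ("sessions", "opens", "locks", "delegations", "layouts", "wants")


def may_release_client_id(state_counts: dict,
                          idle_seconds: float,
                          min_idle_seconds: float = 600.0) -> bool:
    """Decide whether a server could unilaterally release a client ID.

    state_counts maps each kind of state to the number of items held;
    missing keys are treated as zero.  min_idle_seconds is an assumed
    policy knob, not a value defined by the protocol.
    """
    # While any state remains, the server MUST NOT release the client ID.
    if any(state_counts.get(kind, 0) > 0 for kind in STATE_KINDS):
        return False
    # With no state, release is optional; be hesitant and require idleness.
    return idle_seconds >= min_idle_seconds
```

Keeping the idle threshold as a parameter reflects that the text leaves the choice to the server implementation.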
2.4.3. Resolving Client Owner Conflicts 2.4.3. Resolving Client Owner Conflicts
When the server gets an EXCHANGE_ID for a client owner that currently When the server gets an EXCHANGE_ID for a client owner that currently
has no state, or that has state, but the lease has expired, the has no state, or that has state, but the lease has expired, the
server MUST allow the EXCHANGE_ID, and confirm the new client ID if server MUST allow the EXCHANGE_ID, and confirm the new client ID if
followed by the appropriate CREATE_SESSION. followed by the appropriate CREATE_SESSION.
When the server gets an EXCHANGE_ID for a new incarnation of a client When the server gets an EXCHANGE_ID for a new incarnation of a client
owner that currently has an old incarnation with state and an owner that currently has an old incarnation with state and an
unexpired lease, the server is allowed to dispose of the state of the unexpired lease, the server is allowed to dispose of the state of the
previous incarnation of the client owner if one of the following is previous incarnation of the client owner if one of the following is
true: true:
o The principal that created the client ID for the client owner is o The principal that created the client ID for the client owner is
the same as the principal that is issuing the EXCHANGE_ID. Note the same as the principal that is issuing the EXCHANGE_ID. Note
that if the client ID was created with SP4_MACH_CRED protection that if the client ID was created with SP4_MACH_CRED state
(Section 18.35), the principal MUST be based on RPCSEC_GSS protection (Section 18.35), the principal MUST be based on
authentication, the RPCSEC_GSS service used MUST be integrity or RPCSEC_GSS authentication, the RPCSEC_GSS service used MUST be
privacy, and the same GSS mechanism and principal must be used as integrity or privacy, and the same GSS mechanism and principal
that used when the client ID was created. must be used as that used when the client ID was created.
o The client ID was established with SP4_SSV protection o The client ID was established with SP4_SSV protection
(Section 18.35, Section 2.10.7.3) and the client sends the (Section 18.35, Section 2.10.7.3) and the client sends the
EXCHANGE_ID with the security flavor set to RPCSEC_GSS using the EXCHANGE_ID with the security flavor set to RPCSEC_GSS using the
GSS SSV mechanism (Section 2.10.8). GSS SSV mechanism (Section 2.10.8).
o The client ID was established with SP4_SSV protection. Because o The client ID was established with SP4_SSV protection, and under
the SSV might not persist across client and server restart, and the conditions described herein, the EXCHANGE_ID was sent with
because the first time a client sends EXCHANGE_ID to a server it SP4_MACH_CRED state protection. Because the SSV might not persist
does not have an SSV, the client MAY send the subsequent across client and server restart, and because the first time a
EXCHANGE_ID without an SSV RPCSEC_GSS handle. Instead, as with client sends EXCHANGE_ID to a server it does not have an SSV, the
SP4_MACH_CRED protection, the principal MUST be based on client MAY send the subsequent EXCHANGE_ID without an SSV
RPCSEC_GSS authentication, the RPCSEC_GSS service used MUST be RPCSEC_GSS handle. Instead, as with SP4_MACH_CRED protection, the
integrity or privacy, and the same GSS mechanism and principal principal MUST be based on RPCSEC_GSS authentication, the
must be used as that used when the client ID was created. RPCSEC_GSS service used MUST be integrity or privacy, and the same
GSS mechanism and principal MUST be used as that used when the
client ID was created.
If none of the above situations apply, the server MUST return If none of the above situations apply, the server MUST return
NFS4ERR_CLID_INUSE. NFS4ERR_CLID_INUSE.
If the server accepts the principal and co_ownerid as matching that If the server accepts the principal and co_ownerid as matching that
which created the client ID, it deletes state (once CREATE_SESSION which created the client ID, and the co_verifier in the EXCHANGE_ID
confirms the client ID) if the co_verifier in the EXCHANGE_ID differs differs from the co_verifier used when the client ID was created,
from the co_verifier used when the client ID was created. If the then after the server receives a CREATE_SESSION that confirms the
co_verifier values are the same, then the client is either updating client ID, the server deletes state. If the co_verifier values are
properties of the client ID (Section 18.35), or possibly attempting the same, (e.g. the client is either updating properties of the
trunking (Section 2.10.4) and the server MUST NOT delete state. client ID (Section 18.35), or the client is attempting trunking
(Section 2.10.4)), the server MUST NOT delete state.
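
One reading of the conflict rules in this section is sketched below. It is illustrative only: the names OwnerRecord, Outcome, and resolve_exchange_id are hypothetical, and the three principal-matching conditions listed above are collapsed into a single boolean, principal_accepted, that the caller is assumed to have evaluated.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(Enum):
    ALLOW_NEW = "allow"
    DELETE_STATE_ON_CONFIRM = "delete state after CREATE_SESSION"
    KEEP_STATE = "keep state"
    CLID_INUSE = "NFS4ERR_CLID_INUSE"


@dataclass
class OwnerRecord:
    """What the server remembers about an existing client owner."""
    co_verifier: bytes
    has_state: bool
    lease_expired: bool


def resolve_exchange_id(record: Optional[OwnerRecord],
                        co_verifier: bytes,
                        principal_accepted: bool) -> Outcome:
    # No state, or state whose lease has expired: MUST allow.
    if record is None or not record.has_state or record.lease_expired:
        return Outcome.ALLOW_NEW
    # State with an unexpired lease and none of the principal rules hold.
    if not principal_accepted:
        return Outcome.CLID_INUSE
    # Different verifier: a new incarnation; state is deleted only once
    # CREATE_SESSION confirms the client ID.
    if co_verifier != record.co_verifier:
        return Outcome.DELETE_STATE_ON_CONFIRM
    # Same verifier: property update or trunking; MUST NOT delete state.
    return Outcome.KEEP_STATE
```

Returning a deferred outcome rather than deleting state immediately mirrors the requirement that deletion waits for the confirming CREATE_SESSION.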
2.5. Server Owners 2.5. Server Owners
The Server Owner is similar to a Client Owner (Section 2.4), but The Server Owner is similar to a Client Owner (Section 2.4), but
unlike the Client Owner, there is no shorthand serverid. The Server unlike the Client Owner, there is no shorthand server ID. The Server
Owner is defined in the following structure: Owner is defined in the following data type:
struct server_owner4 { struct server_owner4 {
uint64_t so_minor_id; uint64_t so_minor_id;
opaque so_major_id<NFS4_OPAQUE_LIMIT>; opaque so_major_id<NFS4_OPAQUE_LIMIT>;
}; };
The Server Owner is returned from EXCHANGE_ID. When the so_major_id The Server Owner is returned from EXCHANGE_ID. When the so_major_id
fields are the same in two EXCHANGE_ID results, the connections each fields are the same in two EXCHANGE_ID results, the connections each
EXCHANGE_ID are sent over can be assumed to address the same Server EXCHANGE_ID were sent over can be assumed to address the same Server
(as defined in Section 1.5). If the so_minor_id fields are also the (as defined in Section 1.5). If the so_minor_id fields are also the
same, then not only do both connections connect to the same server, same, then not only do both connections connect to the same server,
but the session and other state can be shared across both but the session can be shared across both connections. The reader is
connections. The reader is cautioned that multiple servers may cautioned that multiple servers may deliberately or accidentally
deliberately or accidentally claim to have the same so_major_id or claim to have the same so_major_id or so_major_id/so_minor_id; the
so_major_id/so_minor_id; the reader should examine Section 2.10.4 and reader should examine Section 2.10.4 and Section 18.35 in order to
Section 18.35. avoid acting on falsely matching Server Owner values.
The considerations for generating a so_major_id are similar to that The considerations for generating a so_major_id are similar to that
for generating a co_ownerid string (see Section 2.4). The for generating a co_ownerid string (see Section 2.4). The
consequences of two servers generating conflicting so_major_id values consequences of two servers generating conflicting so_major_id values
are less dire than they are for co_ownerid conflicts because the are less dire than they are for co_ownerid conflicts because the
client can use RPCSEC_GSS to compare the authenticity of each server client can use RPCSEC_GSS to compare the authenticity of each server
(see Section 2.10.4). (see Section 2.10.4).
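
The comparison of Server Owner values returned by two EXCHANGE_ID operations can be sketched as below. The names ServerOwner4 and compare_server_owners and the returned strings are assumptions for illustration. The result is only a claim made by the servers; as cautioned above, a client would still need the checks of Section 2.10.4 and Section 18.35 before acting on a match.

```python
from typing import NamedTuple


class ServerOwner4(NamedTuple):
    """Python stand-in for the XDR server_owner4 data type."""
    so_minor_id: int
    so_major_id: bytes


def compare_server_owners(a: ServerOwner4, b: ServerOwner4) -> str:
    """Classify what two EXCHANGE_ID results claim about their connections."""
    if a.so_major_id != b.so_major_id:
        return "different servers"
    if a.so_minor_id != b.so_minor_id:
        return "same server"
    # Both fields match: the session can be shared across both connections.
    return "same server, session shareable"
```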
2.6. Security Service Negotiation 2.6. Security Service Negotiation
skipping to change at page 30, line 49 skipping to change at page 31, line 15
2.6.3.1.1. Put Filehandle Operation + SAVEFH 2.6.3.1.1. Put Filehandle Operation + SAVEFH
The client is saving a filehandle for a future RESTOREFH. The server The client is saving a filehandle for a future RESTOREFH. The server
MUST NOT return NFS4ERR_WRONGSEC to either the put filehandle MUST NOT return NFS4ERR_WRONGSEC to either the put filehandle
operation or SAVEFH. operation or SAVEFH.
2.6.3.1.2. Two or More Put Filehandle Operations 2.6.3.1.2. Two or More Put Filehandle Operations
For a series of N put filehandle operations, the server MUST NOT For a series of N put filehandle operations, the server MUST NOT
return NFS4ERR_WRONGSEC to the first N-1 put filehandle operations. return NFS4ERR_WRONGSEC to the first N-1 put filehandle operations.
The Nth put filehandle operation is handled as if it is the first in The N'th put filehandle operation is handled as if it is the first in
a series of operations, and the second in the series of operations is a subseries of operations. For example if the server received PUTFH,
not a put filehandle operation. For example if the server received PUTROOTFH, LOOKUP, then the PUTFH is ignored for NFS4ERR_WRONGSEC
PUTFH, PUTROOTFH, LOOKUP, then the PUTFH is ignored for purposes, and the PUTROOTFH, LOOKUP subseries is processed as
NFS4ERR_WRONGSEC purposes, and the PUTROOTFH, LOOKUP subseries is according to Section 2.6.3.1.3.
processed as according to Section 2.6.3.1.3.
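
The rule for a series of put filehandle operations can be illustrated with a short sketch. The function name and the tuple it returns are hypothetical. The set of put filehandle operations (PUTFH, PUTPUBFH, PUTROOTFH, RESTOREFH) is assumed here from the specification's definition of the term, which is not part of the excerpt above.

```python
from typing import List, Tuple

PUT_FH_OPS = {"PUTFH", "PUTPUBFH", "PUTROOTFH", "RESTOREFH"}


def split_put_series(ops: List[str]) -> Tuple[List[str], List[str]]:
    """Split a list of operation names that starts with put filehandle ops.

    Returns (exempt, subseries): `exempt` holds the first N-1 put
    filehandle operations, which never get NFS4ERR_WRONGSEC; `subseries`
    starts at the Nth one and is evaluated as its own series.
    """
    n = 0
    while n < len(ops) and ops[n] in PUT_FH_OPS:
        n += 1
    if n == 0:
        return [], list(ops)  # no leading put filehandle operation
    return list(ops[:n - 1]), list(ops[n - 1:])
```

For the example in the text, PUTFH, PUTROOTFH, LOOKUP, the sketch yields PUTFH as exempt and PUTROOTFH, LOOKUP as the subseries to evaluate.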
2.6.3.1.3. Put Filehandle Operation + LOOKUP (or OPEN by Name) 2.6.3.1.3. Put Filehandle Operation + LOOKUP (or OPEN by Name)
This situation also applies to a put filehandle operation followed by This situation also applies to a put filehandle operation followed by
a LOOKUP or an OPEN operation that specifies a component name. a LOOKUP or an OPEN operation that specifies a component name.
In this situation, the client is potentially crossing a security In this situation, the client is potentially crossing a security
policy boundary, and the set of security tuples the parent directory policy boundary, and the set of security tuples the parent directory
supports may differ from those of the child. The server supports may differ from those of the child. The server
implementation may decide whether to impose any restrictions on implementation may decide whether to impose any restrictions on
skipping to change at page 33, line 25 skipping to change at page 33, line 37
A COMPOUND containing the series put filehandle operation + A COMPOUND containing the series put filehandle operation +
SECINFO_NO_NAME (style SECINFO_STYLE4_CURRENT_FH) is an efficient way SECINFO_NO_NAME (style SECINFO_STYLE4_CURRENT_FH) is an efficient way
for the client to recover from NFS4ERR_WRONGSEC. for the client to recover from NFS4ERR_WRONGSEC.
The NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC to any operation The NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC to any operation
other than a put filehandle operation, LOOKUP, LOOKUPP, and OPEN (by other than a put filehandle operation, LOOKUP, LOOKUPP, and OPEN (by
component name). component name).
2.6.3.1.8. Operations after SECINFO and SECINFO_NO_NAME 2.6.3.1.8. Operations after SECINFO and SECINFO_NO_NAME
Placing an operation that uses the current filehandle after SECINFO Suppose a client sends a COMPOUND procedure containing the series
or SECINFO_NO_NAME seemingly introduces an issue with what error to SEQUENCE, PUTFH, SECINFO_NO_NAME, READ, and suppose the security tuple
return when the security tuple of the request is not allowed for the used does not match that required for the target file. By rule (see
operation that uses the current filehandle. For example, suppose a
client sends a COMPOUND procedure containing the series SEQUENCE,
PUTFH, SECINFO_NO_NAME, READ, and suppose the security tuple used does
not match that required for the target file. By rule (see
Section 2.6.3.1.5), neither PUTFH nor SECINFO_NO_NAME can return Section 2.6.3.1.5), neither PUTFH nor SECINFO_NO_NAME can return
NFS4ERR_WRONGSEC. By rule (see Section 2.6.3.1.7), READ cannot NFS4ERR_WRONGSEC. By rule (see Section 2.6.3.1.7), READ cannot
return NFS4ERR_WRONGSEC. The issue is resolved by the fact that return NFS4ERR_WRONGSEC. The issue is resolved by the fact that
SECINFO and SECINFO_NO_NAME consume the current filehandle. This SECINFO and SECINFO_NO_NAME consume the current filehandle (note that
leaves no current filehandle for READ to use, and READ returns this is a change from NFSv4.0). This leaves no current filehandle
NFS4ERR_NOFILEHANDLE. for READ to use, and READ returns NFS4ERR_NOFILEHANDLE.
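
The effect of SECINFO and SECINFO_NO_NAME consuming the current filehandle can be traced with a toy model of COMPOUND processing. This sketch is an assumption-laden simplification: it tracks only the current filehandle, performs no security checks, handles only the four operations named in the example, and the function name and the placeholder filehandle value are hypothetical.

```python
from typing import List, Tuple

OK = "NFS4_OK"
NOFH = "NFS4ERR_NOFILEHANDLE"


def run_compound(ops: List[str]) -> List[Tuple[str, str]]:
    """Process operation names in order, stopping at the first error."""
    current_fh = None
    results = []
    for op in ops:
        if op == "SEQUENCE":
            status = OK
        elif op == "PUTFH":
            current_fh = "fh-target"  # placeholder filehandle
            status = OK
        elif op in ("SECINFO", "SECINFO_NO_NAME"):
            if current_fh is None:
                status = NOFH
            else:
                current_fh = None     # the operation consumes the filehandle
                status = OK
        elif op == "READ":
            status = OK if current_fh is not None else NOFH
        else:
            raise ValueError("operation not modeled: " + op)
        results.append((op, status))
        if status != OK:
            break                     # COMPOUND stops at the first failure
    return results
```

Running the series from the text ends with READ failing because no current filehandle remains, rather than with NFS4ERR_WRONGSEC.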
2.7. Minor Versioning 2.7. Minor Versioning
To address the requirement of an NFS protocol that can evolve as the To address the requirement of an NFS protocol that can evolve as the
need arises, the NFSv4.1 protocol contains the rules and framework to need arises, the NFSv4.1 protocol contains the rules and framework to
allow for future minor changes or versioning. allow for future minor changes or versioning.
The base assumption with respect to minor versioning is that any The base assumption with respect to minor versioning is that any
future accepted minor version must follow the IETF process and be future accepted minor version must follow the IETF process and be
documented in a standards track RFC. Therefore, each minor version documented in a standards track RFC. Therefore, each minor version
skipping to change at page 35, line 9 skipping to change at page 35, line 17
* adding bits to flag fields such as new attributes to * adding bits to flag fields such as new attributes to
GETATTR's bitmap4 data type and providing corresponding GETATTR's bitmap4 data type and providing corresponding
variants of opaque arrays, such as a notify4 used together variants of opaque arrays, such as a notify4 used together
with such bitmaps. with such bitmaps.
* adding bits to existing attributes like ACLs that have flag * adding bits to existing attributes like ACLs that have flag
words words
* extending enumerated types (including NFS4ERR_*) with new * extending enumerated types (including NFS4ERR_*) with new
values and values
* adding cases to a switched union
4. Minor versions may not modify the structure of existing 4. Minor versions may not modify the structure of existing
attributes. attributes.
5. Minor versions may not delete operations. 5. Minor versions may not delete operations.
This prevents the potential reuse of a particular operation This prevents the potential reuse of a particular operation
"slot" in a future minor version. "slot" in a future minor version.
6. Minor versions may not delete attributes. 6. Minor versions may not delete attributes.
skipping to change at page 36, line 18 skipping to change at page 36, line 30
13. A client MUST NOT attempt to use a stateid, filehandle, or 13. A client MUST NOT attempt to use a stateid, filehandle, or
similar returned object from the COMPOUND procedure with minor similar returned object from the COMPOUND procedure with minor
version X for another COMPOUND procedure with minor version Y, version X for another COMPOUND procedure with minor version Y,
where X != Y. where X != Y.
2.8. Non-RPC-based Security Services 2.8. Non-RPC-based Security Services
As described in Section 2.2.1.1.1.1, NFSv4.1 relies on RPC for As described in Section 2.2.1.1.1.1, NFSv4.1 relies on RPC for
identification, authentication, integrity, and privacy. NFSv4.1 identification, authentication, integrity, and privacy. NFSv4.1
itself provides additional security services as described in the next itself provides or enables additional security services as described
several subsections. in the next several subsections.
2.8.1. Authorization 2.8.1. Authorization
Authorization to access a file object via an NFSv4.1 operation is Authorization to access a file object via an NFSv4.1 operation is
ultimately determined by the NFSv4.1 server. A client can ultimately determined by the NFSv4.1 server. A client can
predetermine its access to a file object via the OPEN (Section 18.16) predetermine its access to a file object via the OPEN (Section 18.16)
and the ACCESS (Section 18.1) operations. and the ACCESS (Section 18.1) operations.
Principals with appropriate access rights can modify the Principals with appropriate access rights can modify the
authorization on a file object via the SETATTR (Section 18.30) authorization on a file object via the SETATTR (Section 18.30)
skipping to change at page 37, line 32 skipping to change at page 37, line 42
time this document was written, the only two transports that had the time this document was written, the only two transports that had the
above attributes were TCP and SCTP. To enhance the possibilities for above attributes were TCP and SCTP. To enhance the possibilities for
interoperability, an NFSv4.1 implementation MUST support operation interoperability, an NFSv4.1 implementation MUST support operation
over the TCP transport protocol. over the TCP transport protocol.
Even if NFSv4.1 is used over a non-IP network protocol, it is Even if NFSv4.1 is used over a non-IP network protocol, it is
RECOMMENDED that the transport support congestion control. RECOMMENDED that the transport support congestion control.
It is permissible for a connectionless transport to be used under It is permissible for a connectionless transport to be used under
NFSv4.1, however reliable and in-order delivery of data by the NFSv4.1, however reliable and in-order delivery of data by the
connectionless transport is still required. NFSv4.1 assumes that a connectionless transport is REQUIRED. NFSv4.1 assumes that a client
client transport address and server transport address used to send transport address and server transport address used to send data over
data over a transport together constitute a connection, even if the a transport together constitute a connection, even if the underlying
underlying transport eschews the concept of a connection. transport eschews the concept of a connection.
2.9.2. Client and Server Transport Behavior 2.9.2. Client and Server Transport Behavior
If a connection-oriented transport (e.g. TCP) is used the client and If a connection-oriented transport (e.g. TCP) is used, the client
server SHOULD use long lived connections for at least three reasons: and server SHOULD use long lived connections for at least three
reasons:
1. This will prevent the weakening of the transport's congestion 1. This will prevent the weakening of the transport's congestion
control mechanisms via short lived connections. control mechanisms via short lived connections.
2. This will improve performance for the WAN environment by 2. This will improve performance for the WAN environment by
eliminating the need for connection setup handshakes. eliminating the need for connection setup handshakes.
3. The NFSv4.1 callback model differs from NFSv4.0, and requires the 3. The NFSv4.1 callback model differs from NFSv4.0, and requires the
client and server to maintain a client-created backchannel (see client and server to maintain a client-created backchannel (see
Section 2.10.3.1) for the server to use. Section 2.10.3.1) for the server to use.
skipping to change at page 39, line 7 skipping to change at page 39, line 18
o RDMA credits present a new issue to the reply cache in NFSv4.1. o RDMA credits present a new issue to the reply cache in NFSv4.1.
The reply cache may be used when a connection within a session is The reply cache may be used when a connection within a session is
lost, such as after the client reconnects. Credit information is lost, such as after the client reconnects. Credit information is
a dynamic property of the RDMA connection, and stale values must a dynamic property of the RDMA connection, and stale values must
not be replayed from the cache. This implies that the reply cache not be replayed from the cache. This implies that the reply cache
contents must not be blindly used when replies are sent from it, contents must not be blindly used when replies are sent from it,
and credit information appropriate to the channel must be and credit information appropriate to the channel must be
refreshed by the RPC layer. refreshed by the RPC layer.
In addition, the NFSv4.1 requester is not allowed to stop waiting for In addition, as described in Section 2.10.5.2, while a session is
a reply, as described in Section 2.10.5.2. active, the NFSv4.1 requester MUST NOT stop waiting for a reply.
2.9.3. Ports 2.9.3. Ports
Historically, NFSv3 servers have listened over TCP port 2049. The Historically, NFSv3 servers have listened over TCP port 2049. The
registered port 2049 [25] for the NFS protocol should be the default registered port 2049 [25] for the NFS protocol should be the default
configuration. NFSv4.1 clients SHOULD NOT use the RPC binding configuration. NFSv4.1 clients SHOULD NOT use the RPC binding
protocols as described in [26]. protocols as described in [26].
2.10. Session 2.10. Session
2.10.1. Motivation and Overview 2.10.1. Motivation and Overview
Previous versions and minor versions of NFS have suffered from the Previous versions and minor versions of NFS have suffered from the
following: following:
o Lack of support for exactly once semantics (EOS). This includes o Lack of support for Exactly Once Semantics (EOS). This includes
lack of support for EOS through server failure and recovery. lack of support for EOS through server failure and recovery.
o Limited callback support, including no support for sending o Limited callback support, including no support for sending
callbacks through firewalls, and races between responses from callbacks through firewalls, and races between replies to normal
normal requests, and callbacks. requests and callbacks.
o Limited trunking over multiple network paths. o Limited trunking over multiple network paths.
o Requiring machine credentials for fully secure operation. o Requiring machine credentials for fully secure operation.
Through the introduction of a session, NFSv4.1 addresses the above Through the introduction of a session, NFSv4.1 addresses the above
shortfalls with practical solutions: shortfalls with practical solutions:
o EOS is enabled by a reply cache with a bounded size, making it o EOS is enabled by a reply cache with a bounded size, making it
feasible to keep the cache in persistent storage and enable EOS feasible to keep the cache in persistent storage and enable EOS
skipping to change at page 43, line 12 skipping to change at page 43, line 24
A connection's association with a session is not exclusive. A A connection's association with a session is not exclusive. A
connection associated with the channel(s) of one session may be connection associated with the channel(s) of one session may be
simultaneously associated with the channel(s) of other sessions simultaneously associated with the channel(s) of other sessions
including sessions associated with other client IDs. including sessions associated with other client IDs.
It is permissible for connections of multiple transport types to be It is permissible for connections of multiple transport types to be
associated with the same channel. For example both a TCP and RDMA associated with the same channel. For example both a TCP and RDMA
connection can be associated with the fore channel. In the event an connection can be associated with the fore channel. In the event an
RDMA and non-RDMA connection are associated with the same channel, RDMA and non-RDMA connection are associated with the same channel,
the maximum number of slots SHOULD be at least one more than the the maximum number of slots SHOULD be at least one more than the
total number of credits (Section 2.10.5.1). This way if all RDMA total number of RDMA credits (Section 2.10.5.1). This way if all RDMA
credits are used, the non-RDMA connection can have at least one credits are used, the non-RDMA connection can have at least one
outstanding request. If a server supports multiple transport types, outstanding request. If a server supports multiple transport types,
it MUST allow a client to associate connections from each transport it MUST allow a client to associate connections from each transport
to a channel. to a channel.
It is permissible for a connection of one type of transport to be It is permissible for a connection of one type of transport to be
associated with the fore channel, and a connection of a different associated with the fore channel, and a connection of a different
type to be associated with the backchannel. type to be associated with the backchannel.
2.10.4. Trunking 2.10.4. Trunking
Trunking is the use of multiple connections between a client and Trunking is the use of multiple connections between a client and
server in order to increase the speed of data transfer. NFSv4.1 server in order to increase the speed of data transfer. NFSv4.1
supports two types of trunking: session trunking and client ID supports two types of trunking: session trunking and client ID
trunking. NFSv4.1 servers MUST support trunking. trunking. NFSv4.1 servers MUST support trunking.
Session trunking is essentially the association of multiple Session trunking is essentially the association of multiple
connections, each with a potentially different target network connections, each with potentially different target and/or source
address, to the same session. network addresses, to the same session.
Client ID trunking is the association of multiple sessions to the Client ID trunking is the association of multiple sessions to the
same client ID, major server owner ID (Section 2.5), and server scope same client ID, major server owner ID (Section 2.5), and server scope
(Section 11.7.7). When two servers return the same major server (Section 11.7.7). When two servers return the same major server
owner and server scope it means the two servers are cooperating on owner and server scope it means the two servers are cooperating on
locking state management which is a prerequisite for client ID locking state management which is a prerequisite for client ID
trunking. trunking.
Understanding and distinguishing session and client ID trunking Understanding and distinguishing session and client ID trunking
requires understanding how the results of the EXCHANGE_ID requires understanding how the results of the EXCHANGE_ID
skipping to change at page 44, line 17 skipping to change at page 44, line 30
different EXCHANGE_ID requests, and the eir_clientid, different EXCHANGE_ID requests, and the eir_clientid,
eir_server_owner.so_major_id, eir_server_owner.so_minor_id, and eir_server_owner.so_major_id, eir_server_owner.so_minor_id, and
eir_server_scope results match in both EXCHANGE_ID results, then eir_server_scope results match in both EXCHANGE_ID results, then
the client is permitted to perform session trunking. If the the client is permitted to perform session trunking. If the
client has no session mapping to the tuple of eir_clientid, client has no session mapping to the tuple of eir_clientid,
eir_server_owner.so_major_id, eir_server_scope, eir_server_owner.so_major_id, eir_server_scope,
eir_server_owner.so_minor_id, then it creates the session via a eir_server_owner.so_minor_id, then it creates the session via a
CREATE_SESSION operation over one of the connections, which CREATE_SESSION operation over one of the connections, which
associates the connection to the session. If there is a session associates the connection to the session. If there is a session
for the tuple, the client can send BIND_CONN_TO_SESSION to for the tuple, the client can send BIND_CONN_TO_SESSION to
associate the connection to the session. Or if the client does associate the connection to the session. (Of course, if the
not want to use session trunking, it can invoke CREATE_SESSION on client does not want to use session trunking, it can invoke
the connection. CREATE_SESSION on the connection. This will result in client ID
trunking as described below.)
Client ID Trunking If the eia_clientowner argument is the same in Client ID Trunking If the eia_clientowner argument is the same in
two different EXCHANGE_ID requests, and the eir_clientid, two different EXCHANGE_ID requests, and the eir_clientid,
eir_server_owner.so_major_id, and eir_server_scope results match eir_server_owner.so_major_id, and eir_server_scope results match
in both EXCHANGE_ID results, but the eir_server_owner.so_minor_id in both EXCHANGE_ID results, but the eir_server_owner.so_minor_id
results do not match then the client is permitted to perform results do not match then the client is permitted to perform
client ID trunking. The client can associate each connection with client ID trunking. The client can associate each connection with
different sessions, where each session is associated with the same different sessions, where each session is associated with the same
server. Of course, even if the eir_server_owner.so_minor_id server.
fields do match, the client is free to employ client ID trunking
instead of session trunking. The client completes the act of Of course, even if the eir_server_owner.so_minor_id fields do
client ID trunking by invoking CREATE_SESSION on each connection, match, the client is free to employ client ID trunking instead of
using the same client ID that was returned in eir_clientid. These session trunking.
invocations create two sessions and also associate each connection
with each session. The client completes the act of client ID trunking by invoking
CREATE_SESSION on each connection, using the same client ID that
was returned in eir_clientid. These invocations create two
sessions and also associate each connection with each session.
When doing client ID trunking, locking state is shared across When doing client ID trunking, locking state is shared across
sessions associated with the same client ID. This requires the sessions associated with the same client ID. This requires the
server to coordinate state across sessions. server to coordinate state across sessions.
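The comparison of EXCHANGE_ID results described above can be sketched as follows. This is a minimal illustration, not part of the protocol definition: the ExchangeIdResult container, the Trunking enumeration, and the permitted_trunking function are hypothetical names, and the field types are simplified. The sketch assumes the eia_clientowner check applies to both forms of trunking.

```python
from dataclasses import dataclass
from enum import Enum


class Trunking(Enum):
    NONE = "no trunking"
    CLIENT_ID = "client ID trunking"
    SESSION = "session trunking"  # client ID trunking is also permitted here


@dataclass(frozen=True)
class ExchangeIdResult:
    """Hypothetical holder for the EXCHANGE_ID fields named in the text."""
    eia_clientowner: bytes    # argument the client sent
    eir_clientid: int
    so_major_id: bytes        # eir_server_owner.so_major_id
    so_minor_id: int          # eir_server_owner.so_minor_id
    eir_server_scope: bytes


def permitted_trunking(a: ExchangeIdResult, b: ExchangeIdResult) -> Trunking:
    """Return the strongest form of trunking the two results permit."""
    same_base = (
        a.eia_clientowner == b.eia_clientowner
        and a.eir_clientid == b.eir_clientid
        and a.so_major_id == b.so_major_id
        and a.eir_server_scope == b.eir_server_scope
    )
    if not same_base:
        return Trunking.NONE
    if a.so_minor_id == b.so_minor_id:
        # All four results match: the client may bind the new connection
        # to an existing session with BIND_CONN_TO_SESSION.
        return Trunking.SESSION
    # Only so_minor_id differs: each connection gets its own session
    # via CREATE_SESSION, all under the same client ID.
    return Trunking.CLIENT_ID
```

A result of Trunking.SESSION does not oblige the client to use session trunking; as the text notes, it may still create a separate session on the new connection.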
When two servers over two connections claim matching or partially When two servers over two connections claim matching or partially
matching eir_server_owner, eir_server_scope, and eir_clientid values, matching eir_server_owner, eir_server_scope, and eir_clientid values,
the client does not have to trust the servers' claims. The client the client does not have to trust the servers' claims. The client
may verify these claims before trunking traffic in the following may verify these claims before trunking traffic in the following
ways: ways:
skipping to change at page 45, line 15 skipping to change at page 45, line 33
BIND_CONN_TO_SESSION is to be verified according to the SP4_SSV or BIND_CONN_TO_SESSION is to be verified according to the SP4_SSV or
SP4_MACH_CRED (Section 18.35) state protection options. For SP4_MACH_CRED (Section 18.35) state protection options. For
SP4_SSV, reliable verification depends on a shared secret (the SP4_SSV, reliable verification depends on a shared secret (the
SSV) that is established via the SET_SSV (Section 18.47) SSV) that is established via the SET_SSV (Section 18.47)
operation. operation.
When a new connection is associated with the session (via the When a new connection is associated with the session (via the
BIND_CONN_TO_SESSION operation, see Section 18.34), if the client BIND_CONN_TO_SESSION operation, see Section 18.34), if the client
specified SP4_SSV state protection for the BIND_CONN_TO_SESSION specified SP4_SSV state protection for the BIND_CONN_TO_SESSION
operation, the client MUST send the BIND_CONN_TO_SESSION with operation, the client MUST send the BIND_CONN_TO_SESSION with
RPCSEC_GSS protection, using integrity or privacy, and a RPCSEC_GSS protection, using integrity or privacy, and an
RPCSEC_GSS using the GSS SSV mechanism (Section 2.10.8). The RPCSEC_GSS handle created with the GSS SSV mechanism
RPCSEC_GSS handle is created by CREATE_SESSION (Section 18.36). (Section 2.10.8).
If the client mistakenly tries to associate a connection to a If the client mistakenly tries to associate a connection to a
session of a wrong server, the server will either reject the session of a wrong server, the server will either reject the
attempt because it is not aware of the session identifier of the attempt because it is not aware of the session identifier of the
BIND_CONN_TO_SESSION arguments, or it will reject the attempt BIND_CONN_TO_SESSION arguments, or it will reject the attempt
because the RPCSEC_GSS authentication fails. Even if the server because the RPCSEC_GSS authentication fails. Even if the server
mistakenly or maliciously accepts the connection association mistakenly or maliciously accepts the connection association
attempt, the RPCSEC_GSS verifier it computes in the response will attempt, the RPCSEC_GSS verifier it computes in the response will
not be verified by the client, so the client will know it cannot not be verified by the client, so the client will know it cannot
use the connection for trunking the specified session. use the connection for trunking the specified session.
skipping to change at page 46, line 17 skipping to change at page 46, line 35
client verifies the claim by issuing a CREATE_SESSION to the client verifies the claim by issuing a CREATE_SESSION to the
second destination address, protected with RPCSEC_GSS integrity second destination address, protected with RPCSEC_GSS integrity
using an RPCSEC_GSS handle returned by the second EXCHANGE_ID. If using an RPCSEC_GSS handle returned by the second EXCHANGE_ID. If
the server accepts the CREATE_SESSION request, and if the client the server accepts the CREATE_SESSION request, and if the client
verifies the RPCSEC_GSS verifier and integrity codes, then the verifies the RPCSEC_GSS verifier and integrity codes, then the
client has proof the second server knows the SSV, and thus the two client has proof the second server knows the SSV, and thus the two
servers are the same for the purposes of client ID trunking. servers are the same for the purposes of client ID trunking.
2.10.5. Exactly Once Semantics 2.10.5. Exactly Once Semantics
Via the session, NFSv4.1 offers exactly once semantics (EOS) for Via the session, NFSv4.1 offers Exactly Once Semantics (EOS) for
requests sent over a channel. EOS is supported on both the fore and requests sent over a channel. EOS is supported on both the fore and
back channels. back channels.
Each COMPOUND or CB_COMPOUND request that is sent with a leading Each COMPOUND or CB_COMPOUND request that is sent with a leading
SEQUENCE or CB_SEQUENCE operation MUST be executed by the receiver SEQUENCE or CB_SEQUENCE operation MUST be executed by the receiver
exactly once. This requirement holds regardless of whether the exactly once. This requirement holds regardless of whether the
request is sent with reply caching specified (see request is sent with reply caching specified (see
Section 2.10.5.1.2). The requirement holds even if the requester is Section 2.10.5.1.2). The requirement holds even if the requester is
issuing the request over a session created between a pNFS data client issuing the request over a session created between a pNFS data client
and pNFS data server. To understand the rationale for this and pNFS data server. To understand the rationale for this
skipping to change at page 46, line 44 skipping to change at page 47, line 17
o Idempotent non-modifying requests. o Idempotent non-modifying requests.
An example of a non-idempotent request is RENAME. It is obvious that An example of a non-idempotent request is RENAME. It is obvious that
if a replier executes the same RENAME request twice, and the first if a replier executes the same RENAME request twice, and the first
execution succeeds, the re-execution will fail. If the replier execution succeeds, the re-execution will fail. If the replier
returns the result from the re-execution, this result is incorrect. returns the result from the re-execution, this result is incorrect.
Therefore, EOS is required for nonidempotent requests. Therefore, EOS is required for nonidempotent requests.
An example of an idempotent modifying request is a COMPOUND request An example of an idempotent modifying request is a COMPOUND request
containing a WRITE operation. Repeated execution of the same WRITE containing a WRITE operation. Repeated execution of the same WRITE
has the same effect as execution of that write once. Nevertheless, has the same effect as execution of that write a single time.
enforcing EOS for WRITEs and other idempotent modifying requests is Nevertheless, enforcing EOS for WRITEs and other idempotent modifying
necessary to avoid data corruption. requests is necessary to avoid data corruption.
Suppose a client sends WRITEs A and B to a noncompliant server that Suppose a client sends WRITE A to a noncompliant server that does not
does not enforce EOS, and receives no response, perhaps due to a enforce EOS, and receives no response, perhaps due to a network
network partition. The client reconnects to the server and re-sends partition. The client reconnects to the server and re-sends WRITE A.
both WRITEs. Now, the server has outstanding two instances of each Now, the server has outstanding two instances of A. The server can be
of A and B. The server can be in a situation in which it executes and in a situation in which it executes and replies to the retry of A,
replies to the retries of A and B, while the first A and B are still while the first A is still waiting in the server's internal I/O
waiting in the server's I/O system for some resource. Upon receiving system for some resource. Upon receiving the reply to the second
the replies to the second attempts of WRITEs A and B, the client attempt of WRITE A, the client believes its write is done so it is
believes its writes are done so it is free to send WRITE D which free to send WRITE B which overlaps the range of A. When the original
overlaps the range of one or both of A and B. If A or B are A is dispatched from the server's I/O system, and executed (thus the
subsequently executed for the second time, then what has been written second time A will have been written), then what has been written by
by D can be overwritten and thus corrupted. B can be overwritten and thus corrupted.
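The corruption scenario can be reproduced with a small simulation. This is an illustrative sketch only: the file contents, offsets, and function names are invented for the example, and it follows the form of the scenario in which WRITE A is retried and a later WRITE overlaps its range.

```python
def apply_write(data: bytearray, offset: int, payload: bytes) -> None:
    """Overwrite len(payload) bytes of data starting at offset."""
    data[offset:offset + len(payload)] = payload


WRITE_A = (0, b"AAAA")   # hypothetical offset and payload
WRITE_B = (2, b"BBBB")   # overlaps the range of A


def run_with_eos() -> bytes:
    """Each WRITE executes exactly once, in the order the client intended."""
    data = bytearray(b"........")
    apply_write(data, *WRITE_A)
    apply_write(data, *WRITE_B)
    return bytes(data)


def run_without_eos() -> bytes:
    """The first copy of A stalls; its retry and B complete before it runs."""
    data = bytearray(b"........")
    stalled_first_a = WRITE_A          # still waiting in the server's I/O system
    apply_write(data, *WRITE_A)        # retry of A executes and is acknowledged
    apply_write(data, *WRITE_B)        # client now believes A is done, sends B
    apply_write(data, *stalled_first_a)  # original A finally dispatches
    return bytes(data)


print(run_with_eos())     # b'AABBBB..'
print(run_without_eos())  # b'AAAABB..'
```

In the second run, two bytes written by the later WRITE are replaced by stale data from the delayed first execution of A, which is the corruption that EOS prevents.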
An example of an idempotent non-modifying request is a COMPOUND An example of an idempotent non-modifying request is a COMPOUND
containing SEQUENCE, PUTFH, READLINK and nothing else. The re- containing SEQUENCE, PUTFH, READLINK and nothing else. The re-
execution of such a request will not cause data corruption, or execution of such a request will not cause data corruption, or
produce an incorrect result. Nonetheless, to keep the implementation produce an incorrect result. Nonetheless, to keep the implementation
simple, the replier MUST enforce EOS for all requests whether simple, the replier MUST enforce EOS for all requests whether
idempotent and non-modifying or not. idempotent and non-modifying or not.
Note that true and complete EOS is not possible unless the server Note that true and complete EOS is not possible unless the server
persists the reply cache in stable storage, unless the server is persists the reply cache in stable storage, unless the server is
skipping to change at page 48, line 21 skipping to change at page 48, line 41
which the request is to be sent. The value of N starts out as equal which the request is to be sent. The value of N starts out as equal
to ca_maxrequests - 1 (Section 18.36), but can be adjusted by the to ca_maxrequests - 1 (Section 18.36), but can be adjusted by the
response to SEQUENCE or CB_SEQUENCE as described later in this response to SEQUENCE or CB_SEQUENCE as described later in this
section. The slot id must be unused by any of the requests which the section. The slot id must be unused by any of the requests which the
requester has already active on the session. "Unused" here means the requester has already active on the session. "Unused" here means the
requester has no outstanding request for that slot id. requester has no outstanding request for that slot id.
A slot contains a sequence id and the cached reply corresponding to A slot contains a sequence id and the cached reply corresponding to
the request sent with that sequence id. The sequence id is a 32 bit the request sent with that sequence id. The sequence id is a 32 bit
unsigned value, and is therefore in the range 0..0xFFFFFFFF (2^32 - unsigned value, and is therefore in the range 0..0xFFFFFFFF (2^32 -
1). The first time a slot is used, the requester must specify a 1). The first time a slot is used, the requester MUST specify a
sequence id of one (1) (Section 18.36). Each time a slot is reused, sequence id of one (1) (Section 18.36). Each time a slot is reused,
the request MUST specify a sequence id that is one greater than that the request MUST specify a sequence id that is one greater than that
of the previous request on the slot. If the previous sequence id was of the previous request on the slot. If the previous sequence id was
0xFFFFFFFF, then the next request for the slot MUST have the sequence 0xFFFFFFFF, then the next request for the slot MUST have the sequence
id set to zero (i.e. (2^32 - 1) + 1 mod 2^32). id set to zero (i.e. (2^32 - 1) + 1 mod 2^32).
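The sequence id progression for a slot, including the wraparound case, can be written as a short sketch. The names next_sequence_id, SEQ_MODULUS, and FIRST_SEQUENCE_ID are hypothetical and used only for illustration.

```python
from typing import Optional

SEQ_MODULUS = 2 ** 32       # sequence ids are 32 bit unsigned values
FIRST_SEQUENCE_ID = 1       # value required the first time a slot is used


def next_sequence_id(previous: Optional[int] = None) -> int:
    """Return the sequence id for the next request on a slot.

    previous is the sequence id of the last request sent on the slot,
    or None if the slot has never been used.
    """
    if previous is None:
        return FIRST_SEQUENCE_ID
    return (previous + 1) % SEQ_MODULUS


print(next_sequence_id())            # 1
print(next_sequence_id(1))           # 2
print(next_sequence_id(0xFFFFFFFF))  # 0
```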
The sequence id accompanies the slot id in each request. It is for The sequence id accompanies the slot id in each request. It is for
the critical check at the server: it is used to efficiently determine the critical check at the server: it is used to efficiently determine
whether a request using a certain slot id is a retransmit or a new, whether a request using a certain slot id is a retransmit or a new,
never-before-seen request. It is not feasible for the client to never-before-seen request. It is not feasible for the client to
skipping to change at page 49, line 30 skipping to change at page 49, line 48
session, the replier need only cache the results of a limited number session, the replier need only cache the results of a limited number
of COMPOUND requests. The second implication derives from the of COMPOUND requests. The second implication derives from the
first, which is unlike XID-indexed reply caches (also known as first, which is unlike XID-indexed reply caches (also known as
duplicate request caches - DRCs), the slot id-based reply cache duplicate request caches - DRCs), the slot id-based reply cache
cannot be overflowed. Through use of the sequence id to identify cannot be overflowed. Through use of the sequence id to identify
retransmitted requests, the replier does not need to actually cache retransmitted requests, the replier does not need to actually cache
the request itself, reducing the storage requirements of the reply the request itself, reducing the storage requirements of the reply
cache further. These facilities make it practical to maintain all cache further. These facilities make it practical to maintain all
the required entries for an effective reply cache. the required entries for an effective reply cache.
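A bounded, slot-indexed reply cache along these lines might look like the sketch below. It is a simplified illustration for a single session, not a definitive implementation. The SlotReplyCache class and its handle method are hypothetical. Starting each slot at sequence id 0 is an implementation choice made here so that the first valid request carries sequence id 1. The NFS4ERR_SEQ_MISORDERED error name is taken from the wider NFSv4.1 specification rather than from the text above, and replies are assumed never to be None.

```python
from typing import Any, Callable, List, Optional, Tuple


class SlotReplyCache:
    """Reply cache for one session: one (sequence id, reply) pair per slot."""

    def __init__(self, slot_count: int) -> None:
        # Fixed size, so the cache cannot overflow. Only the reply is
        # cached, never the request itself.
        self._slots: List[Tuple[int, Optional[Any]]] = [
            (0, None) for _ in range(slot_count)
        ]

    def handle(self, slot_id: int, sequence_id: int,
               execute: Callable[[], Any]) -> Any:
        if not 0 <= slot_id < len(self._slots):
            return "NFS4ERR_BADSLOT"
        cached_seq, cached_reply = self._slots[slot_id]
        if sequence_id == cached_seq and cached_reply is not None:
            # Retransmission: return the cached reply, do not re-execute.
            return cached_reply
        if sequence_id == (cached_seq + 1) % 2 ** 32:
            # New request: execute once and remember the reply.
            reply = execute()
            self._slots[slot_id] = (sequence_id, reply)
            return reply
        # Any other sequence id is an error; the slot is left unchanged.
        return "NFS4ERR_SEQ_MISORDERED"
```

Note that the XID plays no part in the lookup: the slot id selects the entry and the sequence id alone distinguishes a retry from a new request.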
The slot id and sequence id therefore take over the traditional role The slot id, sequence id, and sessionid therefore take over the
of the XID and source network address in the replier's reply cache traditional role of the XID and source network address in the
implementation. This approach is considerably more portable and replier's reply cache implementation. This approach is considerably
completely robust - it is not subject to the reassignment of ports as more portable and completely robust - it is not subject to the
clients reconnect over IP networks. In addition, the RPC XID is not reassignment of ports as clients reconnect over IP networks. In
used in the reply cache, enhancing robustness of the cache in the addition, the RPC XID is not used in the reply cache, enhancing
face of any rapid reuse of XIDs by the requester. While the replier robustness of the cache in the face of any rapid reuse of XIDs by the
does not care about the XID for the purposes of reply cache requester. While the replier does not care about the XID for the
management (but the replier MUST return the same XID that was in the purposes of reply cache management (but the replier MUST return the
request), nonetheless there are considerations for the XID in NFSv4.1 same XID that was in the request), nonetheless there are
that are the same as all other previous versions of NFS. The RPC XID considerations for the XID in NFSv4.1 that are the same as all other
remains in each message and must be formulated in NFSv4.1 requests as previous versions of NFS. The RPC XID remains in each message and
it any other ONC RPC request. The reasons include: must be formulated in NFSv4.1 requests as in any other ONC RPC
request. The reasons include:
o The RPC layer retains its existing semantics and implementation. o The RPC layer retains its existing semantics and implementation.
o The requester and replier must be able to interoperate at the RPC o The requester and replier must be able to interoperate at the RPC
layer, prior to the NFSv4.1 decoding of the SEQUENCE or layer, prior to the NFSv4.1 decoding of the SEQUENCE or
CB_SEQUENCE operation CB_SEQUENCE operation.
o If an operation is being used that does not start with SEQUENCE or o If an operation is being used that does not start with SEQUENCE or
CB_SEQUENCE (e.g. BIND_CONN_TO_SESSION), then the RPC XID is CB_SEQUENCE (e.g. BIND_CONN_TO_SESSION), then the RPC XID is
needed for correct operation to match the reply to the request. needed for correct operation to match the reply to the request.
o The SEQUENCE or CB_SEQUENCE operation may generate an error. If o The SEQUENCE or CB_SEQUENCE operation may generate an error. If
so, the embedded slot id, sequence id, and sessionid (if present) so, the embedded slot id, sequence id, and sessionid (if present)
in the request will not be in the reply, and the requester has in the request will not be in the reply, and the requester has
only the XID to match the reply to the request. only the XID to match the reply to the request.
Givem that well formulated XIDs continue to be required, this begs Given that well formulated XIDs continue to be required, this begs
the question why SEQUENCE and CB_SEQUENCE replies have a sessionid, the question why SEQUENCE and CB_SEQUENCE replies have a sessionid,
slot id and sequence id? Having the sessionid in the reply means the slot id and sequence id? Having the sessionid in the reply means the
requester does not have to use the XID to lookup the sessionid, which requester does not have to use the XID to lookup the sessionid, which
would be necessary if the connection were associated with multiple would be necessary if the connection were associated with multiple
sessions. Having the slot id and sequence id in the reply means sessions. Having the slot id and sequence id in the reply means
requester does not have to use the XID to lookup the slot id and requester does not have to use the XID to lookup the slot id and
sequence id. Furthermore, since the XID is only 32 bits, it is too sequence id. Furthermore, since the XID is only 32 bits, it is too
small to guarantee the re-association of a reply with its request small to guarantee the re-association of a reply with its request
([27]); having sessionid, slot id, and sequence id in the reply ([27]); having sessionid, slot id, and sequence id in the reply
allows the client to validate that the reply in fact belongs to the allows the client to validate that the reply in fact belongs to the
matched request. matched request.
The SEQUENCE (and CB_SEQUENCE) operation also carries a The SEQUENCE (and CB_SEQUENCE) operation also carries a
"highest_slotid" value which carries additional requester slot usage "highest_slotid" value which carries additional requester slot usage
information. The requester must always provide a slot id information. The requester must always indicate the slot id
representing the outstanding request with the highest-numbered slot representing the outstanding request with the highest-numbered slot
value. The requester should in all cases provide the most value. The requester should in all cases provide the most
conservative value possible, although it can be increased somewhat conservative value possible, although it can be increased somewhat
above the actual instantaneous usage to maintain some minimum or above the actual instantaneous usage to maintain some minimum or
optimal level. This provides a way for the requester to yield unused optimal level. This provides a way for the requester to yield unused
request slots back to the replier, which in turn can use the request slots back to the replier, which in turn can use the
information to reallocate resources. information to reallocate resources.
The replier responds with both a new target highest_slotid, and an The replier responds with both a new target highest_slotid, and an
enforced highest_slotid, described as follows: enforced highest_slotid, described as follows:
skipping to change at page 51, line 19 skipping to change at page 51, line 39
even though the replier knows there are no outstanding requests at even though the replier knows there are no outstanding requests at
higher slot ids, it MAY take more forceful action. When faced higher slot ids, it MAY take more forceful action. When faced
with intransigence, the replier MAY reply with a new enforced with intransigence, the replier MAY reply with a new enforced
highest_slotid that is less than its previous enforced highest_slotid that is less than its previous enforced
highest_slotid. Thereafter, if the requester continues to send highest_slotid. Thereafter, if the requester continues to send
requests with a highest_slotid that is greater than the replier's requests with a highest_slotid that is greater than the replier's
new enforced highest_slotid the server MAY return new enforced highest_slotid the server MAY return
NFS4ERR_BAD_HIGHSLOT, unless the slot id in the request is greater NFS4ERR_BAD_HIGHSLOT, unless the slot id in the request is greater
than the new enforced highest_slotid, and the request is a retry. than the new enforced highest_slotid, and the request is a retry.
The replier SHOULD keep slots it wants to retire around until the The replier SHOULD retain the slots it wants to retire until the
requester sends a request with a highest_slotid less than or equal requester sends a request with a highest_slotid less than or equal
to the replier's new enforced highest_slotid. Also a request with to the replier's new enforced highest_slotid. Also if a request
a slot that is higher than the new enforced highest_slotid can be is received with a slot that is higher than the new enforced
retired if the requester specifies a sequence id that is not equal highest_slotid, and the sequence id is one higher than what is in
what is in the slot's reply cache. In other words, once the the slot's reply cache, then the server can both retire the slot
replier has forcibly lowered the enforced highest_slotid, the and return NFS4ERR_BADSLOT (however the server MUST NOT do one and
not the other). (The reason it is safe to retire the slot is
because that by using the next sequenceid, the client is
indicating it has received the previous reply for the slot.) Once
the replier has forcibly lowered the enforced highest_slotid, the
requester is only allowed to send retries to the to-be-retired requester is only allowed to send retries to the to-be-retired
slots. slots.
o The requester SHOULD use the lowest available slot when issuing a o The requester SHOULD use the lowest available slot when issuing a
new request. This way, the replier may be able to retire slot new request. This way, the replier may be able to retire slot
entries faster. However, where the replier is actively adjusting entries faster. However, where the replier is actively adjusting
its granted highest_slotid, it will not not be able to use only its granted highest_slotid, it will not be able to use only the
the receipt of the slot id and highest_slotid in the request. receipt of the slot id and highest_slotid in the request. Neither
Neither the slot id nor the highest_slotid used in a request may the slot id nor the highest_slotid used in a request may reflect
reflect the replier's current idea of the requester's session the replier's current idea of the requester's session limit,
limit, because the request may have been sent from the requester because the request may have been sent from the requester before
before the update was received. Therefore, in the downward the update was received. Therefore, in the downward adjustment
adjustment case, the replier may have to retain a number of reply case, the replier may have to retain a number of reply cache
cache entries at least as large as the old value of maximum entries at least as large as the old value of maximum requests
requests outstanding, until it can infer that the requester has outstanding, until it can infer that the requester has seen a
seen a reply containing the new granted highest_slotid. The reply containing the new granted highest_slotid. The replier can
replier can infer that requester has seen such a reply when it infer that requester has seen such a reply when it receives a new
receives a new request with the same slotid as the request replied request with the same slotid as the request replied to and the
to and the next higher sequenceid. next higher sequenceid.
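The requester-side bookkeeping implied by these rules (use the lowest available slot, and report the highest slot id with an outstanding request) can be sketched as follows. The RequesterSlotTable class and its method names are hypothetical, and the sketch ignores the replier's target and enforced highest_slotid adjustments.

```python
from typing import List, Optional


class RequesterSlotTable:
    """Tracks which slot ids currently have an outstanding request."""

    def __init__(self, ca_maxrequests: int) -> None:
        # Slot ids run from 0 to ca_maxrequests - 1.
        self._in_use: List[bool] = [False] * ca_maxrequests

    def acquire(self) -> Optional[int]:
        """Claim the lowest unused slot id, or None if all are busy."""
        for slot_id, busy in enumerate(self._in_use):
            if not busy:
                self._in_use[slot_id] = True
                return slot_id
        return None

    def release(self, slot_id: int) -> None:
        """Mark a slot unused once the reply for it has been received."""
        self._in_use[slot_id] = False

    def highest_slotid(self) -> Optional[int]:
        """Highest slot id with an outstanding request, or None if idle."""
        busy = [i for i, used in enumerate(self._in_use) if used]
        return max(busy) if busy else None
```

Always choosing the lowest free slot keeps outstanding requests packed toward slot 0, so highest_slotid drops as soon as the upper slots drain and the replier can retire them sooner.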
2.10.5.1.1. Errors from SEQUENCE and CB_SEQUENCE 2.10.5.1.1. Errors from SEQUENCE and CB_SEQUENCE
Any time SEQUENCE or CB_SEQUENCE returns an error, the sequence id of Any time SEQUENCE or CB_SEQUENCE returns an error, the sequence id of
the slot MUST NOT change. The replier MUST NOT modify the reply the slot MUST NOT change. The replier MUST NOT modify the reply
cache entry for the slot whenever an error is returned from SEQUENCE cache entry for the slot whenever an error is returned from SEQUENCE
or CB_SEQUENCE. or CB_SEQUENCE.
2.10.5.1.2. Optional Reply Caching 2.10.5.1.2. Optional Reply Caching
skipping to change at page 53, line 14 skipping to change at page 53, line 39
destroy the session). destroy the session).
Note that it is not fatal for a client to retry without a disconnect Note that it is not fatal for a client to retry without a disconnect
between the request and retry. However, the retry does consume between the request and retry. However, the retry does consume
resources, especially with RDMA, where each request, retry or not, resources, especially with RDMA, where each request, retry or not,
consumes a credit. Retries for no reason, especially retries sent consumes a credit. Retries for no reason, especially retries sent
shortly after the previous attempt, are a poor use of network shortly after the previous attempt, are a poor use of network
bandwidth and defeat the purpose of a transport's inherent congestion bandwidth and defeat the purpose of a transport's inherent congestion
control system. control system.
A client MUST wait for a reply to a request before using the slot for A requester MUST wait for a reply to a request before using the slot
another request. If it does not wait for a reply, then the client for another request. If it does not wait for a reply, then the
does not know what sequence id to use for the slot on its next requester does not know what sequence id to use for the slot on its
request. For example, suppose a client sends a request with sequence next request. For example, suppose a requester sends a request with
id 1, and does not wait for the response. The next time it uses the sequence id 1, and does not wait for the response. The next time it
slot, it sends the new request with sequence id 2. If the server has uses the slot, it sends the new request with sequence id 2. If the
not seen the request with sequence id 1, then the server is not replier has not seen the request with sequence id 1, then the replier
expecting sequence id 2, and rejects the client's new request with is not expecting sequence id 2, and rejects the requester's new
NFS4ERR_SEQ_MISORDERED (as the result from SEQUENCE or CB_SEQUENCE). request with NFS4ERR_SEQ_MISORDERED (as the result from SEQUENCE or
CB_SEQUENCE).
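The requester-side rule above can be illustrated with a small sketch. The class RequesterSlot and its method names are hypothetical and chosen only for this example. It assumes the requester tracks one outstanding request per slot.

```python
class RequesterSlot:
    """Hypothetical requester-side view of a single slot."""

    def __init__(self):
        self.seqid = 0            # sequence id of the last request known to be accepted
        self.outstanding = False  # True while a reply is still awaited

    def next_request(self):
        """Return the sequence id to use, refusing to reuse a busy slot."""
        if self.outstanding:
            raise RuntimeError("slot busy: reply to previous request not yet received")
        self.outstanding = True
        return self.seqid + 1

    def reply_received(self, sequence_succeeded):
        """Advance the sequence id only if SEQUENCE/CB_SEQUENCE succeeded,
        since an error from those operations leaves the slot unchanged."""
        self.outstanding = False
        if sequence_succeeded:
            self.seqid += 1
```

Because the slot stays marked busy until a reply arrives, the requester never has to guess whether the replier saw sequence id 1 before it sends sequence id 2.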
RDMA fabrics do not guarantee that the memory handles (Steering Tags) RDMA fabrics do not guarantee that the memory handles (Steering Tags)
within each RPC/RDMA "chunk" ([8]) are valid on a scope outside that within each RPC/RDMA "chunk" ([8]) are valid on a scope outside that
of a single connection. Therefore, handles used by the direct of a single connection. Therefore, handles used by the direct
operations become invalid after connection loss. The server must operations become invalid after connection loss. The server must
ensure that any RDMA operations which must be replayed from the reply ensure that any RDMA operations which must be replayed from the reply
cache use the newly provided handle(s) from the most recent request. cache use the newly provided handle(s) from the most recent request.
A retry might be sent while the original request is still in progress A retry might be sent while the original request is still in progress
on the replier. The replier SHOULD deal with the issue by by on the replier. The replier SHOULD deal with the issue by returning
returning NFS4ERR_DELAY as the reply to SEQUENCE or CB_SEQUENCE NFS4ERR_DELAY as the reply to SEQUENCE or CB_SEQUENCE operation, but
operation, but implementations MAY return NFS4ERR_SEQ_MISORDERED. Since implementations MAY return NFS4ERR_SEQ_MISORDERED. Since errors from
errors from SEQUENCE and CB_SEQUENCE are never recorded in the reply SEQUENCE and CB_SEQUENCE are never recorded in the reply cache, this
cache, this approach allows the results of the execution of the approach allows the results of the execution of the original request
original request to be properly recorded in the reply cache (assuming to be properly recorded in the reply cache (assuming the requester
the requester specified the reply to be cached). specified the reply to be cached).
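A minimal replier-side sketch of this behavior follows. The Slot type and the start_request and finish_request helpers are hypothetical names for illustration. The sketch takes the NFS4ERR_DELAY option and assumes every reply is cached.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Slot:
    seqid: int = 0                           # last completed sequence id
    in_progress_seqid: Optional[int] = None  # request currently executing
    cached_reply: Optional[str] = None       # reply for the last completed request


def start_request(slot, req_seqid):
    """Return an immediate reply, or None if the caller should execute the request."""
    if slot.in_progress_seqid == req_seqid:
        # Retry of a request that is still executing. The error is not
        # recorded, so the slot and its cache entry are left untouched.
        return "NFS4ERR_DELAY"
    if req_seqid == slot.seqid and slot.cached_reply is not None:
        return slot.cached_reply             # retry of a completed request
    if req_seqid != slot.seqid + 1:
        return "NFS4ERR_SEQ_MISORDERED"      # slot state unchanged
    slot.in_progress_seqid = req_seqid
    return None


def finish_request(slot, reply):
    """Record the result of the original execution in the reply cache."""
    slot.seqid = slot.in_progress_seqid
    slot.cached_reply = reply
    slot.in_progress_seqid = None
```

Since the NFS4ERR_DELAY reply never touches the slot, the result of the original execution is still what ends up in the cache, and a later retry receives it.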
2.10.5.3. Resolving Server Callback Races 2.10.5.3. Resolving Server Callback Races
It is possible for server callbacks to arrive at the client before It is possible for server callbacks to arrive at the client before
the reply from related fore channel operations. For example, a the reply from related fore channel operations. For example, a
client may have been granted a delegation to a file it has opened, client may have been granted a delegation to a file it has opened,
but the reply to the OPEN (informing the client of the granting of but the reply to the OPEN (informing the client of the granting of
the delegation) may be delayed in the network. If a conflicting the delegation) may be delayed in the network. If a conflicting
operation arrives at the server, it will recall the delegation using operation arrives at the server, it will recall the delegation using
the backchannel, which may be on a different transport connection, the backchannel, which may be on a different transport connection,
skipping to change at page 54, line 46 skipping to change at page 55, line 24
CB_COMPOUND procedure. CB_COMPOUND procedure.
The client must not simply wait forever for the expected server reply The client must not simply wait forever for the expected server reply
to arrive before responding to the CB_COMPOUND that won the race, to arrive before responding to the CB_COMPOUND that won the race,
because it is possible that it will be delayed indefinitely. The because it is possible that it will be delayed indefinitely. The
client should assume the likely case that the reply will arrive client should assume the likely case that the reply will arrive
within the average round trip time for COMPOUND requests to the within the average round trip time for COMPOUND requests to the
server, and wait that period of time. If that period of time expires server, and wait that period of time. If that period of time expires
it can respond to the CB_COMPOUND with NFS4ERR_DELAY. it can respond to the CB_COMPOUND with NFS4ERR_DELAY.
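The bounded wait described above might look like the following sketch. The function name answer_racing_callback and its parameters are assumptions for illustration. The sketch also assumes that some other part of the client sets the threading.Event when the awaited fore channel reply arrives.

```python
import threading


def answer_racing_callback(reply_arrived, avg_rtt_seconds, process_callback):
    """Wait up to one average COMPOUND round trip for the fore channel
    reply, then either process the callback or ask the server to retry."""
    if reply_arrived.wait(timeout=avg_rtt_seconds):
        return process_callback()
    return "NFS4ERR_DELAY"
```

For example, if the reply to the OPEN never arrives within the chosen period, the client answers the CB_COMPOUND with NFS4ERR_DELAY instead of blocking the backchannel indefinitely.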
There are other scenarios under which callbacks may race replies, There are other scenarios under which callbacks may race replies.
among them pNFS layout recalls, described in Section 12.5.5.2. Among them are pNFS layout recalls as described in Section 12.5.5.2.
2.10.5.4. COMPOUND and CB_COMPOUND Construction Issues 2.10.5.4. COMPOUND and CB_COMPOUND Construction Issues
Very large requests and replies may pose both buffer management Very large requests and replies may pose both buffer management
issues (especially with RDMA) and reply cache issues. When the issues (especially with RDMA) and reply cache issues. When the
session is created (Section 18.36), for each channel (fore and session is created (Section 18.36), for each channel (fore and
back), the client and server negotiate the maximum sized request they back), the client and server negotiate the maximum sized request they
will send or process (ca_maxrequestsize), the maximum sized reply will send or process (ca_maxrequestsize), the maximum sized reply
they will return or process (ca_maxresponsesize), and the maximum they will return or process (ca_maxresponsesize), and the maximum
sized reply they will store in the reply cache sized reply they will store in the reply cache
skipping to change at page 57, line 19 skipping to change at page 57, line 42
any requests that were sent and executed before the server restarted. any requests that were sent and executed before the server restarted.
If the replier is a client then there is no need for it to persist If the replier is a client then there is no need for it to persist
any more information, unless the client will be persisting all other any more information, unless the client will be persisting all other
state across client restart, in which case the server will never state across client restart, in which case the server will never
see any NFSv4.1-level protocol manifestation of a client restart. If see any NFSv4.1-level protocol manifestation of a client restart. If
the replier is a server, with just the slot table and sessionid the replier is a server, with just the slot table and sessionid
persisting, any requests the client retries after the server restart persisting, any requests the client retries after the server restart
will return the results that are cached in the reply cache, and any new will return the results that are cached in the reply cache, and any new
requests (i.e. the sequence id is one (1) greater than the slot's requests (i.e. the sequence id is one (1) greater than the slot's
sequence id) MUST be rejected with NFS4ERR_DEADSESSION (returned by sequence id) MUST be rejected with NFS4ERR_DEADSESSION (returned by
SEQUENCE). Such a session is considered: dead. A server MAY re- SEQUENCE). Such a session is considered dead. A server MAY re-
animate a session after a server restart so that the session will animate a session after a server restart so that the session will
accept new requests as well as retries. To re-animate a session the accept new requests as well as retries. To re-animate a session the
server needs to persist additional information through server server needs to persist additional information through server
restart: restart:
o The client ID. This is a prerequisite to let the client create o The client ID. This is a prerequisite to let the client create
more sessions associated with the same client ID as the more sessions associated with the same client ID as the
o The client ID's sequenceid that is used for creating sessions (see o The client ID's sequenceid that is used for creating sessions (see
Section 18.35 and Section 18.36). This is a prerequisite to let Section 18.35 and Section 18.36). This is a prerequisite to let
the client create more sessions. the client create more sessions.
o The principal that created the client ID. This allows the server o The principal that created the client ID. This allows the server
to authenticate the client when it sends EXCHANGE_ID. to authenticate the client when it sends EXCHANGE_ID.
o The SSV, if SP4_SSV state protection was specified when the client o The SSV, if SP4_SSV state protection was specified when the client
ID was created (see Section 18.35). This lets the client create ID was created (see Section 18.35). This lets the client create
new sessions, and associate connections with the new and existing new sessions, and associate connections with the new and existing
skipping to change at page 63, line 12 skipping to change at page 63, line 40
Assuming a proper safeguard, using the per-machine credential for Assuming a proper safeguard, using the per-machine credential for
operations like CREATE_SESSION, BIND_CONN_TO_SESSION, operations like CREATE_SESSION, BIND_CONN_TO_SESSION,
DESTROY_SESSION, and DESTROY_CLIENTID will prevent an attacker from DESTROY_SESSION, and DESTROY_CLIENTID will prevent an attacker from
associating a rogue connection with a session, or associating a rogue associating a rogue connection with a session, or associating a rogue
session with a client ID. session with a client ID.
There are at least three scenarios for the SP4_MACH_CRED option: There are at least three scenarios for the SP4_MACH_CRED option:
1. That the system administrator configures a unique, permanent per- 1. That the system administrator configures a unique, permanent per-
machine credential for one of the mandated GSS mechanisms (for machine credential for one of the mandated GSS mechanisms (for
example, if Kerberos V5 is used, a "keytab" for principal named example, if Kerberos V5 is used, a "keytab" containing a
after client host name could be used). principal named after client host name could be used).
2. The client is used by a single user, and so the client ID and its 2. The client is used by a single user, and so the client ID and its
sessions are used by just that user. If the user's credential sessions are used by just that user. If the user's credential
expires, then session and client ID maintenance cannot occur, but expires, then session and client ID maintenance cannot occur, but
since the client has a single user, only that user is since the client has a single user, only that user is
inconvenienced. inconvenienced.
3. The physical client has multiple users, but the client 3. The physical client has multiple users, but the client
implementation has a unique client ID for each user. This is implementation has a unique client ID for each user. This is
effectively the same as the second scenario, but a disadvantage effectively the same as the second scenario, but a disadvantage
skipping to change at page 64, line 51 skipping to change at page 65, line 30
because the client SHOULD have modified the SSV due to Eve using because the client SHOULD have modified the SSV due to Eve using
the new session, Bob cannot get revenge on Eve by associating a the new session, Bob cannot get revenge on Eve by associating a
rogue connection with the session. rogue connection with the session.
The question is how did the legitimate client detect that Eve has The question is how did the legitimate client detect that Eve has
hijacked the old session? When the client detects that a new hijacked the old session? When the client detects that a new
principal, Bob, wants to use the session, it SHOULD have sent a principal, Bob, wants to use the session, it SHOULD have sent a
SET_SSV, which leads to the following sub-scenarios: SET_SSV, which leads to the following sub-scenarios:
* Let us suppose that from the rogue connection, Eve sent a * Let us suppose that from the rogue connection, Eve sent a
SET_SSV with the same slot id and sequence that the legitimate SET_SSV with the same slot id and sequence id that the
client later uses. The server will assume this is a retry, and legitimate client later uses. The server will assume the
return to the legitimate client the reply it sent Eve. However, SET_SSV sent with Bob's credentials is a retry, and return to
unless Eve can correctly guess the SSV the legitimate client the legitimate client the reply it sent Eve. However, unless
will use, the digest verification checks in the SET_SSV Eve can correctly guess the SSV the legitimate client will use,
response will fail. That is an indication to the client that the digest verification checks in the SET_SSV response will
the session has apparently been hijacked. fail. That is an indication to the client that the session has
apparently been hijacked.
* Alternatively, Eve sent a SET_SSV with a different slot id than * Alternatively, Eve sent a SET_SSV with a different slot id than
the legitimate client uses for its SET_SSV. Then the digest the legitimate client uses for its SET_SSV. Then the digest
verification on the server fails, and it is again apparent to verification of the SET_SSV sent with Bob's credentials fails
the client that the session has been hijacked. on the server, and the error returned to the client makes
it apparent that the session has been hijacked.
* Alternatively, Eve sent an operation other than SET_SSV, but * Alternatively, Eve sent an operation other than SET_SSV, but
with the same slot id and sequence id that the legitimate client with the same slot id and sequence id that the legitimate client
uses for its SET_SSV. The server returns to the legitimate uses for its SET_SSV. The server returns to the legitimate
client the response it sent Eve. The client sees that the client the response it sent Eve. The client sees that the
response is not at all what it expects. The client assumes response is not at all what it expects. The client assumes
either session hijacking or a server bug, and either way either session hijacking or a server bug, and either way
destroys the old session. destroys the old session.
o Eve associates a rogue connection with the session as above, and o Eve associates a rogue connection with the session as above, and
then destroys the session. Again, Bob goes to use the server from then destroys the session. Again, Bob goes to use the server from
the legitimate client by issuing a SET_SSV. The client receives the legitimate client, which sends a SET_SSV using Bob's
an error that indicates the session does not exist. When the credentials. The client receives an error that indicates the
client tries to create a new session, this will fail because the session does not exist. When the client tries to create a new
SSV it has does not that the server has, and now the client knows session, this will fail because the SSV it has does not match that
the session was hijacked. The legitimate client establishes a new the server has, and now the client knows the session was hijacked.
client ID as before. The legitimate client establishes a new client ID as before.
o If Eve creates a connection before the legitimate client o If Eve creates a connection before the legitimate client
establishes an SSV, because the initial value of the SSV is zero establishes an SSV, because the initial value of the SSV is zero
and therefore known, Eve can send a SET_SSV that will pass the and therefore known, Eve can send a SET_SSV that will pass the
digest verification check. However because the new connection has digest verification check. However because the new connection has
not been associated with the session, the SET_SSV is rejected for not been associated with the session, the SET_SSV is rejected for
that reason. that reason.
In summary an attacker's disruption of state when SP4_SSV protection In summary, an attacker's disruption of state when SP4_SSV protection
is in use is limited to the formative period of a client ID, its is in use is limited to the formative period of a client ID, its
first session, and the establishment of the SSV. Once a non- first session, and the establishment of the SSV. Once a non-
malicious user uses the client ID, the client quickly detects any malicious user uses the client ID, the client quickly detects any
hijack and rectifies the situation. Once a non-malicious user hijack and rectifies the situation. Once a non-malicious user
successfully modifies the SSV, the attacker cannot use NFSv4.1 successfully modifies the SSV, the attacker cannot use NFSv4.1
operations to disrupt the non-malicious user. operations to disrupt the non-malicious user.
Note that neither the SP4_MACH_CRED nor SP4_SSV protection approaches Note that neither the SP4_MACH_CRED nor SP4_SSV protection approaches
prevent hijacking of a transport connection that has previously been prevent hijacking of a transport connection that has previously been
associated with a session. If the goal of a counter threat strategy associated with a session. If the goal of a counter threat strategy
is to prevent connection hijacking, the use of IPsec is RECOMMENDED. is to prevent connection hijacking, the use of IPsec is RECOMMENDED.
If the goal of a counter threat strategy is to prevent a connection If a connection hijack occurs, the hijacker could in theory change
hijacker from making unauthorized state changes, then the locking state and negatively impact the service to legitimate
SP4_MACH_CRED protection approach can be used with a client ID per clients. However if the server is configured to require the use of
user (i.e. the aforementioned third scenario for machine credential RPCSEC_GSS with integrity or privacy on the affected file objects,
state protection). For each unique user, the client invokes and if EXCHGID4_FLAG_BIND_PRINC_STATEID capability (Section 18.35),
EXCHANGE_ID with the user's credential, specifying SP4_MACH_CRED is in force, this will thwart unauthorized attempts to change locking
protections, and specifying that all operations MUST be protected state.
with the machine credential. The server will then reject any
subsequent operations on the client ID or its sessions that do not
use RPCSEC_GSS with privacy or integrity and do not use the same
credential that created the client ID.
2.10.8. The SSV GSS Mechanism 2.10.8. The SSV GSS Mechanism
The SSV provides the secret key for a mechanism that NFSv4.1 uses for The SSV provides the secret key for a mechanism that NFSv4.1 uses for
state protection. Contexts for this mechanism are not established state protection. Contexts for this mechanism are not established
via the RPCSEC_GSS protocol. Instead, the contexts are automatically via the RPCSEC_GSS protocol. Instead, the contexts are automatically
created when EXCHANGE_ID specifies SP4_SSV protection. The only created when EXCHANGE_ID specifies SP4_SSV protection. The only
tokens defined are the PerMsgToken (emitted by GSS_GetMIC) and the tokens defined are the PerMsgToken (emitted by GSS_GetMIC) and the
SealedMessage (emitted by GSS_Wrap). SealedMessage token (emitted by GSS_Wrap).
The mechanism OID for the SSV mechanism is: The mechanism OID for the SSV mechanism is:
iso.org.dod.internet.private.enterprise.Michael Eisler.nfs.ssv_mech iso.org.dod.internet.private.enterprise.Michael Eisler.nfs.ssv_mech
(1.3.6.1.4.1.28882.1.1). While the SSV mechanisms does not define (1.3.6.1.4.1.28882.1.1). While the SSV mechanism does not define any
any initial context tokens, the OID can be used to let servers initial context tokens, the OID can be used to let servers indicate
indicate that the SSV mechanism is acceptable whenever the client that the SSV mechanism is acceptable whenever the client sends a
sends a SECINFO or SECINFO_NO_NAME operation (see Section 2.6). SECINFO or SECINFO_NO_NAME operation (see Section 2.6).
The SSV mechanism defines four subkeys derived from the SSV value. The SSV mechanism defines four subkeys derived from the SSV value.
Each time SET_SSV is invoked the subkeys are recalculated by the Each time SET_SSV is invoked the subkeys are recalculated by the
client and server. The four subkeys are calculated by from each of client and server. The calculation of each of the four subkeys
the valid ssv_subkey4 enumerated values. The calculation uses the depends on each of the four respective ssv_subkey4 enumerated values.
HMAC ([11]) algorithm, using the current SSV as the key, the one way The calculation uses the HMAC [11] algorithm, using the current SSV
hash algorithm as negotiated by EXCHANGE_ID, and the input text as as the key, the one way hash algorithm as negotiated by EXCHANGE_ID,
represented by the XDR encoded enumeration of type ssv_subkey4. and the input text as represented by the XDR encoded enumeration of
type ssv_subkey4.
/* Input for computing subkeys */ /* Input for computing subkeys */
enum ssv_subkey4 { enum ssv_subkey4 {
SSV4_SUBKEY_MIC_I2T = 1, SSV4_SUBKEY_MIC_I2T = 1,
SSV4_SUBKEY_MIC_T2I = 2, SSV4_SUBKEY_MIC_T2I = 2,
SSV4_SUBKEY_SEAL_I2T = 3, SSV4_SUBKEY_SEAL_I2T = 3,
SSV4_SUBKEY_SEAL_T2I = 4 SSV4_SUBKEY_SEAL_T2I = 4
}; };
The subkey derived from SSV4_SUBKEY_MIC_I2T is used for calculating The subkey derived from SSV4_SUBKEY_MIC_I2T is used for calculating
message integrity codes (MICs) that originate from the NFSv4.1 message integrity codes (MICs) that originate from the NFSv4.1
client, whether as part of a request over the fore channel, or a client, whether as part of a request over the fore channel, or a
response over the backchannel. The subkey derived from response over the backchannel. The subkey derived from
SSV4_SUBKEY_MIC_T2I is used for MICs originating from the NFSv4.1 SSV4_SUBKEY_MIC_T2I is used for MICs originating from the NFSv4.1
server. The subkey derived from SSV4_SUBKEY_SEAL_I2T is used for server. The subkey derived from SSV4_SUBKEY_SEAL_I2T is used for
encrypting text originating from the NFSv4.1 client, and the subkey encrypting text originating from the NFSv4.1 client, and the subkey
derived from SSV4_SUBKEY_SEAL_T2I is used for encrypting text derived from SSV4_SUBKEY_SEAL_T2I is used for encrypting text
originating from the NFSv4.1 server. originating from the NFSv4.1 server.
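The subkey derivation described above can be sketched as follows. This is an illustration under stated assumptions, not normative text: the function name derive_subkey is hypothetical, and SHA-256 merely stands in for whatever one way hash EXCHANGE_ID actually negotiated.

```python
import hmac
import struct

# Values of the ssv_subkey4 enumeration defined above.
SSV4_SUBKEY_MIC_I2T = 1
SSV4_SUBKEY_MIC_T2I = 2
SSV4_SUBKEY_SEAL_I2T = 3
SSV4_SUBKEY_SEAL_T2I = 4


def derive_subkey(ssv, which, hash_name="sha256"):
    """HMAC keyed with the current SSV over the XDR encoded enum value.

    An XDR enum is encoded as a four byte big-endian signed integer.
    hash_name is an assumption standing in for the negotiated hash.
    """
    return hmac.new(ssv, struct.pack(">i", which), hash_name).digest()


def derive_all_subkeys(ssv, hash_name="sha256"):
    """Recompute all four subkeys, as is done after each SET_SSV."""
    return {which: derive_subkey(ssv, which, hash_name)
            for which in (SSV4_SUBKEY_MIC_I2T, SSV4_SUBKEY_MIC_T2I,
                          SSV4_SUBKEY_SEAL_I2T, SSV4_SUBKEY_SEAL_T2I)}
```

Since the input text differs only in the enum value, the four subkeys are independent of each other while all being bound to the same SSV.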
The field smt_hmac is an HMAC calculated by using the subkey derived
from SSV4_SUBKEY_MIC_I2T or SSV4_SUBKEY_MIC_T2I as the key, the one
way hash algorithm as negotiated by EXCHANGE_ID, and the input text
as represented by data of type ssv_mic_plain_tkn4. The field
smpt_ssv_seq is the same as smt_ssv_seq. The field smt_orig_plain is
the input text as passed into GSS_GetMIC().
The PerMsgToken description is based on an XDR definition: The PerMsgToken description is based on an XDR definition:
/* Input for computing smt_hmac */ /* Input for computing smt_hmac */
struct ssv_mic_plain_tkn4 { struct ssv_mic_plain_tkn4 {
uint32_t smpt_ssv_seq; uint32_t smpt_ssv_seq;
opaque smpt_orig_plain<>; opaque smpt_orig_plain<>;
}; };
/* SSV GSS PerMsgToken token */ /* SSV GSS PerMsgToken token */
struct ssv_mic_tkn4 { struct ssv_mic_tkn4 {
uint32_t smt_ssv_seq; uint32_t smt_ssv_seq;
opaque smt_hmac<>; opaque smt_hmac<>;
}; };
The field smt_hmac is an HMAC calculated by using the subkey derived
from SSV4_SUBKEY_MIC_I2T or SSV4_SUBKEY_MIC_T2I as the key, the one
way hash algorithm as negotiated by EXCHANGE_ID, and the input text
as represented by data of type ssv_mic_plain_tkn4. The field
smpt_ssv_seq is the same as smt_ssv_seq. The field smpt_orig_plain
is the "message" input passed to GSS_GetMIC() (see Section 2.3.1 of
[7]). The caller of GSS_GetMIC() provides a pointer to a buffer
containing the plain text. The SSV mechanism's entry point for
GSS_GetMIC() encodes this into an opaque array, and the encoding will
include an initial four byte length, plus any necessary padding.
Prepended to this will be the XDR encoded value of smpt_ssv_seq thus
making up an XDR encoding of a value of data type ssv_mic_plain_tkn4,
which in turn is the input into the HMAC.
The token emitted by GSS_GetMIC() is XDR encoded and of XDR data type The token emitted by GSS_GetMIC() is XDR encoded and of XDR data type
ssv_mic_tkn4. The field smt_ssv_seq comes from the SSV sequence ssv_mic_tkn4. The field smt_ssv_seq comes from the SSV sequence
number which is equal to 1 after SET_SSV (Section 18.47) is called number which is equal to 1 after SET_SSV (Section 18.47) is called
the first time on a client ID. Thereafter, it is incremented on each the first time on a client ID. Thereafter, it is incremented on each
SET_SSV. Thus smt_ssv_seq represents the version of the SSV at the SET_SSV. Thus smt_ssv_seq represents the version of the SSV at the
time GSS_GetMIC() was called. As noted in Section 18.35, the client time GSS_GetMIC() was called. As noted in Section 18.35, the client
and server can maintain multiple concurrent versions of the SSV. and server can maintain multiple concurrent versions of the SSV.
This allows the SSV to be changed without serializing all RPC calls This allows the SSV to be changed without serializing all RPC calls
that use the SSV mechanism with SET_SSV operations. that use the SSV mechanism with SET_SSV operations. Once the HMAC is
calculated, it is XDR encoded into smt_hmac, which will include an
initial four byte length, and any necessary padding. Prepended to
this will be the XDR encoded value of smt_ssv_seq.
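The token construction described above can be sketched in a few lines. The helper names xdr_opaque and ssv_get_mic are hypothetical, and SHA-256 is an assumed stand-in for the hash negotiated by EXCHANGE_ID.

```python
import hmac
import struct


def xdr_opaque(data):
    """XDR variable length opaque: four byte length, data, zero padding
    to a multiple of four bytes."""
    pad = (-len(data)) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * pad


def ssv_get_mic(mic_subkey, ssv_seq, message, hash_name="sha256"):
    """Build an XDR encoded ssv_mic_tkn4 for the given message."""
    # XDR encoding of ssv_mic_plain_tkn4: smpt_ssv_seq, then smpt_orig_plain.
    plain = struct.pack(">I", ssv_seq) + xdr_opaque(message)
    digest = hmac.new(mic_subkey, plain, hash_name).digest()
    # XDR encoding of ssv_mic_tkn4: smt_ssv_seq, then smt_hmac.
    return struct.pack(">I", ssv_seq) + xdr_opaque(digest)
```

With a 32 byte digest the resulting token is 40 bytes long: four bytes of smt_ssv_seq, four bytes of length, and the digest itself, which needs no padding.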
The SealedMessage description is based on an XDR definition: The SealedMessage description is based on an XDR definition:
/* Input for computing ssct_encr_data and ssct_hmac */ /* Input for computing ssct_encr_data and ssct_hmac */
struct ssv_seal_plain_tkn4 { struct ssv_seal_plain_tkn4 {
opaque sspt_confounder<>; opaque sspt_confounder<>;
uint32_t sspt_ssv_seq; uint32_t sspt_ssv_seq;
opaque sspt_orig_plain<>; opaque sspt_orig_plain<>;
opaque sspt_pad<>; opaque sspt_pad<>;
}; };
skipping to change at page 69, line 7 skipping to change at page 69, line 42
The ssct_hmac field is the result of computing an HMAC using the value of The ssct_hmac field is the result of computing an HMAC using the value of
the XDR encoded data type ssv_seal_plain_tkn4 as the input text. The the XDR encoded data type ssv_seal_plain_tkn4 as the input text. The
key is the subkey derived from SSV4_SUBKEY_MIC_I2T or key is the subkey derived from SSV4_SUBKEY_MIC_I2T or
SSV4_SUBKEY_MIC_T2I, and the one way hash algorithm is that SSV4_SUBKEY_MIC_T2I, and the one way hash algorithm is that
negotiated by EXCHANGE_ID. negotiated by EXCHANGE_ID.
The sspt_confounder field is a random value. The sspt_confounder field is a random value.
The sspt_ssv_seq field is the same as ssct_ssv_seq. The sspt_ssv_seq field is the same as ssct_ssv_seq.
The sspt_orig_plain field is the original plaintext as passed to The sspt_orig_plain field is the original plaintext and is the
GSS_Wrap(). "input_message" input passed to GSS_Wrap() (see Section 2.3.3 of
[7]). As with the handling of the plaintext by the SSV mechanism's
GSS_GetMIC() entry point, the entry point for GSS_Wrap() expects a
pointer to the plaintext, and will XDR encode an opaque array into
sspt_orig_plain representing the plain text, along with the other
fields of an instance of data type ssv_seal_plain_tkn4.
The sspt_pad field is present to support encryption algorithms that The sspt_pad field is present to support encryption algorithms that
require inputs to be in fixed sized blocks. The content of sspt_pad require inputs to be in fixed sized blocks. The content of sspt_pad
is zero filled except for the length. Beware that the XDR encoding is zero filled except for the length. Beware that the XDR encoding
of ssv_seal_plain_tkn4 contains three variable length arrays, and so of ssv_seal_plain_tkn4 contains three variable length arrays, and so
each array consumes four bytes for an array length, and each array each array consumes four bytes for an array length, and each array
that follows the length is always padded to a multiple of four bytes that follows the length is always padded to a multiple of four bytes
per the XDR standard. per the XDR standard.
For example suppose the encryption algorithm uses 16 byte blocks, and For example suppose the encryption algorithm uses 16 byte blocks, and
skipping to change at page 69, line 36 skipping to change at page 70, line 32
or a total encoding of 16 bytes. The total number of XDR encoded or a total encoding of 16 bytes. The total number of XDR encoded
bytes is thus 8 + 4 + 20 + 16 = 48. bytes is thus 8 + 4 + 20 + 16 = 48.
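The padding arithmetic can be checked with a short sketch. The function names are hypothetical, and the input lengths used in the usage note (a 3 byte confounder and 15 bytes of plaintext) are assumptions chosen because they reproduce the sum 8 + 4 + 20 + 16 = 48 shown above.

```python
def xdr_opaque_size(n):
    """Encoded size of an n byte XDR opaque: four byte length plus the
    data rounded up to a multiple of four bytes."""
    return 4 + (n + 3) // 4 * 4


def seal_plain_sizes(confounder_len, plain_len, block_size):
    """Return (sspt_pad length, total encoded size) so that the XDR
    encoding of ssv_seal_plain_tkn4 is a multiple of block_size."""
    # sspt_confounder + sspt_ssv_seq (uint32_t) + sspt_orig_plain
    fixed = xdr_opaque_size(confounder_len) + 4 + xdr_opaque_size(plain_len)
    # The encoded pad grows in four byte steps, so only multiples of
    # four need to be tried.
    for pad_len in range(0, 4 * block_size + 1, 4):
        total = fixed + xdr_opaque_size(pad_len)
        if total % block_size == 0:
            return pad_len, total
    raise ValueError("no pad length aligns the encoding to the block size")
```

Under those assumed inputs, seal_plain_sizes(3, 15, 16) yields a pad length of 12 and a total of 48 bytes: 8 for the confounder, 4 for the sequence number, 20 for the plaintext, and 16 for the pad.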
GSS_Wrap() emits a token that is an XDR encoding of a value of data GSS_Wrap() emits a token that is an XDR encoding of a value of data
type ssv_seal_cipher_tkn4. Note that regardless of whether the caller type ssv_seal_cipher_tkn4. Note that regardless of whether the caller
of GSS_Wrap() requests confidentiality or not, the token always has of GSS_Wrap() requests confidentiality or not, the token always has
confidentiality. This is because the SSV mechanism is for confidentiality. This is because the SSV mechanism is for
RPCSEC_GSS, and RPCSEC_GSS never produces GSS_Wrap() tokens without RPCSEC_GSS, and RPCSEC_GSS never produces GSS_Wrap() tokens without
confidentiality. confidentiality.
Effectively there is a single GSS context for a single client ID. There is one SSV per client ID. Effectively there is a single GSS
All RPCSEC_GSS handles share the same GSS context. SSV GSS contexts context for a client ID / SSV pair. All SSV mechanism RPCSEC_GSS
do not expire except when the SSV is destroyed (causes would include handles of a client ID / SSV pair share the same GSS context. SSV
the client ID being destroyed or a server restart). Since one GSS contexts do not expire except when the SSV is destroyed (causes
purpose of context expiration is to replace keys that have been in would include the client ID being destroyed or a server restart).
use for "too long" hence vulnerable to compromise by brute force or Since one purpose of context expiration is to replace keys that have
accident, the client can send periodic SET_SSV operations, by cycling been in use for "too long" hence vulnerable to compromise by brute
through different users' RPCSEC_GSS credentials. This way the SSV is force or accident, the client can replace the SSV key by sending
replaced without destroying the SSV's GSS contexts. periodic SET_SSV operations, by cycling through different users'
RPCSEC_GSS credentials. This way the SSV is replaced without
destroying the SSV's GSS contexts.
SSV RPCSEC_GSS handles can be expired or deleted by the server at any SSV RPCSEC_GSS handles can be expired or deleted by the server at any
time and the EXCHANGE_ID operation can be used to create more SSV time and the EXCHANGE_ID operation can be used to create more SSV
RPCSEC_GSS handles. RPCSEC_GSS handles. Expiration of SSV RPCSEC_GSS handles does not
imply that the SSV or its GSS context have expired.
The client MUST establish an SSV via SET_SSV before the SSV GSS The client MUST establish an SSV via SET_SSV before the SSV GSS
context can be used to emit tokens from GSS_Wrap() and GSS_GetMIC(). context can be used to emit tokens from GSS_Wrap() and GSS_GetMIC().
If SET_SSV has not been successfully called, attempts to emit tokens If SET_SSV has not been successfully called, attempts to emit tokens
MUST fail. MUST fail.
The SSV mechanism does not support replay detection and sequencing in The SSV mechanism does not support replay detection and sequencing in
its tokens because RPCSEC_GSS does not use those features (See its tokens because RPCSEC_GSS does not use those features (See
Section 5.2.2 "Context Creation Requests" in [4]). Section 5.2.2 "Context Creation Requests" in [4]).
skipping to change at page 70, line 32 skipping to change at page 71, line 32
The client SHOULD honor the following obligations in order to utilize The client SHOULD honor the following obligations in order to utilize
the session: the session:
o Keep a necessary session from going idle on the server. A client o Keep a necessary session from going idle on the server. A client
that requires a session, but nonetheless is not sending operations that requires a session, but nonetheless is not sending operations
risks having the session be destroyed by the server. This is risks having the session be destroyed by the server. This is
because sessions consume resources, and resource limitations may because sessions consume resources, and resource limitations may
force the server to cull an inactive session. force the server to cull an inactive session.
o Destroy the session when not needed. If a client has multiple o Destroy the session when not needed. If a client has multiple
sessions and one of them has no requests waiting for replies, and sessions, one of which has no requests waiting for replies, and
has been idle for some period of time, it SHOULD destroy the has been idle for some period of time, it SHOULD destroy the
session. session.
o Maintain GSS contexts for the backchannel. If the client requires o Maintain GSS contexts for the backchannel. If the client requires
the server to use the RPCSEC_GSS security flavor for callbacks, the server to use the RPCSEC_GSS security flavor for callbacks,
then it needs to be sure the contexts handed to the server via then it needs to be sure the contexts handed to the server via
BACKCHANNEL_CTL are unexpired. BACKCHANNEL_CTL are unexpired.
o Preserve a connection for a backchannel. The server requires a o Preserve a connection for a backchannel. The server requires a
backchannel in order to gracefully recall recallable state, or backchannel in order to gracefully recall recallable state, or
notify the client of certain events. Note that if the connection notify the client of certain events. Note that if the connection
is not being used for the fore channel, there is no way the client is not being used for the fore channel, there is no way for the
can tell if the connection is still alive (e.g., the server restarted client to tell if the connection is still alive (e.g., the server
without sending a disconnect). The onus is on the server, not the restarted without sending a disconnect). The onus is on the
client, to determine if the backchannel's connection is alive, and server, not the client, to determine if the backchannel's
to indicate in the response to a SEQUENCE operation when the last connection is alive, and to indicate in the response to a SEQUENCE
connection associated with a session's backchannel has operation when the last connection associated with a session's
disconnected. backchannel has disconnected.
2.10.9.3. Steps the Client Takes To Establish a Session 2.10.9.3. Steps the Client Takes To Establish a Session
If the client does not have a client ID, the client sends EXCHANGE_ID If the client does not have a client ID, the client sends EXCHANGE_ID
to establish a client ID. If it opts for SP4_MACH_CRED or SP4_SSV to establish a client ID. If it opts for SP4_MACH_CRED or SP4_SSV
protection, in the spo_must_enforce list of operations, it SHOULD at protection, in the spo_must_enforce list of operations, it SHOULD at
minimum specify: CREATE_SESSION, DESTROY_SESSION, minimum specify: CREATE_SESSION, DESTROY_SESSION,
BIND_CONN_TO_SESSION, BACKCHANNEL_CTL, and DESTROY_CLIENTID. If it opts BIND_CONN_TO_SESSION, BACKCHANNEL_CTL, and DESTROY_CLIENTID. If it opts
for SP4_SSV protection, the client needs to ask for SSV-based for SP4_SSV protection, the client needs to ask for SSV-based
RPCSEC_GSS handles. RPCSEC_GSS handles.
skipping to change at page 72, line 18 skipping to change at page 73, line 18
callback use have expired, the client MUST establish a new context callback use have expired, the client MUST establish a new context
via BACKCHANNEL_CTL. The sr_status_flags field of the SEQUENCE via BACKCHANNEL_CTL. The sr_status_flags field of the SEQUENCE
results indicates when callback contexts are nearly expired, or fully results indicates when callback contexts are nearly expired, or fully
expired (see Section 18.46.3). expired (see Section 18.46.3).
2.10.10.1.2. Connection Loss 2.10.10.1.2. Connection Loss
If the client loses the last connection of the session, and if it wants If the client loses the last connection of the session, and if it wants
to retain the session, then it must create a new connection, and if, to retain the session, then it must create a new connection, and if,
when the client ID was created, BIND_CONN_TO_SESSION was specified in when the client ID was created, BIND_CONN_TO_SESSION was specified in
the spo_must_enforce list, the client MUST use BIND_CONNN_TO_SESSION the spo_must_enforce list, the client MUST use BIND_CONN_TO_SESSION
to associate the connection with the session. to associate the connection with the session.
If there was a request outstanding at the time of the connection If there was a request outstanding at the time of the connection
loss, then if the client wants to continue to use the session it MUST loss, then if the client wants to continue to use the session it MUST
retry the request, as described in Section 2.10.5.2. Note that it is retry the request, as described in Section 2.10.5.2. Note that it is
not necessary to retry requests over a connection with the same not necessary to retry requests over a connection with the same
source network address or the same destination network address as the source network address or the same destination network address as the
lost connection. As long as the sessionid, slot id, and sequence id lost connection. As long as the sessionid, slot id, and sequence id
in the retry match that of the original request, the server will in the retry match that of the original request, the server will
recognize the request as a retry if it executed the request prior to recognize the request as a retry if it executed the request prior to
skipping to change at page 75, line 8 skipping to change at page 76, line 8
sr_status_flags field of every SEQUENCE reply until the backchannel sr_status_flags field of every SEQUENCE reply until the backchannel
is reestablished. There are two situations each of which use is reestablished. There are two situations each of which use
different status flags: no connectivity for the session's different status flags: no connectivity for the session's
backchannel, and no connectivity for any session backchannel of the backchannel, and no connectivity for any session backchannel of the
client. See Section 18.46 for a description of the appropriate flags client. See Section 18.46 for a description of the appropriate flags
in sr_status_flags. in sr_status_flags.
2.10.10.2.5. GSS Context Loss 2.10.10.2.5. GSS Context Loss
The server SHOULD monitor when the number of RPCSEC_GSS contexts The server SHOULD monitor when the number of RPCSEC_GSS contexts
assigned to the backchannel reaches one, and that one context is near assigned to the backchannel reaches one, and when that one context is
expiry (i.e. between one and two periods of lease time), and indicate near expiry (i.e. between one and two periods of lease time),
so in the sr_status_flags field of all SEQUENCE replies. The server indicate so in the sr_status_flags field of all SEQUENCE replies.
MUST indicate when all of the backchannel's assigned RPCSEC_GSS The server MUST indicate when all of the backchannel's assigned
contexts have expired in the sr_status_flags field of all SEQUENCE RPCSEC_GSS contexts have expired in the sr_status_flags field of all
replies. SEQUENCE replies.
2.10.11. Parallel NFS and Sessions 2.10.11. Parallel NFS and Sessions
A client and server can potentially be a non-pNFS implementation, a A client and server can potentially be a non-pNFS implementation, a
metadata server implementation, a data server implementation, or two metadata server implementation, a data server implementation, or two
or three types of implementations. The EXCHGID4_FLAG_USE_NON_PNFS, or three types of implementations. The EXCHGID4_FLAG_USE_NON_PNFS,
EXCHGID4_FLAG_USE_PNFS_MDS, and EXCHGID4_FLAG_USE_PNFS_DS flags (not EXCHGID4_FLAG_USE_PNFS_MDS, and EXCHGID4_FLAG_USE_PNFS_DS flags (not
mutually exclusive) are passed in the EXCHANGE_ID arguments and mutually exclusive) are passed in the EXCHANGE_ID arguments and
results to allow the client to indicate how it wants to use sessions results to allow the client to indicate how it wants to use sessions
created under the client ID, and to allow the server to indicate how created under the client ID, and to allow the server to indicate how
skipping to change at page 76, line 32 skipping to change at page 77, line 32
integer. integer.
o NFS4_MAXFILELEN is the maximum length of a regular file. o NFS4_MAXFILELEN is the maximum length of a regular file.
o NFS4_MAXFILEOFF is the maximum offset into a regular file. o NFS4_MAXFILEOFF is the maximum offset into a regular file.
3.2. Basic Data Types 3.2. Basic Data Types
These are the base NFSv4.1 data types. These are the base NFSv4.1 data types.
+----------------------+--------------------------------------------+ +---------------+---------------------------------------------------+
| Data Type | Definition | | Data Type | Definition |
+----------------------+--------------------------------------------+ +---------------+---------------------------------------------------+
| int32_t | typedef int int32_t; | | int32_t | typedef int int32_t; |
| uint32_t | typedef unsigned int uint32_t; | | uint32_t | typedef unsigned int uint32_t; |
| int64_t | typedef hyper int64_t; | | int64_t | typedef hyper int64_t; |
| uint64_t | typedef unsigned hyper uint64_t; | | uint64_t | typedef unsigned hyper uint64_t; |
| attrlist4<> | typedef opaque attrlist4<>; | | attrlist4 | typedef opaque attrlist4<>; |
| | Used for file/directory attributes | | | Used for file/directory attributes. |
| bitmap4<> | typedef uint32_t bitmap4<>; | | bitmap4 | typedef uint32_t bitmap4<>; |
| | Used in attribute array encoding. | | | Used in attribute array encoding. |
| changeid4 | typedef uint64_t changeid4; | | changeid4 | typedef uint64_t changeid4; |
| | Used in definition of change_info | | | Used in the definition of change_info4. |
| clientid4 | typedef uint64_t clientid4; | | clientid4 | typedef uint64_t clientid4; |
| | Shorthand reference to client | | | Shorthand reference to client identification. |
| | identification |
| count4 | typedef uint32_t count4; | | count4 | typedef uint32_t count4; |
| | Various count parameters (READ, WRITE, | | | Various count parameters (READ, WRITE, COMMIT). |
| | COMMIT) |
| length4 | typedef uint64_t length4; | | length4 | typedef uint64_t length4; |
| | Describes LOCK lengths | | | Describes LOCK lengths. |
| mode4 | typedef uint32_t mode4; | | mode4 | typedef uint32_t mode4; |
| | Mode attribute data type | | | Mode attribute data type. |
| nfs_cookie4 | typedef uint64_t nfs_cookie4; | | nfs_cookie4 | typedef uint64_t nfs_cookie4; |
| | Opaque cookie value for READDIR | | | Opaque cookie value for READDIR. |
| nfs_fh4<NFS4_FHSIZE> | typedef opaque nfs_fh4<NFS4_FHSIZE>; | | nfs_fh4 | typedef opaque nfs_fh4<NFS4_FHSIZE>; |
| | Filehandle definition | | | Filehandle definition. |
| nfs_ftype4 | enum nfs_ftype4; | | nfs_ftype4 | enum nfs_ftype4; |
| | Various defined file types | | | Various defined file types. |
| nfsstat4 | enum nfsstat4; | | nfsstat4 | enum nfsstat4; |
| | Return value for operations | | | Return value for operations. |
| offset4 | typedef uint64_t offset4; | | offset4 | typedef uint64_t offset4; |
| | Various offset designations (READ, WRITE, | | | Various offset designations (READ, WRITE, LOCK, |
| | LOCK, COMMIT) | | | COMMIT). |
| qop4 | typedef uint32_t qop4; | | qop4 | typedef uint32_t qop4; |
| | Quality of protection designation in | | | Quality of protection designation in SECINFO. |
| | SECINFO | | sec_oid4 | typedef opaque sec_oid4<>; |
| sec_oid4<> | typedef opaque sec_oid4<>; | | | Security Object Identifier. The sec_oid4 data |
| | Security Object Identifier The sec_oid4 | | | type is not really opaque. Instead it contains |
| | data type is not really opaque. Instead | | | an ASN.1 OBJECT IDENTIFIER as used by GSS-API in |
| | it contains an ASN.1 OBJECT IDENTIFIER as | | | the mech_type argument to GSS_Init_sec_context. |
| | used by GSS-API in the mech_type argument | | | See [7] for details. |
| | to GSS_Init_sec_context. See [7] for |
| | details. |
| sequenceid4 | typedef uint32_t sequenceid4; | | sequenceid4 | typedef uint32_t sequenceid4; |
| | sequence number used for various session | | | Sequence number used for various session |
| | operations (EXCHANGE_ID, CREATE_SESSION, | | | operations (EXCHANGE_ID, CREATE_SESSION, |
| | SEQUENCE, CB_SEQUENCE). | | | SEQUENCE, CB_SEQUENCE). |
| seqid4 | typedef uint32_t seqid4; | | seqid4 | typedef uint32_t seqid4; |
| | Sequence identifier used for file locking | | | Sequence identifier used for file locking. |
| sessionid4 | typedef opaque | | sessionid4 | typedef opaque sessionid4[NFS4_SESSIONID_SIZE]; |
| | sessionid4[NFS4_SESSIONID_SIZE]; | | | Session identifier. |
| | Session identifier |
| slotid4 | typedef uint32_t slotid4; | | slotid4 | typedef uint32_t slotid4; |
| | sequencing artifact for various session | | | Sequencing artifact for various session |
| | operations (SEQUENCE, CB_SEQUENCE). | | | operations (SEQUENCE, CB_SEQUENCE). |
| utf8string<> | typedef opaque utf8string<>; | | utf8string | typedef opaque utf8string<>; |
| | UTF-8 encoding for strings | | | UTF-8 encoding for strings. |
| utf8str_cis | typedef utf8string utf8str_cis; | | utf8str_cis | typedef utf8string utf8str_cis; |
| | Case-insensitive UTF-8 string | | | Case-insensitive UTF-8 string. |
| utf8str_cs | typedef utf8string utf8str_cs; | | utf8str_cs | typedef utf8string utf8str_cs; |
| | Case-sensitive UTF-8 string | | | Case-sensitive UTF-8 string. |
| utf8str_mixed | typedef utf8string utf8str_mixed; | | utf8str_mixed | typedef utf8string utf8str_mixed; |
| | UTF-8 strings with a case sensitive prefix | | | UTF-8 strings with a case sensitive prefix and a |
| | and a case insensitive suffix. | | | case insensitive suffix. |
| component4 | typedef utf8str_cs component4; | | component4 | typedef utf8str_cs component4; |
| | Represents path name components | | | Represents path name components. |
| linktext4 | typedef utf8str_cs linktext4; | | linktext4 | typedef utf8str_cs linktext4; |
| | Symbolic link contents | | | Symbolic link contents. |
| pathname4<> | typedef component4 pathname4<>; | | pathname4 | typedef component4 pathname4<>; |
| | Represents path name for fs_locations | | | Represents path name for fs_locations. |
| verifier4 | typedef opaque | | verifier4 | typedef opaque verifier4[NFS4_VERIFIER_SIZE]; |
| | verifier4[NFS4_VERIFIER_SIZE]; | | | Verifier used for various operations (COMMIT, |
| | Verifier used for various operations | | | CREATE, EXCHANGE_ID, OPEN, READDIR, WRITE) |
| | (COMMIT, CREATE, EXCHANGE_ID, OPEN, | | | NFS4_VERIFIER_SIZE is defined as 8. |
| | READDIR, WRITE) NFS4_VERIFIER_SIZE is | +---------------+---------------------------------------------------+
| | defined as 8. |
+----------------------+--------------------------------------------+
End of Base Data Types End of Base Data Types
Table 1 Table 1
3.3. Structured Data Types 3.3. Structured Data Types
3.3.1. nfstime4 3.3.1. nfstime4
struct nfstime4 { struct nfstime4 {
int64_t seconds; int64_t seconds;
uint32_t nseconds; uint32_t nseconds;
}; };
The nfstime4 structure gives the number of seconds and nanoseconds The nfstime4 data type gives the number of seconds and nanoseconds
since midnight or 0 hour January 1, 1970 Coordinated Universal Time since midnight or 0 hour January 1, 1970 Coordinated Universal Time
(UTC). Values greater than zero for the seconds field denote dates (UTC). Values greater than zero for the seconds field denote dates
after the 0 hour January 1, 1970. Values less than zero for the after the 0 hour January 1, 1970. Values less than zero for the
seconds field denote dates before the 0 hour January 1, 1970. In seconds field denote dates before the 0 hour January 1, 1970. In
both cases, the nseconds field is to be added to the seconds field both cases, the nseconds field is to be added to the seconds field
for the final time representation. For example, if the time to be for the final time representation. For example, if the time to be
represented is one-half second before 0 hour January 1, 1970, the represented is one-half second before 0 hour January 1, 1970, the
seconds field would have a value of negative one (-1) and the seconds field would have a value of negative one (-1) and the
nseconds field would have a value of one-half second (500000000). nseconds field would have a value of one-half second (500000000).
Values greater than 999,999,999 for nseconds are considered invalid. Values greater than 999,999,999 for nseconds are invalid.
This data type is used to pass time and date information. A server This data type is used to pass time and date information. A server
converts to and from its local representation of time when processing converts to and from its local representation of time when processing
time values, preserving as much accuracy as possible. If the time values, preserving as much accuracy as possible. If the
precision of timestamps stored for a file system object is less than precision of timestamps stored for a file system object is less than
defined, loss of precision can occur. An adjunct time maintenance defined, loss of precision can occur. An adjunct time maintenance
protocol is RECOMMENDED to reduce client and server time skew. protocol is RECOMMENDED to reduce client and server time skew.
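The nfstime4 rule described above (nseconds is always added to seconds, even for times before the epoch) can be sketched as follows. This is a minimal illustration, not part of either draft; the helper names to_nfstime4 and from_nfstime4 and the use of a signed nanosecond count as the local time representation are assumptions made for the example.

```python
from typing import Tuple

NSEC_PER_SEC = 1_000_000_000


def to_nfstime4(nanos_since_epoch: int) -> Tuple[int, int]:
    """Split a signed nanosecond offset from the epoch into (seconds, nseconds).

    Floor division keeps nseconds in 0..999,999,999, so nseconds is always
    added to seconds, including for times before January 1, 1970.
    """
    seconds, nseconds = divmod(nanos_since_epoch, NSEC_PER_SEC)
    return seconds, nseconds


def from_nfstime4(seconds: int, nseconds: int) -> int:
    """Recombine an nfstime4 pair, rejecting out-of-range nseconds values."""
    if not 0 <= nseconds < NSEC_PER_SEC:
        raise ValueError("nseconds must not exceed 999,999,999")
    return seconds * NSEC_PER_SEC + nseconds


# One-half second before the epoch, as in the text above.
half_second_before_epoch = to_nfstime4(-500_000_000)
```

Under these assumptions, half_second_before_epoch holds seconds = -1 and nseconds = 500000000, matching the example in the text.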
3.3.2. time_how4 3.3.2. time_how4
skipping to change at page 79, line 14 skipping to change at page 80, line 14
3.3.3. settime4 3.3.3. settime4
union settime4 switch (time_how4 set_it) { union settime4 switch (time_how4 set_it) {
case SET_TO_CLIENT_TIME4: case SET_TO_CLIENT_TIME4:
nfstime4 time; nfstime4 time;
default: default:
void; void;
}; };
The above definitions are used as the attribute definitions to set The time_how4 and settime4 data types are used for setting timestamps
time values. If set_it is SET_TO_SERVER_TIME4, then the server uses in file object attributes. If set_it is SET_TO_SERVER_TIME4, then
its local representation of time for the time value. the server uses its local representation of time for the time value.
3.3.4. specdata4 3.3.4. specdata4
struct specdata4 { struct specdata4 {
uint32_t specdata1; /* major device number */ uint32_t specdata1; /* major device number */
uint32_t specdata2; /* minor device number */ uint32_t specdata2; /* minor device number */
}; };
This data type represents additional information for the device file This data type represents the device numbers for the device file
types NF4CHR and NF4BLK. types NF4CHR and NF4BLK.
3.3.5. fsid4 3.3.5. fsid4
struct fsid4 { struct fsid4 {
uint64_t major; uint64_t major;
uint64_t minor; uint64_t minor;
}; };
3.3.6. chg_policy4 3.3.6. chg_policy4
skipping to change at page 80, line 12 skipping to change at page 81, line 12
id and the second to be incremented non-persistently, within a given id and the second to be incremented non-persistently, within a given
server instance. server instance.
3.3.7. fattr4 3.3.7. fattr4
struct fattr4 { struct fattr4 {
bitmap4 attrmask; bitmap4 attrmask;
attrlist4 attr_vals; attrlist4 attr_vals;
}; };
The fattr4 structure is used to represent file and directory The fattr4 data type is used to represent file and directory
attributes. attributes.
The bitmap is a counted array of 32 bit integers used to contain bit The bitmap is a counted array of 32 bit integers used to contain bit
values. The position of the integer in the array that contains bit n values. The position of the integer in the array that contains bit n
can be computed from the expression (n / 32) and its bit within that can be computed from the expression (n / 32) and its bit within that
integer is (n mod 32). integer is (n mod 32).
0 1 0 1
+-----------+-----------+-----------+-- +-----------+-----------+-----------+--
| count | 31 .. 0 | 63 .. 32 | | count | 31 .. 0 | 63 .. 32 |
+-----------+-----------+-----------+-- +-----------+-----------+-----------+--
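The (n / 32) and (n mod 32) indexing described above can be sketched as below. The sketch models a bitmap4 value as a Python list of 32-bit words and assumes, following the "31 .. 0" labelling in the diagram, that bit 0 is the least significant bit of each word. The function names are hypothetical.

```python
from typing import List


def bitmap4_set(bitmap: List[int], n: int) -> None:
    """Set bit n, growing the counted array of 32-bit words as needed."""
    word, bit = divmod(n, 32)  # word index is n / 32, bit index is n mod 32
    while len(bitmap) <= word:
        bitmap.append(0)
    bitmap[word] |= 1 << bit


def bitmap4_test(bitmap: List[int], n: int) -> bool:
    """Return True if bit n is set; bits beyond the array count as clear."""
    word, bit = divmod(n, 32)
    return word < len(bitmap) and bool((bitmap[word] >> bit) & 1)


attrmask: List[int] = []
bitmap4_set(attrmask, 33)  # bit 33 lands in word 1, bit position 1
```

Treating bits beyond the end of the array as clear lets a short bitmap stand for any attribute set whose high-numbered bits are all zero.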
3.3.8. change_info4 3.3.8. change_info4
struct change_info4 { struct change_info4 {
bool atomic; bool atomic;
changeid4 before; changeid4 before;
changeid4 after; changeid4 after;
}; };
This structure is used with the CREATE, LINK, REMOVE, RENAME This data type is used with the CREATE, LINK, OPEN, REMOVE, and
operations to let the client know the value of the change attribute RENAME operations to let the client know the value of the change
for the directory in which the target file system object resides. attribute for the directory in which the target file system object
resides.
3.3.9. netaddr4 3.3.9. netaddr4
struct netaddr4 { struct netaddr4 {
/* see struct rpcb in RFC 1833 */ /* see struct rpcb in RFC 1833 */
string na_r_netid<>; /* network id */ string na_r_netid<>; /* network id */
string na_r_addr<>; /* universal address */ string na_r_addr<>; /* universal address */
}; };
The netaddr4 structure is used to identify TCP/IP based endpoints. The netaddr4 data type is used to identify TCP/IP based endpoints.
The r_netid and r_addr fields are specified in RFC1833 [26], but they The r_netid and r_addr fields are specified in RFC1833 [26], but they
are underspecified in RFC1833 [26] as far as what they should look are underspecified in RFC1833 [26] as far as what they should look
like for specific protocols. like for specific protocols. The next section clarifies this.
3.3.9.1. Format of netaddr4 for TCP and UDP over IPv4
For TCP over IPv4 and for UDP over IPv4, the format of r_addr is the For TCP over IPv4 and for UDP over IPv4, the format of r_addr is the
US-ASCII string: US-ASCII string:
h1.h2.h3.h4.p1.p2 h1.h2.h3.h4.p1.p2
The prefix, "h1.h2.h3.h4", is the standard textual form for The prefix, "h1.h2.h3.h4", is the standard textual form for
representing an IPv4 address, which is always four bytes long. representing an IPv4 address, which is always four bytes long.
Assuming big-endian ordering, h1, h2, h3, and h4, are respectively, Assuming big-endian ordering, h1, h2, h3, and h4, are respectively,
the first through fourth bytes each converted to ASCII-decimal. the first through fourth bytes each converted to ASCII-decimal. The
Assuming big-endian ordering, p1 and p2 are, respectively, the first suffix, "p1.p2", is a textual form for representing a TCP and UDP
and second bytes each converted to ASCII-decimal. For example, if a service port. Assuming big-endian ordering, p1 and p2 are,
host, in big-endian order, has an address of 0x0A010307 and there is respectively, the first and second bytes each converted to ASCII-
a service listening on, in big endian order, port 0x020F (decimal decimal. For example, if a host, in big-endian order, has an address
527), then complete universal address is "10.1.3.7.2.15". of 0x0A010307 and there is a service listening on, in big endian
order, port 0x020F (decimal 527), then the complete universal address
is "10.1.3.7.2.15".
For TCP over IPv4 the value of r_netid is the string "tcp". For UDP For TCP over IPv4 the value of r_netid is the string "tcp". For UDP
over IPv4 the value of r_netid is the string "udp". That this over IPv4 the value of r_netid is the string "udp". That this
document specifies the universal address and netid for UDP/IPv4 does document specifies the universal address and netid for UDP/IPv4 does
not imply that UDP/IPv4 is a legal transport for NFSv4.1 (see not imply that UDP/IPv4 is a legal transport for NFSv4.1 (see
Section 2.9). Section 2.9).
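The h1.h2.h3.h4.p1.p2 encoding for IPv4 can be sketched as follows, using the address and port from the example above. The function names are hypothetical, and the host address is passed as a 32-bit integer purely for illustration.

```python
from typing import Tuple


def ipv4_uaddr(host: int, port: int) -> str:
    """Format a 32-bit IPv4 address and 16-bit port as h1.h2.h3.h4.p1.p2."""
    host_bytes = host.to_bytes(4, "big")  # h1..h4 in big-endian order
    port_bytes = port.to_bytes(2, "big")  # p1, p2 in big-endian order
    return ".".join(str(b) for b in (*host_bytes, *port_bytes))


def parse_ipv4_uaddr(uaddr: str) -> Tuple[int, int]:
    """Recover (host, port) integers from an IPv4 universal address string."""
    parts = [int(field) for field in uaddr.split(".")]
    if len(parts) != 6 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError("expected six decimal byte values")
    host = int.from_bytes(bytes(parts[:4]), "big")
    port = int.from_bytes(bytes(parts[4:]), "big")
    return host, port


example_uaddr = ipv4_uaddr(0x0A010307, 0x020F)
```

Here example_uaddr is "10.1.3.7.2.15", the string given in the text for port 527.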
3.3.9.2. Format of netaddr4 for TCP and UDP over IPv6
For TCP over IPv6 and for UDP over IPv6, the format of r_addr is the For TCP over IPv6 and for UDP over IPv6, the format of r_addr is the
US-ASCII string: US-ASCII string:
x1:x2:x3:x4:x5:x6:x7:x8.p1.p2 x1:x2:x3:x4:x5:x6:x7:x8.p1.p2
The suffix "p1.p2" is the service port, and is computed the same way The suffix "p1.p2" is the service port, and is computed the same way
as with universal addresses for TCP and UDP over IPv4. The prefix, as with universal addresses for TCP and UDP over IPv4. The prefix,
"x1:x2:x3:x4:x5:x6:x7:x8", is the standard textual form for "x1:x2:x3:x4:x5:x6:x7:x8", is the preferred textual form for
representing an IPv6 address as defined in Section 2.2 of RFC2373 representing an IPv6 address as defined in Section 2.2 of RFC3513
[13]. Additionally, the two alternative forms specified in Section [13]. Additionally, the two alternative forms specified in Section
2.2 of RFC2373 [13] are also acceptable. 2.2 of RFC3513 are also acceptable.
For TCP over IPv6 the value of r_netid is the string "tcp6". For UDP For TCP over IPv6 the value of r_netid is the string "tcp6". For UDP
over IPv6 the value of r_netid is the string "udp6". That this over IPv6 the value of r_netid is the string "udp6". That this
document specifies the universal address and netid for UDP/IPv6 does document specifies the universal address and netid for UDP/IPv6 does
not imply that UDP/IPv6 is a legal transport for NFSv4.1 (see not imply that UDP/IPv6 is a legal transport for NFSv4.1 (see
Section 2.9). Section 2.9).
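The IPv6 form can be sketched in the same way. This illustration emits only the x1:x2:x3:x4:x5:x6:x7:x8 form without leading zeros and does not produce the two alternative forms mentioned above. The function name is hypothetical; the standard-library ipaddress module is used only to parse the input address.

```python
import ipaddress


def ipv6_uaddr(addr: str, port: int) -> str:
    """Format an IPv6 address and port as x1:x2:x3:x4:x5:x6:x7:x8.p1.p2."""
    packed = ipaddress.IPv6Address(addr).packed  # 16 bytes, network order
    groups = [
        format(int.from_bytes(packed[i:i + 2], "big"), "x")
        for i in range(0, 16, 2)
    ]
    p1, p2 = port.to_bytes(2, "big")  # same port suffix rule as for IPv4
    return ":".join(groups) + f".{p1}.{p2}"


example_uaddr6 = ipv6_uaddr("2001:db8::1", 2049)
```

With these inputs, example_uaddr6 is "2001:db8:0:0:0:0:0:1.8.1", since port 2049 is 0x0801.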
3.3.10. state_owner4 3.3.10. state_owner4
struct state_owner4 { struct state_owner4 {
skipping to change at page 82, line 7 skipping to change at page 83, line 20
}; };
typedef state_owner4 open_owner4; typedef state_owner4 open_owner4;
typedef state_owner4 lock_owner4; typedef state_owner4 lock_owner4;
The state_owner4 data type is the base type for the open_owner4 The state_owner4 data type is the base type for the open_owner4
Section 3.3.10.1 and lock_owner4 Section 3.3.10.2. Section 3.3.10.1 and lock_owner4 Section 3.3.10.2.
3.3.10.1. open_owner4 3.3.10.1. open_owner4
This structure is used to identify the owner of open state. This data type is used to identify the owner of open state.
3.3.10.2. lock_owner4 3.3.10.2. lock_owner4
This structure is used to identify the owner of file locking state. This data type is used to identify the owner of byte-range locking
state.
3.3.11. open_to_lock_owner4 3.3.11. open_to_lock_owner4
struct open_to_lock_owner4 { struct open_to_lock_owner4 {
seqid4 open_seqid; seqid4 open_seqid;
stateid4 open_stateid; stateid4 open_stateid;
seqid4 lock_seqid; seqid4 lock_seqid;
lock_owner4 lock_owner; lock_owner4 lock_owner;
}; };
This structure is used for the first LOCK operation done for an This data type is used for the first LOCK operation done for an
open_owner4. It provides both the open_stateid and lock_owner such open_owner4. It provides both the open_stateid and lock_owner such
that the transition is made from a valid open_stateid sequence to that the transition is made from a valid open_stateid sequence to
that of the new lock_stateid sequence. Using this mechanism avoids that of the new lock_stateid sequence. Using this mechanism avoids
the confirmation of the lock_owner/lock_seqid pair since it is tied the confirmation of the lock_owner/lock_seqid pair since it is tied
to established state in the form of the open_stateid/open_seqid. to established state in the form of the open_stateid/open_seqid.
3.3.12. stateid4 3.3.12. stateid4
struct stateid4 { struct stateid4 {
uint32_t seqid; uint32_t seqid;
opaque other[12]; opaque other[12];
}; };
This structure is used for the various state sharing mechanisms This data type is used for the various state sharing mechanisms
between the client and server. For the client, this data structure between the client and server. The client never modifies a value of
is read-only. The starting value of the seqid field is undefined. data type stateid4. The starting value of the seqid field is
The server is required to increment the seqid field monotonically at undefined. The server is required to increment the seqid field by
each transition of the stateid. This is important since the client one (1) at each transition of the stateid. This is important since
will inspect the seqid in OPEN stateids to determine the order of the client will inspect the seqid in OPEN stateids to determine the
OPEN processing done by the server. order of OPEN processing done by the server.
3.3.13. layouttype4 3.3.13. layouttype4
enum layouttype4 { enum layouttype4 {
LAYOUT4_NFSV4_1_FILES = 1, LAYOUT4_NFSV4_1_FILES = 0x1,
LAYOUT4_OSD2_OBJECTS = 2, LAYOUT4_OSD2_OBJECTS = 0x2,
LAYOUT4_BLOCK_VOLUME = 3 LAYOUT4_BLOCK_VOLUME = 0x3
}; };
This data type indicates what type of layout is being used. The file This data type indicates what type of layout is being used. The file
server advertises the layout types it supports through the server advertises the layout types it supports through the
fs_layout_type file system attribute (Section 5.11.1). A client asks fs_layout_type file system attribute (Section 5.11.1). A client asks
for layouts of a particular type in LAYOUTGET, and processes those for layouts of a particular type in LAYOUTGET, and processes those
layouts in its layout-type-specific logic. layouts in its layout-type-specific logic.
The layouttype4 structure is 32 bits in length. The range The layouttype4 data type is 32 bits in length. The range
represented by the layout type is split into three parts. Type 0x0 represented by the layout type is split into three parts. Type 0x0
is reserved. Types within the range 0x00000001-0x7FFFFFFF are is reserved. Types within the range 0x00000001-0x7FFFFFFF are
globally unique and are assigned according to the description in globally unique and are assigned according to the description in
Section 22.4; they are maintained by IANA. Types within the range Section 22.4; they are maintained by IANA. Types within the range
0x80000000-0xFFFFFFFF are site specific and for "private use" only. 0x80000000-0xFFFFFFFF are site specific and for private use only.
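The three-way split of the layouttype4 value space described above can be sketched as a simple range check. The function name and the returned labels are hypothetical and are chosen only for illustration.

```python
def classify_layouttype4(value: int) -> str:
    """Classify a 32-bit layout type value into the ranges described above."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("layouttype4 is a 32-bit quantity")
    if value == 0x0:
        return "reserved"
    if value <= 0x7FFFFFFF:
        return "iana-assigned"  # globally unique, maintained by IANA
    return "private-use"        # site specific, 0x80000000-0xFFFFFFFF


LAYOUT4_NFSV4_1_FILES = 0x1
files_layout_class = classify_layouttype4(LAYOUT4_NFSV4_1_FILES)
```

All three layout types defined in this section (0x1 through 0x3) fall in the IANA-maintained range.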
The LAYOUT4_NFSV4_1_FILES enumeration specifies that the NFSv4.1 file The LAYOUT4_NFSV4_1_FILES enumeration specifies that the NFSv4.1 file
layout type is to be used. The LAYOUT4_OSD2_OBJECTS enumeration layout type, as defined in Section 13, is to be used. The
specifies that the object layout, as defined in [30], is to be used. LAYOUT4_OSD2_OBJECTS enumeration specifies that the object layout, as
Similarly, the LAYOUT4_BLOCK_VOLUME enumeration that the block/volume defined in [30], is to be used. Similarly, the LAYOUT4_BLOCK_VOLUME
layout, as defined in [31], is to be used. enumeration specifies that the block/volume layout, as defined in
[31], is to be used.
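The three-way split of the layouttype4 value space described above can be sketched as follows. This is a minimal illustration, not part of the protocol definition: the function name classify_layouttype4 and the returned labels are assumptions made for the example; only the constants and the numeric ranges come from the text.

```python
# Illustrative sketch only. The function name and the string labels are
# assumptions; the constants and ranges are those given in the text.
LAYOUT4_NFSV4_1_FILES = 0x1
LAYOUT4_OSD2_OBJECTS = 0x2
LAYOUT4_BLOCK_VOLUME = 0x3


def classify_layouttype4(value: int) -> str:
    """Return which part of the 32-bit layout type space a value falls in."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("layouttype4 is a 32-bit quantity")
    if value == 0x0:
        return "reserved"
    if value <= 0x7FFFFFFF:
        # 0x00000001-0x7FFFFFFF: globally unique, maintained by IANA.
        return "iana-assigned"
    # 0x80000000-0xFFFFFFFF: site specific, private use only.
    return "private-use"
```

Under this sketch, the three layout types defined in the enumeration all fall in the IANA-assigned range.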
3.3.14. deviceid4 3.3.14. deviceid4
const NFS4_DEVICEID4_SIZE = 16; const NFS4_DEVICEID4_SIZE = 16;
typedef opaque deviceid4[NFS4_DEVICEID4_SIZE]; typedef opaque deviceid4[NFS4_DEVICEID4_SIZE];
Layout information includes device IDs that specify a storage device Layout information includes device IDs that specify a storage device
through a compact handle. Addressing and type information is through a compact handle. Addressing and type information is
obtained with the GETDEVICEINFO operation. A client must not assume obtained with the GETDEVICEINFO operation. Device IDs are not
that device IDs are valid across metadata server reboots. The device guaranteed to be valid across metadata server reboots. A device ID
ID is qualified by the layout type and are unique per file system is unique per client ID and layout type. See Section 12.2.10 for
(FSID). See Section 12.2.10 for more details. more details.
3.3.15. device_addr4 3.3.15. device_addr4
struct device_addr4 { struct device_addr4 {
layouttype4 da_layout_type; layouttype4 da_layout_type;
opaque da_addr_body<>; opaque da_addr_body<>;
}; };
The device address is used to set up a communication channel with the The device address is used to set up a communication channel with the
storage device. Different layout types will require different types storage device. Different layout types will require different data
of structures to define how they communicate with storage devices. types to define how they communicate with storage devices. The
The opaque da_addr_body field must be interpreted based on the opaque da_addr_body field must be interpreted based on the specified
specified da_layout_type field. da_layout_type field.
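As a rough sketch of how a receiver might split a device_addr4 into its two fields before handing da_addr_body to layout-type-specific logic, the following assumes the usual XDR wire encoding (a 4-byte big-endian value for the enumeration, then a 4-byte length and zero-padded bytes for the variable-length opaque). The function name decode_device_addr4 and the choice to return the number of bytes consumed are assumptions made for the example.

```python
import struct


def decode_device_addr4(buf: bytes):
    """Split an XDR-encoded device_addr4 into (da_layout_type, da_addr_body).

    Hypothetical helper: also returns the number of bytes consumed so a
    caller could continue decoding whatever follows in the buffer.
    """
    if len(buf) < 8:
        raise ValueError("buffer too short for device_addr4 header")
    # da_layout_type is read as unsigned so that private-use values
    # (0x80000000 and above) are not reported as negative numbers.
    layout_type, body_len = struct.unpack_from(">II", buf, 0)
    padded_len = (body_len + 3) & ~3  # XDR pads opaque data to 4 bytes
    if len(buf) < 8 + padded_len:
        raise ValueError("buffer truncated inside da_addr_body")
    body = buf[8:8 + body_len]
    return layout_type, body, 8 + padded_len
```

The body is deliberately left as raw bytes here; how it is interpreted depends entirely on the returned layout type.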
This document defines the device address for the NFSv4.1 file layout This document defines the device address for the NFSv4.1 file layout
(see Section 13.3), which identifies a storage device by network IP (see Section 13.3), which identifies a storage device by network IP
address and port number. This is sufficient for the clients to address and port number. This is sufficient for the clients to
communicate with the NFSv4.1 storage devices, and may be sufficient communicate with the NFSv4.1 storage devices, and may be sufficient
for other layout types as well. Device types for object storage for other layout types as well. Device types for object storage
devices and block storage devices (e.g., SCSI volume labels) will be devices and block storage devices (e.g., SCSI volume labels) will be
defined by their respective layout specifications. defined by their respective layout specifications.
3.3.16. layout_content4 3.3.16. layout_content4
skipping to change at page 84, line 28 skipping to change at page 85, line 46
3.3.17. layout4 3.3.17. layout4
struct layout4 { struct layout4 {
offset4 lo_offset; offset4 lo_offset;
length4 lo_length; length4 lo_length;
layoutiomode4 lo_iomode; layoutiomode4 lo_iomode;
layout_content4 lo_content; layout_content4 lo_content;
}; };
The layout4 structure defines a layout for a file. The layout type The layout4 data type defines a layout for a file. The layout type
specific data is opaque within lo_content. Since layouts are sub- specific data is opaque within lo_content. Since layouts are sub-
dividable, the offset and length together with the file's filehandle, dividable, the offset and length together with the file's filehandle,
the client ID, iomode, and layout type, identify the layout. the client ID, iomode, and layout type, identify the layout.
3.3.18. layoutupdate4 3.3.18. layoutupdate4
struct layoutupdate4 { struct layoutupdate4 {
layouttype4 lou_type; layouttype4 lou_type;
opaque lou_body<>; opaque lou_body<>;
}; };
The layoutupdate4 structure is used by the client to return 'updated' The layoutupdate4 data type is used by the client to return updated
layout information to the metadata server at LAYOUTCOMMIT time. This layout information to the metadata server via the LAYOUTCOMMIT
structure provides a channel to pass layout type specific information (Section 18.42) operation. This data type provides a channel to pass
(in field lou_body) back to the metadata server. E.g., for block/ layout type specific information (in field lou_body) back to the
volume layout types this could include the list of reserved blocks metadata server. E.g., for the block/volume layout type this could
that were written. The contents of the opaque lou_body argument are include the list of reserved blocks that were written. The contents
determined by the layout type and are defined in their context. The of the opaque lou_body argument are determined by the layout type.
NFSv4.1 file-based layout does not use this structure, thus the The NFSv4.1 file-based layout does not use this data type; if
lou_body field should have a zero length. lou_type is LAYOUT4_NFSV4_1_FILES, the lou_body field MUST have a
zero length.
3.3.19. layouthint4 3.3.19. layouthint4
struct layouthint4 { struct layouthint4 {
layouttype4 loh_type; layouttype4 loh_type;
opaque loh_body<>; opaque loh_body<>;
}; };
The layouthint4 structure is used by the client to pass in a hint The layouthint4 data type is used by the client to pass in a hint
about the type of layout it would like created for a particular file. about the type of layout it would like created for a particular file.
It is the structure specified by the layout_hint attribute described It is the data type specified by the layout_hint attribute described
in Section 5.11.4. The metadata server may ignore the hint, or may in Section 5.11.4. The metadata server may ignore the hint, or may
selectively ignore fields within the hint. This hint should be selectively ignore fields within the hint. This hint should be
provided at create time as part of the initial attributes within provided at create time as part of the initial attributes within
OPEN. The loh_body field is specific to the type of layout OPEN. The loh_body field is specific to the type of layout
(loh_type). The NFSv4.1 file-based layout uses the (loh_type). The NFSv4.1 file-based layout uses the
nfsv4_1_file_layouthint4 structure as defined in Section 13.3. nfsv4_1_file_layouthint4 data type as defined in Section 13.3.
3.3.20. layoutiomode4 3.3.20. layoutiomode4
enum layoutiomode4 { enum layoutiomode4 {
LAYOUTIOMODE4_READ = 1, LAYOUTIOMODE4_READ = 1,
LAYOUTIOMODE4_RW = 2, LAYOUTIOMODE4_RW = 2,
LAYOUTIOMODE4_ANY = 3 LAYOUTIOMODE4_ANY = 3
}; };
The iomode specifies whether the client intends to read or write The iomode specifies whether the client intends to just read or both
(with the possibility of reading) the data represented by the layout. read and write the data represented by the layout. While the
The ANY iomode MUST NOT be used for LAYOUTGET, however, it can be LAYOUTIOMODE4_ANY iomode MUST NOT be used in the arguments to the
used for LAYOUTRETURN and CB_LAYOUTRECALL. The ANY iomode specifies LAYOUTGET operation, it MAY be used in the arguments to the
that layouts pertaining to both READ and RW iomodes are being LAYOUTRETURN and CB_LAYOUTRECALL operations. The LAYOUTIOMODE4_ANY
returned or recalled, respectively. The metadata server's use of the iomode specifies that layouts pertaining to both LAYOUTIOMODE4_READ
iomode may depend on the layout type being used. The storage devices and LAYOUTIOMODE4_RW iomodes are being returned or recalled,
may validate I/O accesses against the iomode and reject invalid respectively. The metadata server's use of the iomode may depend on
accesses. the layout type being used. The storage devices MAY validate I/O
accesses against the iomode and reject invalid accesses.
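The restriction on LAYOUTIOMODE4_ANY can be expressed as a simple argument check. The sketch below is illustrative only: the function name iomode_allowed and the use of operation names as strings are assumptions; the constants and the rule itself come from the text above.

```python
LAYOUTIOMODE4_READ = 1
LAYOUTIOMODE4_RW = 2
LAYOUTIOMODE4_ANY = 3

# Operations whose arguments carry an iomode, as named in the text.
_IOMODE_OPERATIONS = ("LAYOUTGET", "LAYOUTRETURN", "CB_LAYOUTRECALL")


def iomode_allowed(operation: str, iomode: int) -> bool:
    """Return True if the iomode may appear in the operation's arguments."""
    if operation not in _IOMODE_OPERATIONS:
        raise ValueError("operation does not take an iomode argument")
    if iomode in (LAYOUTIOMODE4_READ, LAYOUTIOMODE4_RW):
        return True
    if iomode == LAYOUTIOMODE4_ANY:
        # ANY covers both READ and RW layouts; it is only meaningful when
        # layouts are being returned or recalled, never when requested.
        return operation != "LAYOUTGET"
    return False  # not a defined layoutiomode4 value
```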
3.3.21. nfs_impl_id4 3.3.21. nfs_impl_id4
struct nfs_impl_id4 { struct nfs_impl_id4 {
utf8str_cis nii_domain; utf8str_cis nii_domain;
utf8str_cs nii_name; utf8str_cs nii_name;
nfstime4 nii_date; nfstime4 nii_date;
}; };
This structure is used to identify client and server implementation This data type is used to identify client and server implementation
detail. The nii_domain field is the DNS domain name that the details. The nii_domain field is the DNS domain name that the
implementer is associated with. The nii_name field is the product implementer is associated with. The nii_name field is the product
name of the implementation and is completely free form. It is name of the implementation and is completely free form. It is
RECOMMENDED that the nii_name be used to distinguish machine RECOMMENDED that the nii_name be used to distinguish machine
architecture, machine platforms, revisions, versions, and patch architecture, machine platforms, revisions, versions, and patch
levels. The nii_date field is the timestamp of when the software levels. The nii_date field is the timestamp of when the software
instance was published or built. instance was published or built.
3.3.22. threshold_item4 3.3.22. threshold_item4
struct threshold_item4 { struct threshold_item4 {
layouttype4 thi_layout_type; layouttype4 thi_layout_type;
bitmap4 thi_hintset; bitmap4 thi_hintset;
opaque thi_hintlist<>; opaque thi_hintlist<>;
}; };
This structure contains a list of hints specific to a layout type for This data type contains a list of hints specific to a layout type for
helping the client determine when it should send I/O directly through helping the client determine when it should send I/O directly through
the metadata server vs. the data servers. The hint structure the metadata server versus the storage devices. The data type
consists of the layout type (thi_layout_type), a bitmap (thi_hintset) consists of the layout type (thi_layout_type), a bitmap (thi_hintset)
describing the set of hints supported by the server (they may differ describing the set of hints supported by the server (they may differ
based on the layout type), and a list of hints (thi_hintlist), whose based on the layout type), and a list of hints (thi_hintlist), whose
structure is determined by the hintset bitmap. See the mdsthreshold content is determined by the hintset bitmap. See the mdsthreshold
attribute for more details. attribute for more details.
The thi_hintset field is a bitmap of the following values: The thi_hintset field is a bitmap of the following values:
+-------------------------+---+---------+---------------------------+ +-------------------------+---+---------+---------------------------+
| name | # | Data | Description | | name | # | Data | Description |
| | | Type | | | | | Type | |
+-------------------------+---+---------+---------------------------+ +-------------------------+---+---------+---------------------------+
| threshold4_read_size | 0 | length4 | The file size below which | | threshold4_read_size | 0 | length4 | The file size below which |
| | | | it is RECOMMENDED to read | | | | | it is RECOMMENDED to read |
skipping to change at page 87, line 11 skipping to change at page 88, line 32
| | | | RECOMMENDED to write data | | | | | RECOMMENDED to write data |
| | | | through the MDS | | | | | through the MDS |
+-------------------------+---+---------+---------------------------+ +-------------------------+---+---------+---------------------------+
3.3.23. mdsthreshold4 3.3.23. mdsthreshold4
struct mdsthreshold4 { struct mdsthreshold4 {
threshold_item4 mth_hints<>; threshold_item4 mth_hints<>;
}; };
This structure holds an array of threshold_item4 structures each of This data type holds an array of elements of data type
which is valid for a particular layout type. An array is necessary threshold_item4, each of which is valid for a particular layout type.
since a server can support multiple layout types for a single file. An array is necessary because a server can support multiple layout
types for a single file.
4. Filehandles 4. Filehandles
The filehandle in the NFS protocol is a per server unique identifier The filehandle in the NFS protocol is a per server unique identifier
for a file system object. The contents of the filehandle are opaque for a file system object. The contents of the filehandle are opaque
to the client. Therefore, the server is responsible for translating to the client. Therefore, the server is responsible for translating
the filehandle to an internal representation of the file system the filehandle to an internal representation of the file system
object. object.
4.1. Obtaining the First Filehandle 4.1. Obtaining the First Filehandle
skipping to change at page 95, line 6 skipping to change at page 96, line 35
further perusal and modification of the name space may be done using further perusal and modification of the name space may be done using
operations that work on more typical directories. In particular, operations that work on more typical directories. In particular,
READDIR may be used to get a list of such named attributes and LOOKUP READDIR may be used to get a list of such named attributes and LOOKUP
and OPEN may select a particular attribute. Creation of a new named and OPEN may select a particular attribute. Creation of a new named
attribute may be the result of an OPEN specifying file creation. attribute may be the result of an OPEN specifying file creation.
Once an OPEN is done, named attributes may be examined and changed by Once an OPEN is done, named attributes may be examined and changed by
normal READ and WRITE operations using the filehandles and stateids normal READ and WRITE operations using the filehandles and stateids
returned by OPEN. returned by OPEN.
Named attributes and the named attribute directory may have have Named attributes and the named attribute directory may have their own
their own (non-named) attributes. Each of these objects must have all of (non-named) attributes. Each of these objects must have all of the
the REQUIRED attributes and may have additional RECOMMENDED REQUIRED attributes and may have additional RECOMMENDED attributes.
attributes. However, the set of attributes for named attributes and However, the set of attributes for named attributes and the named
the named attribute directory need not be as large as, and typically attribute directory need not be as large as, and typically will not
will not be as large as that for other objects in that file system. be as large as that for other objects in that file system.
Named attributes and the named attribute directory may be the target Named attributes and the named attribute directory may be the target
of delegations (in the case of the named attribute directory these of delegations (in the case of the named attribute directory these
will be directory delegations). However, since granting of will be directory delegations). However, since granting of
delegations or not is within the server's discretion, a server need delegations or not is within the server's discretion, a server need
not support delegations on named attributes or the named attribute not support delegations on named attributes or the named attribute
directory. directory.
It is RECOMMENDED that servers support arbitrary named attributes. A It is RECOMMENDED that servers support arbitrary named attributes. A
client should not depend on the ability to store any named attributes client should not depend on the ability to store any named attributes
skipping to change at page 101, line 51 skipping to change at page 103, line 51
5.7.2.4. Attribute 17: case_preserving 5.7.2.4. Attribute 17: case_preserving
True, if filename case on this file system is preserved. True, if filename case on this file system is preserved.
5.7.2.5. Attribute 60: change_policy 5.7.2.5. Attribute 60: change_policy
A value created by the server that the client can use to determine if A value created by the server that the client can use to determine if
some server policy related to the current file system has been some server policy related to the current file system has been
subject to change. If the value remains the same then the client can subject to change. If the value remains the same then the client can
be sure that the values of the attributes related to fs location and be sure that the values of the attributes related to fs location and
the fsstat_type field of the fs_status attribute have not changed. the fss_type field of the fs_status attribute have not changed. On
On the other hand, a change in this value does not necessarily imply a the other hand, a change in this value does not necessarily imply a
change in policy. It is up to the client to interrogate the server change in policy. It is up to the client to interrogate the server
to determine if some policy relevant to it has changed. See to determine if some policy relevant to it has changed. See
Section 3.3.6 for details. Section 3.3.6 for details.
This attribute MUST change when the value returned by the This attribute MUST change when the value returned by the
fs_locations or fs_locations_info attribute changes, when a file fs_locations or fs_locations_info attribute changes, when a file
system goes from read-only to writable or vice versa, or when the system goes from read-only to writable or vice versa, or when the
allowable set of security flavors for the file system or any part allowable set of security flavors for the file system or any part
thereof is changed. thereof is changed.
skipping to change at page 130, line 39 skipping to change at page 132, line 39
6.2.1.5.1. Discussion of EVERYONE@ 6.2.1.5.1. Discussion of EVERYONE@
It is important to note that "EVERYONE@" is not equivalent to the It is important to note that "EVERYONE@" is not equivalent to the
UNIX "other" entity. This is because, by definition, UNIX "other" UNIX "other" entity. This is because, by definition, UNIX "other"
does not include the owner or owning group of a file. "EVERYONE@" does not include the owner or owning group of a file. "EVERYONE@"
means literally everyone, including the owner or owning group. means literally everyone, including the owner or owning group.
6.2.2. Attribute 58: dacl 6.2.2. Attribute 58: dacl
The dacl, and sacl, attributes are like the acl attribute, but dacl The dacl attribute is like the acl attribute, but dacl allows just
and sacl each allow only certain types of ACEs. The dacl attribute ALLOW and DENY ACEs. The dacl attribute supports automatic
allows just ALLOW and DENY ACEs. The dacl and sacl attributes also inheritance (see Section 6.4.3.2).
support automatic inheritance (see Section 6.4.3.2).
6.2.3. Attribute 59: sacl 6.2.3. Attribute 59: sacl
The sacl, and dacl, attributes are like the acl attribute, but dacl The sacl attribute is like the acl attribute, but sacl allows just
and sacl each allow only certain types of ACEs. The sacl attribute AUDIT and ALARM ACEs. The sacl attribute supports automatic
allows just AUDIT and ALARM ACEs. The dacl and sacl attributes also inheritance (see Section 6.4.3.2).
support automatic inheritance (see Section 6.4.3.2).
6.2.4. Attribute 33: mode 6.2.4. Attribute 33: mode
The NFSv4.1 mode attribute is based on the UNIX mode bits. The The NFSv4.1 mode attribute is based on the UNIX mode bits. The
following bits are defined: following bits are defined:
const MODE4_SUID = 0x800; /* set user id on execution */ const MODE4_SUID = 0x800; /* set user id on execution */
const MODE4_SGID = 0x400; /* set group id on execution */ const MODE4_SGID = 0x400; /* set group id on execution */
const MODE4_SVTX = 0x200; /* save text even after use */ const MODE4_SVTX = 0x200; /* save text even after use */
const MODE4_RUSR = 0x100; /* read permission: owner */ const MODE4_RUSR = 0x100; /* read permission: owner */
skipping to change at page 144, line 47 skipping to change at page 146, line 42
client is sending by directing the client to send it using weak client is sending by directing the client to send it using weak
security mechanisms. security mechanisms.
8. State Management 8. State Management
Integrating locking into the NFS protocol necessarily causes it to be Integrating locking into the NFS protocol necessarily causes it to be
stateful. With the inclusion of such features as share reservations, stateful. With the inclusion of such features as share reservations,
file and directory delegations, recallable layouts, and support for file and directory delegations, recallable layouts, and support for
mandatory record locking the protocol becomes substantially more mandatory record locking the protocol becomes substantially more
dependent on proper management of state than the traditional dependent on proper management of state than the traditional
combination of NFS and NLM [XNFS]. These features include expanded combination of NFS and NLM [36]. These features include expanded
locking facilities, which provide some measure of interclient locking facilities, which provide some measure of interclient
exclusion, but the state is also valuable to providing other useful exclusion, but the state is also valuable to providing other useful
features not readily providable using a stateless model. There are features not readily providable using a stateless model. There are
three components to making this state manageable: three components to making this state manageable:
o Clear division between client and server o Clear division between client and server
o Ability to reliably detect inconsistency in state between client o Ability to reliably detect inconsistency in state between client
and server and server
o Simple and robust recovery mechanisms o Simple and robust recovery mechanisms
In this model, the server owns the state information. The client In this model, the server owns the state information. The client
requests changes in locks and the server responds with the changes requests changes in locks and the server responds with the changes
made. Non-client-initiated changes in locking state are infrequent made. Non-client-initiated changes in locking state are infrequent
and the client receives prompt notification of them and can adjust and the client receives prompt notification of them and can adjust
its view of the locking state to reflect the server's changes. its view of the locking state to reflect the server's changes.
skipping to change at page 146, line 44 skipping to change at page 148, line 40
With the exception of special stateids, to be discussed later, each With the exception of special stateids, to be discussed later, each
stateid represents locking objects of one of a set of types defined stateid represents locking objects of one of a set of types defined
by the NFSv4.1 protocol. Note that in all these cases, where we by the NFSv4.1 protocol. Note that in all these cases, where we
speak of guarantee, there is always an implied codicil that any speak of guarantee, there is always an implied codicil that any
situation such as a client reboot, or lock revocation, allows the situation such as a client reboot, or lock revocation, allows the
guarantee to be voided. guarantee to be voided.
o Stateids may represent opens of files. o Stateids may represent opens of files.
Each stateid in this case represents the open for a given Each stateid in this case represents the open for a given
clientid/openowner/filehandle triple. Such stateids are subject clientid/open-owner/filehandle triple. Such stateids are subject
to change (with consequent bumping of the seqid) in response to to change (with consequent bumping of the seqid) in response to
OPENs that result in upgrade and OPEN_DOWNGRADE operations. OPENs that result in upgrade and OPEN_DOWNGRADE operations.
o Stateids may represent sets of byte-range locks. o Stateids may represent sets of byte-range locks.
All locks held on a particular file by a particular owner and all All locks held on a particular file by a particular owner and all
gotten under the aegis of a particular open file are associated gotten under the aegis of a particular open file are associated
with a single stateid with the seqid being bumped as LOCK and with a single stateid with the seqid being bumped as LOCK and
LOCKU operations affect that set of locks. LOCKU operations affect that set of locks.
skipping to change at page 147, line 51 skipping to change at page 149, line 49
or layouts), for a specific file or directory, and sharing the same or layouts), for a specific file or directory, and sharing the same
ownership characteristics. The seqid designates a specific instance ownership characteristics. The seqid designates a specific instance
of such a set of locks, and is incremented to indicate changes in of such a set of locks, and is incremented to indicate changes in
such a set of locks, either by the addition or deletion of locks from such a set of locks, either by the addition or deletion of locks from
the set, a change in the byte-range they apply to, or an upgrade or the set, a change in the byte-range they apply to, or an upgrade or
downgrade in the type of one or more locks. downgrade in the type of one or more locks.
When such a set of locks is first created the server returns a When such a set of locks is first created the server returns a
stateid with seqid value of one. On subsequent operations which stateid with seqid value of one. On subsequent operations which
modify the set of locks the server is required to increment the seqid modify the set of locks the server is required to increment the seqid
field by one (1) whenever it returns a stateid for the same state field by one (1) whenever it returns a stateid for the same state-
owner/file/type combination and there is some change in the set of owner/file/type combination and there is some change in the set of
locks actually designated. In this case the server will return a locks actually designated. In this case the server will return a
stateid with an other field the same as previously used for that stateid with an other field the same as previously used for that
state owner/file/type combination, with an incremented seqid field. state-owner/file/type combination, with an incremented seqid field.
This pattern continues until the seqid is incremented past This pattern continues until the seqid is incremented past
NFS4_UINT32_MAX, and one (not zero) is the next seqid value. NFS4_UINT32_MAX, and one (not zero) is the next seqid value.
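The seqid progression described above, including the wrap from NFS4_UINT32_MAX back to one rather than zero, can be sketched as follows. The function name next_seqid is an assumption made for the example, and the value of NFS4_UINT32_MAX is taken to be 0xFFFFFFFF.

```python
NFS4_UINT32_MAX = 0xFFFFFFFF  # assumed value of the protocol constant


def next_seqid(seqid: int) -> int:
    """Return the seqid that follows the given one for the same state."""
    if not 1 <= seqid <= NFS4_UINT32_MAX:
        # A newly created set of locks starts at one, so zero is never a
        # valid predecessor in this progression.
        raise ValueError("seqid out of range")
    if seqid == NFS4_UINT32_MAX:
        return 1  # wrap to one, not zero
    return seqid + 1
```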
The purpose of the incrementing of the seqid is to allow the server The purpose of the incrementing of the seqid is to allow the server
to communicate to the client the order in which operations that to communicate to the client the order in which operations that
modified locking state associated with a stateid have been processed modified locking state associated with a stateid have been processed
and to make it possible for the client to send requests that are and to make it possible for the client to send requests that are
conditional on the set of locks not having changed since the stateid conditional on the set of locks not having changed since the stateid
in question was returned. in question was returned.
skipping to change at page 151, line 12 skipping to change at page 153, line 12
o An indication of the type of stateid (open, record lock, file o An indication of the type of stateid (open, record lock, file
delegation, directory delegation, layout). delegation, directory delegation, layout).
o The last "seqid" value returned corresponding to the current o The last "seqid" value returned corresponding to the current
"other" value. "other" value.
o An indication of the current status of the locks associated with o An indication of the current status of the locks associated with
this stateid. In particular, whether these have been revoked and this stateid. In particular, whether these have been revoked and
if so, for what reason. if so, for what reason.
With this information, an incoming stateid can be validated and and With this information, an incoming stateid can be validated and the
the appropriate error returned when necessary. Special and non- appropriate error returned when necessary. Special and non-special
special stateids are handled separately. (See Section 8.2.3 for a stateids are handled separately. (See Section 8.2.3 for a discussion
discussion of special stateids). of special stateids).
Note that stateids are implicitly qualified by the current client ID, Note that stateids are implicitly qualified by the current client ID,
as derived the the client ID associated with the current session. as derived from the client ID associated with the current session.
Note, however, that the semantics of the session will prevent stateids Note, however, that the semantics of the session will prevent stateids
associated with a previous client or server instance from being associated with a previous client or server instance from being
analyzed by this procedure. analyzed by this procedure.
If server restart has resulted in an invalid client ID or a sessionid If server restart has resulted in an invalid client ID or a sessionid
which is invalid, SEQUENCE will return an error and the operation which is invalid, SEQUENCE will return an error and the operation
that takes a stateid as an argument will never be processed. that takes a stateid as an argument will never be processed.
If there has been a server restart where there is a persistent If there has been a server restart where there is a persistent
session, and all leased state has been lost, then the session in session, and all leased state has been lost, then the session in
skipping to change at page 151, line 51 skipping to change at page 153, line 51
NFS4ERR_BAD_STATEID is returned. NFS4ERR_BAD_STATEID is returned.
o If the special stateid is one designating the current stateid, and o If the special stateid is one designating the current stateid, and
there is a current stateid, then the current stateid is there is a current stateid, then the current stateid is
substituted for the special stateid and the checks appropriate to substituted for the special stateid and the checks appropriate to
non-special stateids are performed. non-special stateids are performed.
o If the combination is valid in general but is not appropriate to o If the combination is valid in general but is not appropriate to
the context in which the stateid is used (e.g. an all-zero stateid the context in which the stateid is used (e.g. an all-zero stateid
is used when an open stateid is required in a LOCK operation), the is used when an open stateid is required in a LOCK operation), the
the error NFS4ERR_BAD_STATEID is also returned. error NFS4ERR_BAD_STATEID is also returned.
o Otherwise, the check is completed and the special stateid is o Otherwise, the check is completed and the special stateid is
accepted as valid. accepted as valid.
When a stateid is being tested, and the "other" field is neither all When a stateid is being tested, and the "other" field is neither all
zeros nor all ones, the following procedure could be used to validate zeros nor all ones, the following procedure could be used to validate
an incoming stateid and return an appropriate error, when necessary, an incoming stateid and return an appropriate error, when necessary,
assuming that the "other" field would be divided into a table index assuming that the "other" field would be divided into a table index
and an entry generation. and an entry generation.
skipping to change at page 153, line 11 skipping to change at page 155, line 11
information associated with that particular type of stateid, such information associated with that particular type of stateid, such
as the associated set of locks, such as open-owner and lock-owner as the associated set of locks, such as open-owner and lock-owner
information, as well as information on the specific locks, such as information, as well as information on the specific locks, such as
open modes and byte ranges. open modes and byte ranges.
8.2.5. Stateid Use for I/O Operations 8.2.5. Stateid Use for I/O Operations
Clients performing I/O operations (and SETATTR's modifying the file Clients performing I/O operations (and SETATTR's modifying the file
size), need to select an appropriate stateid based on the locks size), need to select an appropriate stateid based on the locks
(including opens and delegations) held by the client and the various (including opens and delegations) held by the client and the various
types of lock owners issuing the I/O requests. types of state-owners issuing the I/O requests.
The following rules, applied in order of decreasing priority, govern The following rules, applied in order of decreasing priority, govern
the selection of the appropriate stateid. Note that the rules are the selection of the appropriate stateid. Note that the rules are
slightly different in the case of I/O to data servers when file slightly different in the case of I/O to data servers when file
layouts are being used. (See Section 13.9.1). layouts are being used. (See Section 13.9.1).
o If the client holds a delegation for the file in question, the o If the client holds a delegation for the file in question, the
delegation stateid should be used. delegation stateid should be used.
o Otherwise, if the lockowner corresponding entity (e.g. process) o Otherwise, if the lock-owner corresponding entity (e.g. process)
issuing the I/O has a lock stateid for the associated open file, issuing the I/O has a lock stateid for the associated open file,
then the lock stateid for that lockowner and open file should be then the lock stateid for that lock-owner and open file should be
used. used.
o If there is no lock stateid, then the open stateid for the open o If there is no lock stateid, then the open stateid for the open
file in question is used. file in question is used.
o Finally, if none of the above apply, then a special stateid should o Finally, if none of the above apply, then a special stateid should
be used. be used.
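The priority order above can be sketched in a few lines of Python. This is only an illustration: the FileState record, the select_io_stateid function, and the SPECIAL_STATEID placeholder are hypothetical names, not part of the protocol, and the different rules for data servers with file layouts (Section 13.9.1) are not modeled.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical placeholder standing in for a special stateid.
SPECIAL_STATEID = "special"


@dataclass
class FileState:
    """State the client holds for one file, as seen by the entity
    (e.g. process) issuing the I/O. All field names are assumptions."""
    delegation_stateid: Optional[str] = None
    lock_stateid: Optional[str] = None  # for this lock-owner and open file
    open_stateid: Optional[str] = None


def select_io_stateid(state: FileState) -> str:
    """Apply the selection rules in order of decreasing priority."""
    if state.delegation_stateid is not None:
        return state.delegation_stateid
    if state.lock_stateid is not None:
        return state.lock_stateid
    if state.open_stateid is not None:
        return state.open_stateid
    return SPECIAL_STATEID
```

The first rule that matches wins, so a client holding both a delegation and a lock stateid sends the delegation stateid.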
8.3. Lease Renewal 8.3. Lease Renewal
skipping to change at page 154, line 12 skipping to change at page 156, line 12
reply cache) on an unexpired lease will result in the lease being reply cache) on an unexpired lease will result in the lease being
implicitly renewed, for the standard renewal period. implicitly renewed, for the standard renewal period.
If the client ID's lease has not expired when the server receives a If the client ID's lease has not expired when the server receives a
SEQUENCE operation, then the server MUST renew the lease. If the SEQUENCE operation, then the server MUST renew the lease. If the
client ID's lease has expired when the server receives a SEQUENCE client ID's lease has expired when the server receives a SEQUENCE
operation, the server MAY renew the lease; this depends on whether operation, the server MAY renew the lease; this depends on whether
any state was revoked as a result of the client's failure to renew any state was revoked as a result of the client's failure to renew
the lease before expiration. the lease before expiration.
Absent other activity that would renew the lease, a COMPOUND
consisting of a single SEQUENCE operation will suffice. The client
should also take communication-related delays into account and take
steps to ensure that the renewal messages actually reach the server
in good time. For example:
o When trunking is in effect, the client should consider issuing
multiple requests on different connections, in order to ensure
that renewal occurs, even in the event of blockage in the path
used for one of those connections.
o TCP retransmission delays might become so large as to approach or
exceed the length of the lease period. This may be particularly
likely when the server is unresponsive due to a reboot; see
Section 8.4.2.1.
If the server renews the lease upon receiving a SEQUENCE operation, If the server renews the lease upon receiving a SEQUENCE operation,
the server MUST NOT allow the lease to expire while the rest of the the server MUST NOT allow the lease to expire while the rest of the
operations in the COMPOUND procedure's request are still executing. operations in the COMPOUND procedure's request are still executing.
Once the last operation has finished, and the response to COMPOUND Once the last operation has finished, and the response to COMPOUND
has been sent, the server MUST set the lease to expire no sooner than has been sent, the server MUST set the lease to expire no sooner than
the sum of current time and the value of the lease_time attribute. the sum of current time and the value of the lease_time attribute.
A client ID's lease can expire when it has been been at least the A client ID's lease can expire when it has been at least the lease
lease interval (lease_time) since the last lease-renewing SEQUENCE interval (lease_time) since the last lease-renewing SEQUENCE
operation was sent on any of the client ID's sessions and there must operation was sent on any of the client ID's sessions and there must
be no active COMPOUND operations on any such session. be no active COMPOUND operations on any such session.
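As a rough sketch of the server-side bookkeeping these rules imply, the following Python fragment tracks one client ID's lease. The Lease class and the function names are hypothetical, time is a plain number of seconds, and a real server would track this state per client ID across all of its sessions.

```python
from dataclasses import dataclass


@dataclass
class Lease:
    lease_time: float         # value of the lease_time attribute, in seconds
    last_renewal: float       # time of the most recent renewal
    active_compounds: int = 0  # COMPOUNDs still executing for this client ID


def on_sequence(lease: Lease, now: float) -> None:
    """A lease-renewing SEQUENCE arrived: renew and note the running COMPOUND."""
    lease.active_compounds += 1
    lease.last_renewal = now


def on_compound_done(lease: Lease, now: float) -> None:
    """The COMPOUND response was sent: the lease must last at least
    lease_time from this point."""
    lease.active_compounds -= 1
    lease.last_renewal = now


def may_expire(lease: Lease, now: float) -> bool:
    """The lease can expire only when no COMPOUND is active and at least
    lease_time has passed since the last renewal."""
    return (lease.active_compounds == 0
            and now - lease.last_renewal >= lease.lease_time)
```

Counting active COMPOUNDs is one simple way to guarantee that the lease does not expire while later operations in the same request are still executing.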
Because the SEQUENCE operation is the basic mechanism to renew a Because the SEQUENCE operation is the basic mechanism to renew a
lease, and because it must be done at least once for each lease lease, and because it must be done at least once for each lease
period, it is the natural mechanism whereby the server will inform period, it is the natural mechanism whereby the server will inform
the client of changes in the lease status that the client needs to be the client of changes in the lease status that the client needs to be
informed of. The client should inspect the status flags informed of. The client should inspect the status flags
(sr_status_flags) returned by sequence and take the appropriate (sr_status_flags) returned by sequence and take the appropriate
action. (See Section 18.46.3 for details). action. (See Section 18.46.3 for details).
o The status bits SEQ4_STATUS_CB_PATH_DOWN and o The status bits SEQ4_STATUS_CB_PATH_DOWN and
SEQ4_STATUS_CB_PATH_DOWN_SESSION indicate problems with the SEQ4_STATUS_CB_PATH_DOWN_SESSION indicate problems with the
backchannel which the the client may need to address in order to backchannel which the client may need to address in order to
receive callback requests. receive callback requests.
o The status bits SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING and o The status bits SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING and
SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED indicate actual problems with SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED indicate actual problems with
GSS contexts for the backchannel which the client may have to GSS contexts for the backchannel which the client may have to
address to allow callback requests to be sent to it. address to allow callback requests to be sent to it.
o The status bits SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED, o The status bits SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED,
SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED, SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED,
SEQ4_STATUS_ADMIN_STATE_REVOKED, and SEQ4_STATUS_ADMIN_STATE_REVOKED, and
skipping to change at page 155, line 24 skipping to change at page 157, line 41
A critical requirement in crash recovery is that both the client and A critical requirement in crash recovery is that both the client and
the server know when the other has failed. Additionally, it is the server know when the other has failed. Additionally, it is
required that a client sees a consistent view of data across server required that a client sees a consistent view of data across server
restarts or reboots. All READ and WRITE operations that may have restarts or reboots. All READ and WRITE operations that may have
been queued within the client or network buffers must wait until the been queued within the client or network buffers must wait until the
client has successfully recovered the locks protecting the READ and client has successfully recovered the locks protecting the READ and
WRITE operations. Any that reach the server before the server can WRITE operations. Any that reach the server before the server can
safely determine that the client has recovered enough locking state safely determine that the client has recovered enough locking state
to be sure that such operations can be safely processed must be to be sure that such operations can be safely processed must be
rejected, either because the state presented is no longer valid rejected. This will happen because either:
(NFS4ERR_STALE_CLIENTID or NFS4ERR_STALE_STATEID) or because
subsequent recovery of locks may make execution of the operation o The state presented is no longer valid since it is associated with
a now invalid clientid. In this case the client will receive
either an NFS4ERR_BADSESSION or NFS4ERR_DEADSESSION error, and any
attempt to attach a new session to the existing clientid will
encounter an NFS4ERR_STALE_CLIENTID error.
o Subsequent recovery of locks may make execution of the operation
inappropriate (NFS4ERR_GRACE). inappropriate (NFS4ERR_GRACE).
8.4.1. Client Failure and Recovery 8.4.1. Client Failure and Recovery
In the event that a client fails, the server may release the client's In the event that a client fails, the server may release the client's
locks when the associated lease has expired. Conflicting locks from locks when the associated lease has expired. Conflicting locks from
another client may only be granted after this lease expiration. As another client may only be granted after this lease expiration. As
discussed in Section 8.3, when a client has not failed and re- discussed in Section 8.3, when a client has not failed and re-
establishes his lease before expiration occurs, requests for establishes his lease before expiration occurs, requests for
conflicting locks will not be granted. conflicting locks will not be granted.
skipping to change at page 157, line 16 skipping to change at page 159, line 38
example, CREATE_SESSION, DESTROY_SESSION) returns example, CREATE_SESSION, DESTROY_SESSION) returns
NFS4ERR_STALE_CLIENTID. The client MUST establish a new client NFS4ERR_STALE_CLIENTID. The client MUST establish a new client
ID (Section 8.1) and re-establish its lock state ID (Section 8.1) and re-establish its lock state
(Section 8.4.2.1). (Section 8.4.2.1).
8.4.2.1. State Reclaim 8.4.2.1. State Reclaim
When state information and the associated locks are lost as a result When state information and the associated locks are lost as a result
of a server reboot, the protocol must provide a way to cause that of a server reboot, the protocol must provide a way to cause that
state to be re-established. The approach used is to define, for most state to be re-established. The approach used is to define, for most
type of locking state (layouts are an exception), a request whose types of locking state (layouts are an exception), a request whose
function is to allow the client to re-establish on the server a lock function is to allow the client to re-establish on the server a lock
first obtained from a previous instance. Generally these requests first obtained from a previous instance. Generally these requests
are variants of the requests normally used to create locks of that are variants of the requests normally used to create locks of that
type and are referred to as "reclaim-type" requests and the process type and are referred to as "reclaim-type" requests and the process
of re-establishing such locks is referred to as "reclaiming" them. of re-establishing such locks is referred to as "reclaiming" them.
Because each client must have an opportunity to reclaim all of the Because each client must have an opportunity to reclaim all of the
locks that it has without the possibility that some other client will locks that it has without the possibility that some other client will
be granted a conflicting lock, a special period called the "grace be granted a conflicting lock, a special period called the "grace
period" is devoted to the reclaim process. During this period, period" is devoted to the reclaim process. During this period,
requests creating client IDs and sessions are handled normally, but requests creating client IDs and sessions are handled normally, but
locking requests are subject to special restrictions. Only reclaim- locking requests are subject to special restrictions. Only reclaim-
type locking requests are allowed, unless the server is able to type locking requests are allowed, unless the server is able to
reliably determine (through state persistently maintained across reliably determine (through state persistently maintained across
reboot instances), that granting any such lock cannot possibly reboot instances), that granting any such lock cannot possibly
conflict with a subsequent reclaim. When a request is made to obtain conflict with a subsequent reclaim. When a request is made to obtain
a new lock (i.e. not a reclaim-type request) during the grace period a new lock (i.e. not a reclaim-type request) during the grace period
and such a determination cannot be made, the server must return the and such a determination cannot be made, the server must return the
error NFS4ERR_GRACE. error NFS4ERR_GRACE.
Once a session is established using the new client ID, the client Once a session is established using the new client ID, the client
will use reclaim-type locking requests (e.g. LOCK requests with will use reclaim-type locking requests (e.g. LOCK requests with
reclaim set to true and OPEN operations with a claim type of reclaim set to true and OPEN operations with a claim type of
CLAIM_PREVIOUS. See Section 9.9) to re-establish its locking state. CLAIM_PREVIOUS. See Section 9.11) to re-establish its locking state.
Once this is done, or if there is no such locking state to reclaim, Once this is done, or if there is no such locking state to reclaim,
the client sends a global RECLAIM_COMPLETE operation, i.e. one with the client sends a global RECLAIM_COMPLETE operation, i.e. one with
the one_fs argument set to false, to indicate that it has reclaimed the one_fs argument set to false, to indicate that it has reclaimed
all of the locking state that it will reclaim. Once a client sends all of the locking state that it will reclaim. Once a client sends
such a RECLAIM_COMPLETE operation, it may attempt non-reclaim locking such a RECLAIM_COMPLETE operation, it may attempt non-reclaim locking
operations, although it may get NFS4ERR_GRACE errors for the operations operations, although it may get NFS4ERR_GRACE errors for the operations
until the period of special handling is over. See Section 11.7.7 for until the period of special handling is over. See Section 11.7.7 for
a discussion of the analogous handling of lock reclamation in the case a discussion of the analogous handling of lock reclamation in the case
of file systems transitioning from server to server. of file systems transitioning from server to server.
skipping to change at page 158, line 17 skipping to change at page 160, line 40
The grace period may last until all clients who are known to possibly The grace period may last until all clients who are known to possibly
have had locks have done a global RECLAIM_COMPLETE operation, have had locks have done a global RECLAIM_COMPLETE operation,
indicating that they have finished reclaiming the locks they held indicating that they have finished reclaiming the locks they held
before the server reboot. This means that a client which has done a before the server reboot. This means that a client which has done a
RECLAIM_COMPLETE must be prepared to receive an NFS4ERR_GRACE when RECLAIM_COMPLETE must be prepared to receive an NFS4ERR_GRACE when
attempting to acquire new locks. The server is assumed to maintain attempting to acquire new locks. The server is assumed to maintain
in stable storage a list of clients who may have such locks. The in stable storage a list of clients who may have such locks. The
server may also terminate the grace period before all clients have server may also terminate the grace period before all clients have
done a global RECLAIM_COMPLETE. The server SHOULD NOT terminate the done a global RECLAIM_COMPLETE. The server SHOULD NOT terminate the
grace period before a time equal to the lease period in order to give grace period before a time equal to the lease period in order to give
clients an opportunity to find out about the server reboot. Some clients an opportunity to find out about the server reboot, as a
additional time in order to allow time to establish a new client ID result of issuing requests on associated sessions with a frequency
and session and to effect lock reclaims may be added. Note that governed by the lease time. Note that when a client does not issue
analogous rules apply to file system-specific grace periods discussed such requests (or they are issued by the client but not received by
in Section 11.7.7. the server), it is possible for the grace period to expire before the
client finds out that the server reboot has occurred.
Some additional time in order to allow time to establish a new client
ID and session and to effect lock reclaims may be added to the lease
time. Note that analogous rules apply to file system-specific grace
periods discussed in Section 11.7.7.
If the server can reliably determine that granting a non-reclaim If the server can reliably determine that granting a non-reclaim
request will not conflict with reclamation of locks by other clients, request will not conflict with reclamation of locks by other clients,
the NFS4ERR_GRACE error does not have to be returned even within the the NFS4ERR_GRACE error does not have to be returned even within the
grace period, although NFS4ERR_GRACE must always be returned to grace period, although NFS4ERR_GRACE must always be returned to
clients attempting a non-reclaim lock request before doing their own clients attempting a non-reclaim lock request before doing their own
global RECLAIM_COMPLETE. For the server to be able to service READ global RECLAIM_COMPLETE. For the server to be able to service READ
and WRITE operations during the grace period, it must again be able and WRITE operations during the grace period, it must again be able
to guarantee that no possible conflict could arise between a to guarantee that no possible conflict could arise between a
potential reclaim locking request and the READ or WRITE operation. potential reclaim locking request and the READ or WRITE operation.
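The decision a server makes for a locking request received during the grace period might be sketched as follows. This is an illustration under stated assumptions: the function name and its boolean parameters are hypothetical, and the "PROCEED" result simply means the request goes on to normal processing. How a server reliably determines that no conflict is possible is outside the scope of the sketch.

```python
NFS4ERR_GRACE = "NFS4ERR_GRACE"
PROCEED = "PROCEED"  # hypothetical marker: continue with normal processing


def check_lock_request_in_grace(is_reclaim: bool,
                                client_sent_reclaim_complete: bool,
                                provably_no_conflict: bool) -> str:
    """Classify a locking request that arrives during the grace period."""
    if is_reclaim:
        # Reclaim-type requests are what the grace period exists for.
        return PROCEED
    if not client_sent_reclaim_complete:
        # A client must finish its own global RECLAIM_COMPLETE before
        # any non-reclaim lock request can be accepted.
        return NFS4ERR_GRACE
    if provably_no_conflict:
        # Only allowed when the server can reliably rule out a conflict
        # with any subsequent reclaim.
        return PROCEED
    return NFS4ERR_GRACE
```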
skipping to change at page 159, line 24 skipping to change at page 162, line 5
requests to be processed during the grace period, it MUST determine requests to be processed during the grace period, it MUST determine
that no lock subsequently reclaimed will be rejected and that no lock that no lock subsequently reclaimed will be rejected and that no lock
subsequently reclaimed would have prevented any I/O operation subsequently reclaimed would have prevented any I/O operation
processed during the grace period. processed during the grace period.
Clients should be prepared for the return of NFS4ERR_GRACE errors for Clients should be prepared for the return of NFS4ERR_GRACE errors for
non-reclaim lock and I/O requests. In this case the client should non-reclaim lock and I/O requests. In this case the client should
employ a retry mechanism for the request. A delay (on the order of employ a retry mechanism for the request. A delay (on the order of
several seconds) between retries should be used to avoid overwhelming several seconds) between retries should be used to avoid overwhelming
the server. Further discussion of the general issue is included in the server. Further discussion of the general issue is included in
[Floyd]. The client must account for the server that is able to [37]. The client must account for the server that is able to perform
perform I/O and non-reclaim locking requests within the grace period I/O and non-reclaim locking requests within the grace period as well
as well as those that can not do so. as those that can not do so.
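A client-side retry loop of the kind described above could look like the sketch below. The send_request callable, the default delay of five seconds, and the attempt limit are all assumptions chosen for illustration; the text only asks for a delay on the order of several seconds between retries.

```python
import time

NFS4ERR_GRACE = "NFS4ERR_GRACE"


def retry_during_grace(send_request, delay_seconds=5.0, max_attempts=20,
                       sleep=time.sleep):
    """Resend a non-reclaim lock or I/O request while the server answers
    NFS4ERR_GRACE, pausing between attempts to avoid overwhelming it.

    send_request is a hypothetical callable that issues the request and
    returns its status. The last status seen is returned.
    """
    status = None
    for attempt in range(max_attempts):
        status = send_request()
        if status != NFS4ERR_GRACE:
            return status
        if attempt + 1 < max_attempts:
            sleep(delay_seconds)
    return status
```

Because a server may be able to process the request inside the grace period, the loop returns immediately on any status other than NFS4ERR_GRACE and never sleeps before the first attempt.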
A reclaim-type locking request outside the server's grace period can A reclaim-type locking request outside the server's grace period can
only succeed if the server can guarantee that no conflicting lock or only succeed if the server can guarantee that no conflicting lock or
I/O request has been granted since reboot or restart. I/O request has been granted since reboot or restart.
A server may, upon restart, establish a new value for the lease A server may, upon restart, establish a new value for the lease
period. Therefore, clients should, once a new client ID is period. Therefore, clients should, once a new client ID is
established, refetch the lease_time attribute and use it as the basis established, refetch the lease_time attribute and use it as the basis
for lease renewal for the lease associated with that server. for lease renewal for the lease associated with that server.
However, the server must establish, for this restart event, a grace However, the server must establish, for this restart event, a grace
skipping to change at page 164, line 31 skipping to change at page 167, line 11
At any point, the server can revoke locks held by a client and the At any point, the server can revoke locks held by a client and the
client must be prepared for this event. When the client detects that client must be prepared for this event. When the client detects that
its locks have been or may have been revoked, the client is its locks have been or may have been revoked, the client is
responsible for validating the state information between itself and responsible for validating the state information between itself and
the server. Validating locking state for the client means that it the server. Validating locking state for the client means that it
must verify or reclaim state for each lock currently held. must verify or reclaim state for each lock currently held.
The first occasion of lock revocation is upon server reboot or The first occasion of lock revocation is upon server reboot or
restart. Note that this includes situations in which sessions are restart. Note that this includes situations in which sessions are
persistent and locking state is lost. In this class of instances, persistent and locking state is lost. In this class of instances,
the client will receive an error (NFS4ERR_STALE_STATEID on an the client will receive an error (NFS4ERR_STALE_CLIENTID on an
operation that takes a stateid as an argument or operation that takes client ID, usually as part of recovery in
NFS4ERR_STALE_CLIENTID on an operation that takes a sessionid or response to a problem with the current session) and the client will
client ID) and the client will proceed with normal crash recovery as proceed with normal crash recovery as described in the
described in the Section 8.4.2.1. Section 8.4.2.1.
The second occasion of lock revocation is the inability to renew the The second occasion of lock revocation is the inability to renew the
lease before expiration, as discussed in Section 8.4.3. While this lease before expiration, as discussed in Section 8.4.3. While this
is considered a rare or unusual event, the client must be prepared to is considered a rare or unusual event, the client must be prepared to
recover. The server is responsible for determining the precise recover. The server is responsible for determining the precise
consequences of the lease expiration, informing the client of the consequences of the lease expiration, informing the client of the
scope of the lock revocation decided upon. The client then uses the scope of the lock revocation decided upon. The client then uses the
status information provided by the server in the SEQUENCE results status information provided by the server in the SEQUENCE results
(field sr_status_flags, see Section 18.46.3) to synchronize its (field sr_status_flags, see Section 18.46.3) to synchronize its
locking state with that of the server, in order to recover. locking state with that of the server, in order to recover.
skipping to change at page 167, line 37 skipping to change at page 170, line 17
and establishes locking state on the server. and establishes locking state on the server.
9.1. Opens and Byte-range Locks 9.1. Opens and Byte-range Locks
It is assumed that manipulating a byte-range lock is rare when It is assumed that manipulating a byte-range lock is rare when
compared to READ and WRITE operations. It is also assumed that compared to READ and WRITE operations. It is also assumed that
crashes and network partitions are relatively rare. Therefore it is crashes and network partitions are relatively rare. Therefore it is
important that the READ and WRITE operations have a lightweight important that the READ and WRITE operations have a lightweight
mechanism to indicate if they possess a held lock. A byte-range lock mechanism to indicate if they possess a held lock. A byte-range lock
request contains the heavyweight information required to establish a request contains the heavyweight information required to establish a
lock and uniquely define the lock owner. lock and uniquely define the owner of the lock.
9.1.1. State-owner Definition 9.1.1. State-owner Definition
When opening a file or requesting a record lock, the client must When opening a file or requesting a record lock, the client must
specify an identifier which represents the owner of the requested specify an identifier which represents the owner of the requested
lock. This identifier is in the form of a state-owner, represented lock. This identifier is in the form of a state-owner, represented
in the protocol by a state_owner4, a variable-length opaque array in the protocol by a state_owner4, a variable-length opaque array
which, when concatenated with the current client ID uniquely defines which, when concatenated with the current client ID uniquely defines
the owner of a lock managed by the client. This may be a thread id, the owner of a lock managed by the client. This may be a thread id,
process id, or other unique value. process id, or other unique value.
skipping to change at page 171, line 12 skipping to change at page 173, line 41
derived from such an open is used, the server knows that the READ, derived from such an open is used, the server knows that the READ,
WRITE, or SETATTR does not conflict with the delegation, but is sent WRITE, or SETATTR does not conflict with the delegation, but is sent
under the aegis of the delegation. Even though it is possible for under the aegis of the delegation. Even though it is possible for
the server to determine from the clientid (gotten from the sessionid) the server to determine from the clientid (gotten from the sessionid)
that the client does in fact have a delegation, the server is not that the client does in fact have a delegation, the server is not
obliged to check this, so using a special stateid can result in obliged to check this, so using a special stateid can result in
avoidable recall of the delegation. avoidable recall of the delegation.
9.2. Lock Ranges 9.2. Lock Ranges
The protocol allows a lock owner to request a lock with a byte range The protocol allows a lock-owner to request a lock with a byte range
and then either upgrade, downgrade, or unlock a sub-range of the and then either upgrade, downgrade, or unlock a sub-range of the
initial lock. It is expected that this will be an uncommon type of initial lock, or a range that consists of a range which overlaps,
request. In any case, servers or server file systems may not be able fully or partially, that initial lock or a combination of a set of
to support sub-range lock semantics. In the event that a server existing locks for the same lock-owner. It is expected that this
receives a locking request that represents a sub-range of current will be an uncommon type of request. In any case, servers or server
locking state for the lock owner, the server is allowed to return the file systems may not be able to support sub-range lock semantics. In
error NFS4ERR_LOCK_RANGE to signify that it does not support sub- the event that a server receives a locking request that represents a
range lock operations. Therefore, the client should be prepared to sub-range of current locking state for the lock-owner, the server is
receive this error and, if appropriate, report the error to the allowed to return the error NFS4ERR_LOCK_RANGE to signify that it
requesting application. does not support sub-range lock operations. Therefore, the client
should be prepared to receive this error and, if appropriate, report
the error to the requesting application.
The client is discouraged from combining multiple independent locking The client is discouraged from combining multiple independent locking
ranges that happen to be adjacent into a single request since the ranges that happen to be adjacent into a single request since the
server may not support sub-range requests and for reasons related to server may not support sub-range requests and for reasons related to
the recovery of file locking state in the event of server failure. the recovery of file locking state in the event of server failure.
As discussed in Section 8.4.2 below, the server may employ certain As discussed in Section 8.4.2 below, the server may employ certain
optimizations during recovery that work effectively only when the optimizations during recovery that work effectively only when the
client's behavior during lock recovery is similar to the client's client's behavior during lock recovery is similar to the client's
locking behavior prior to server failure. locking behavior prior to server failure.
skipping to change at page 172, line 6 skipping to change at page 174, line 37
the type to WRITE_LT or WRITEW_LT. If the server does not support the type to WRITE_LT or WRITEW_LT. If the server does not support
atomic upgrade, it will return NFS4ERR_LOCK_NOTSUPP. If the upgrade atomic upgrade, it will return NFS4ERR_LOCK_NOTSUPP. If the upgrade
can be achieved without an existing conflict, the request will can be achieved without an existing conflict, the request will
succeed. Otherwise, the server will return either NFS4ERR_DENIED or succeed. Otherwise, the server will return either NFS4ERR_DENIED or
NFS4ERR_DEADLOCK. The error NFS4ERR_DEADLOCK is returned if the NFS4ERR_DEADLOCK. The error NFS4ERR_DEADLOCK is returned if the
client sent the LOCK request with the type set to WRITEW_LT and the client sent the LOCK request with the type set to WRITEW_LT and the
server has detected a deadlock. The client should be prepared to server has detected a deadlock. The client should be prepared to
receive such errors and if appropriate, report the error to the receive such errors and if appropriate, report the error to the
requesting application. requesting application.
9.4. Blocking Locks 9.4. Stateid Seqid Values and Byte-range Locks
When a lock or unlock request is done, passing a stateid, the stateid
returned has the same "other" value and a "seqid" value that is
incremented to reflect the occurrence of the lock or unlock request.
The server MUST increment the value of the "seqid" field whenever
there is any change to the locking status of any byte offset as
described by any of the locks covered by the stateid. A change in
locking status includes a change from locked to unlocked or the
reverse or a change from being locked for read to being locked for
write or the reverse.
When there is no such change, as, for example when a range already
locked for write is locked again for write, the server MAY increment
the "seqid" value.
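The seqid rule can be illustrated with a deliberately naive model that records the lock mode of each byte offset individually. Everything here is an assumption made for the sketch: the LockStateid class, the per-byte dictionary, and the starting seqid of 1. A real server would use range structures, and it MAY also increment the seqid when nothing changes, which this sketch chooses not to do.

```python
from typing import Dict, Optional

READ, WRITE = "read", "write"


class LockStateid:
    """Per-byte model of the locks covered by one lock stateid."""

    def __init__(self) -> None:
        self.seqid = 1                    # assumed initial value
        self.modes: Dict[int, str] = {}   # byte offset -> mode; absent = unlocked

    def apply(self, offset: int, length: int, mode: Optional[str]) -> int:
        """Lock [offset, offset + length) in the given mode, or unlock it
        when mode is None. Returns the seqid after the request."""
        changed = False
        for byte in range(offset, offset + length):
            if self.modes.get(byte) != mode:
                changed = True
                if mode is None:
                    self.modes.pop(byte, None)
                else:
                    self.modes[byte] = mode
        if changed:
            # Required: the locking status of at least one byte changed.
            self.seqid += 1
        return self.seqid
```

For example, locking bytes 0 to 9 for write bumps the seqid, locking the same range for write again leaves it alone, and downgrading bytes 0 to 4 to a read lock bumps it once more.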
9.5. Issues with Multiple Open-owners
When the same file is opened by multiple open-owners and there are
LOCK and LOCKU requests for the same lock-owner issued through the
different open files, a situation may arise in which there are
multiple stateids representing byte-range locks for locks on the
same file held by the same lock-owner but each assigned to a
different open-owner.
In such a situation, the locking status of each byte (i.e. whether it
is locked, the read or write mode of the lock and the lock-owner
holding the lock) MUST reflect the last LOCK or LOCKU operation done
for the lock-owner in question, independent of the stateid through
which the request was issued.
When a byte is locked by the lock-owner in question, the open-owner
to which that lock is assigned SHOULD be the open-owner
associated with the stateid through which the last LOCK of that byte
was done. When there is a change in the open-owner associated with
locks for the stateid through which a LOCK or LOCKU was done, the
"seqid" field of the stateid MUST be incremented, even if the
locking, in terms of lock-owners, has not changed. When there is a
change to the set of locked bytes associated with a different stateid
for the same lock-owner, i.e. associated with a different open-owner,
the "seqid" value for that stateid MUST NOT be incremented.
9.6. Blocking Locks
Some clients require the support of blocking locks. While NFSv4.1 Some clients require the support of blocking locks. While NFSv4.1
provides a callback when a previously unavailable lock becomes provides a callback when a previously unavailable lock becomes
available, this is an OPTIONAL feature and clients cannot depend on available, this is an OPTIONAL feature and clients cannot depend on
its presence. Clients need to be prepared to continually poll for its presence. Clients need to be prepared to continually poll for
the lock. This presents a fairness problem. Two new lock types are the lock. This presents a fairness problem. Two new lock types are
added, READW and WRITEW, and are used to indicate to the server that added, READW and WRITEW, and are used to indicate to the server that
the client is requesting a blocking lock. When the callback is not the client is requesting a blocking lock. When the callback is not
used, the server should maintain an ordered list of pending blocking used, the server should maintain an ordered list of pending blocking
locks. When the conflicting lock is released, the server may wait locks. When the conflicting lock is released, the server may wait
skipping to change at page 173, line 11 skipping to change at page 176, line 35
the client should take notice of this, but, since this is a hint, the client should take notice of this, but, since this is a hint,
cannot rely on a CB_NOTIFY_LOCK always being done. A client may cannot rely on a CB_NOTIFY_LOCK always being done. A client may
reasonably reduce the frequency with which it polls for a denied reasonably reduce the frequency with which it polls for a denied
lock, since the greater latency that might occur is likely to be lock, since the greater latency that might occur is likely to be
eliminated given a prompt callback, but it still needs to poll. When eliminated given a prompt callback, but it still needs to poll. When
it receives a CB_NOTIFY_LOCK it should promptly try to obtain the it receives a CB_NOTIFY_LOCK it should promptly try to obtain the
lock, but it should be aware that other clients may be polling and the lock, but it should be aware that other clients may be polling and the
server is under no obligation to reserve the lock for that particular server is under no obligation to reserve the lock for that particular
client. client.
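The client behavior described above (keep polling, but retry promptly when a CB_NOTIFY_LOCK arrives) can be sketched as follows. This is a minimal illustration, not a client implementation: try_lock stands in for re-sending the LOCK request, the threading.Event stands in for receipt of CB_NOTIFY_LOCK, and the function name, interval, and attempt limit are all assumptions.

```python
import threading


def poll_for_lock(try_lock, notified, poll_interval=1.0, max_attempts=100):
    """Retry a denied lock until it is granted or attempts run out.

    try_lock: hypothetical callable that re-sends the LOCK request and
              returns True when the lock is granted.
    notified: threading.Event set by a (hypothetical) CB_NOTIFY_LOCK
              handler; it only shortens the wait, since the callback is
              a hint and might never arrive.
    Returns the number of attempts used, or None if the lock was not
    obtained within max_attempts.
    """
    for attempt in range(1, max_attempts + 1):
        if try_lock():
            return attempt
        # Wake early on a callback, otherwise fall back to the timer.
        notified.wait(timeout=poll_interval)
        notified.clear()
    return None
```

A granted attempt is not guaranteed even right after a callback, because the server is under no obligation to reserve the lock for the notified client.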
9.5. Share Reservations 9.7. Share Reservations
A share reservation is a mechanism to control access to a file. It A share reservation is a mechanism to control access to a file. It
is a separate and independent mechanism from record locking. When a is a separate and independent mechanism from record locking. When a
client opens a file, it sends an OPEN operation to the server client opens a file, it sends an OPEN operation to the server
specifying the type of access required (READ, WRITE, or BOTH) and the specifying the type of access required (READ, WRITE, or BOTH) and the
type of access to deny others (deny NONE, READ, WRITE, or BOTH). If type of access to deny others (deny NONE, READ, WRITE, or BOTH). If
the OPEN fails the client will fail the application's open request. the OPEN fails the client will fail the application's open request.
Pseudo-code definition of the semantics: Pseudo-code definition of the semantics:
skipping to change at page 174, line 5 skipping to change at page 177, line 31
const OPEN4_SHARE_ACCESS_READ = 0x00000001; const OPEN4_SHARE_ACCESS_READ = 0x00000001;
const OPEN4_SHARE_ACCESS_WRITE = 0x00000002; const OPEN4_SHARE_ACCESS_WRITE = 0x00000002;
const OPEN4_SHARE_ACCESS_BOTH = 0x00000003; const OPEN4_SHARE_ACCESS_BOTH = 0x00000003;
const OPEN4_SHARE_DENY_NONE = 0x00000000; const OPEN4_SHARE_DENY_NONE = 0x00000000;
const OPEN4_SHARE_DENY_READ = 0x00000001; const OPEN4_SHARE_DENY_READ = 0x00000001;
const OPEN4_SHARE_DENY_WRITE = 0x00000002; const OPEN4_SHARE_DENY_WRITE = 0x00000002;
const OPEN4_SHARE_DENY_BOTH = 0x00000003; const OPEN4_SHARE_DENY_BOTH = 0x00000003;
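Because the access and deny constants are bit masks, a share reservation conflict check reduces to two bitwise tests. The sketch below is an illustration only: the function name share_conflict and its parameters are hypothetical, and current_access and current_deny are assumed to hold the union of the bits for the opens already in effect on the file.

```python
OPEN4_SHARE_ACCESS_READ = 0x00000001
OPEN4_SHARE_ACCESS_WRITE = 0x00000002
OPEN4_SHARE_ACCESS_BOTH = 0x00000003

OPEN4_SHARE_DENY_NONE = 0x00000000
OPEN4_SHARE_DENY_READ = 0x00000001
OPEN4_SHARE_DENY_WRITE = 0x00000002
OPEN4_SHARE_DENY_BOTH = 0x00000003


def share_conflict(request_access, request_deny, current_access, current_deny):
    """Return True when a new OPEN conflicts with existing share state.

    A conflict exists if the requested access is already denied by
    another open, or if the requested deny mode would exclude access
    that another open already holds.
    """
    return bool(request_access & current_deny) or bool(request_deny & current_access)
```

For example, a request for READ access conflicts with an existing open that denies READ, while a request for READ access with deny NONE does not conflict with an existing open for WRITE that denies nothing.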
9.6. OPEN/CLOSE Operations 9.8. OPEN/CLOSE Operations
To provide correct share semantics, a client MUST use the OPEN To provide correct share semantics, a client MUST use the OPEN
operation to obtain the initial filehandle and indicate the desired operation to obtain the initial filehandle and indicate the desired
access and what access, if any, to deny. Even if the client intends access and what access, if any, to deny. Even if the client intends
to use a special stateid for anonymous state or read bypass, it must to use a special stateid for anonymous state or read bypass, it must
still obtain the filehandle for the regular file with the OPEN still obtain the filehandle for the regular file with the OPEN
operation so the appropriate share semantics can be applied. For operation so the appropriate share semantics can be applied. For
clients that do not have a deny mode built into their open clients that do not have a deny mode built into their open
programming interfaces, deny equal to NONE should be used. programming interfaces, deny equal to NONE should be used.
skipping to change at page 174, line 36 skipping to change at page 178, line 13
CLOSE. CLOSE.
The LOOKUP operation will return a filehandle without establishing The LOOKUP operation will return a filehandle without establishing
any lock state on the server. Without a valid stateid, the server any lock state on the server. Without a valid stateid, the server
will assume the client has the least access. For example, a file will assume the client has the least access. For example, a file
opened with deny READ/WRITE using a filehandle obtained through opened with deny READ/WRITE using a filehandle obtained through
LOOKUP could only be read using the special read bypass stateid and LOOKUP could only be read using the special read bypass stateid and
could not be written at all because it would not have a valid stateid could not be written at all because it would not have a valid stateid
and the special anonymous stateid would not be allowed access. and the special anonymous stateid would not be allowed access.
9.7. Open Upgrade and Downgrade 9.9. Open Upgrade and Downgrade
When an OPEN is done for a file and the open-owner for which the open When an OPEN is done for a file and the open-owner for which the open
is being done already has the file open, the result is to upgrade the is being done already has the file open, the result is to upgrade the
open file status maintained on the server to include the access and open file status maintained on the server to include the access and
deny bits specified by the new OPEN as well as those for the existing deny bits specified by the new OPEN as well as those for the existing
OPEN. The result is that there is one open file, as far as the OPEN. The result is that there is one open file, as far as the
protocol is concerned, and it includes the union of the access and protocol is concerned, and it includes the union of the access and
deny bits for all of the OPEN requests completed. The open is deny bits for all of the OPEN requests completed. The open is
represented by a single stateid whose "other" value matches that of represented by a single stateid whose "other" value matches that of
the original open. Only a single CLOSE will be done to reset the the original open, and whose "seqid" value is incremented to reflect
effects of both OPENs. The client may use the stateid returned by the occurrence of the upgrade. The increment is required in cases in
the OPEN effecting the upgrade or with a stateid sharing the same which the "upgrade" results in no change to the open mode (e.g. an
"other" field and a seqid of zero, although care needs to be taken as OPEN is done for read when the existing open file is opened for read-
far as upgrades which happen while the CLOSE is pending. Note that write). Only a single CLOSE will be done to reset the effects of
the client, when issuing the OPEN, may not know that the same file is both OPENs. The client may use the stateid returned by the OPEN
in fact being opened. The above only applies if both OPENs result in effecting the upgrade or with a stateid sharing the same "other"
field and a seqid of zero, although care needs to be taken as far as
upgrades which happen while the CLOSE is pending. Note that the
client, when issuing the OPEN, may not know that the same file is in
fact being opened. The above only applies if both OPENs result in
the OPENed object being designated by the same filehandle. the OPENed object being designated by the same filehandle.
When the server chooses to export multiple filehandles corresponding When the server chooses to export multiple filehandles corresponding
to the same file object and returns different filehandles on two to the same file object and returns different filehandles on two
different OPENs of the same file object, the server MUST NOT "OR" different OPENs of the same file object, the server MUST NOT "OR"
together the access and deny bits and coalesce the two open files. together the access and deny bits and coalesce the two open files.
Instead the server must maintain separate OPENs with separate Instead the server must maintain separate OPENs with separate
stateids and will require separate CLOSEs to free them. stateids and will require separate CLOSEs to free them.
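The upgrade behavior can be sketched as follows. This is a simplified model, not server code: the OpenTable class, its method names, the use of an integer for the "other" field, and the initial seqid of 1 are assumptions made for illustration, and share reservation conflict checks are omitted.

```python
class OpenTable:
    """Hypothetical per-server table of open state."""

    def __init__(self):
        self._opens = {}       # (open_owner, filehandle) -> state dict
        self._next_other = 1   # stand-in for generating an "other" value

    def open(self, open_owner, filehandle, access, deny):
        """Record an OPEN and return the stateid as (other, seqid)."""
        key = (open_owner, filehandle)
        state = self._opens.get(key)
        if state is None:
            # First OPEN for this open-owner and filehandle.
            state = {"other": self._next_other, "seqid": 1,
                     "access": access, "deny": deny}
            self._next_other += 1
            self._opens[key] = state
        else:
            # Upgrade: union the bits and bump the seqid, even when the
            # resulting open mode is unchanged.
            state["access"] |= access
            state["deny"] |= deny
            state["seqid"] += 1
        return (state["other"], state["seqid"])

    def mode(self, open_owner, filehandle):
        """Return the (access, deny) bits currently recorded."""
        state = self._opens[(open_owner, filehandle)]
        return (state["access"], state["deny"])
```

Keying the table on the filehandle rather than on the underlying file object means that two OPENs returning different filehandles for the same file are kept as separate opens with separate stateids, as required above.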
When multiple open files on the client are merged into a single open When multiple open files on the client are merged into a single open
file object on the server, the close of one of the open files (on the file object on the server, the close of one of the open files (on the
client) may necessitate change of the access and deny status of the client) may necessitate change of the access and deny status of the
open file on the server. This is because the union of the access and open file on the server. This is because the union of the access and
deny bits for the remaining opens may be smaller (i.e. a proper deny bits for the remaining opens may be smaller (i.e. a proper
subset) than previously. The OPEN_DOWNGRADE operation is used to subset) than previously. The OPEN_DOWNGRADE operation is used to
make the necessary change and the client should use it to update the make the necessary change and the client should use it to update the
server so that share reservation requests by other clients are server so that share reservation requests by other clients are
handled properly. handled properly. The stateid returned has the same "other" field as
that passed to the server. The "seqid" value in the returned stateid
MUST be incremented, even in a situation in which there is no change to
the access and deny bits for the file.
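On the client side, deciding whether a local close calls for an OPEN_DOWNGRADE amounts to recomputing the union of the remaining opens. The following is a minimal sketch; the function names and the representation of each local open as an (access, deny) pair are assumptions made for illustration.

```python
def union_mode(opens):
    """Return the union of (access, deny) bits over a list of opens."""
    access = deny = 0
    for open_access, open_deny in opens:
        access |= open_access
        deny |= open_deny
    return access, deny


def needs_downgrade(opens, closing_index):
    """Decide whether closing one local open requires OPEN_DOWNGRADE.

    opens: list of (access, deny) pairs for the client's local opens
           that were merged into a single open on the server.
    Returns (needed, mode), where mode is the union for the remaining
    opens. When no opens remain, a CLOSE is called for instead, so
    needed is False.
    """
    before = union_mode(opens)
    remaining = opens[:closing_index] + opens[closing_index + 1:]
    after = union_mode(remaining)
    needed = bool(remaining) and after != before
    return needed, after
```

When needed is True, the client would send OPEN_DOWNGRADE with the returned bits and should expect a stateid with an incremented "seqid" in the reply.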
9.8. Parallel OPENs 9.10. Parallel OPENs
Unlike the case of NFSv4.0, in which OPEN operations for the same Unlike the case of NFSv4.0, in which OPEN operations for the same
openowner are inherently serialized because of the owner-based seqid, open-owner are inherently serialized because of the owner-based
multiple OPENs for the same openowner may be done in parallel. When seqid, multiple OPENs for the same open-owner may be done in
clients do this, they may encounter situations in which, because of parallel. When clients do this, they may encounter situations in
the existence of hard links, two OPEN operations may turn out to open which, because of the existence of hard links, two OPEN operations
the same file, with a later OPEN performed being an upgrade of the may turn out to open the same file, with a later OPEN performed being
first, with this fact only visible to the client once the operations an upgrade of the first, with this fact only visible to the client
complete. once the operations complete.
In this situation, clients may determine the order in which the OPENs In this situation, clients may determine the order in which the OPENs
were performed by examining the stateids returned by the OPENs. were performed by examining the stateids returned by the OPENs.
Stateids that share a common value of the the "other" field can be Stateids that share a common value of the "other" field can be
recognized as having opened the same file, with the order of the recognized as having opened the same file, with the order of the
operations determinable from the order of the "seqid" fields, mod any operations determinable from the order of the "seqid" fields, mod any
possible wraparound of the 32-bit field. possible wraparound of the 32-bit field.
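Ordering seqids in the presence of wraparound can be done with a modular comparison, sketched below. This is an illustration under stated assumptions: the function names are hypothetical, stateids are modelled as (other, seqid) pairs that all share the same "other" value, plain modulo 2^32 arithmetic is used (any special treatment of a zero seqid is ignored), and the seqids being compared are assumed to lie within 2^31 of one another.

```python
from functools import cmp_to_key

SEQID_MODULUS_MASK = 0xFFFFFFFF
HALF_RANGE = 0x80000000


def compare_seqids(a, b):
    """Return 1 if seqid a is later than b, -1 if earlier, 0 if equal."""
    diff = (a - b) & SEQID_MODULUS_MASK
    if diff == 0:
        return 0
    return 1 if diff < HALF_RANGE else -1


def order_by_seqid(stateids):
    """Sort (other, seqid) pairs, all with the same "other", oldest first."""
    return sorted(
        stateids,
        key=cmp_to_key(lambda x, y: compare_seqids(x[1], y[1])),
    )
```

With this comparison, a seqid of 1 is treated as later than 0xFFFFFFFF, which a plain integer comparison would get backwards.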
When the possibility exists that the client will send multiple OPENs When the possibility exists that the client will send multiple OPENs
for the same openowner in parallel, it may be the case that an open for the same open-owner in parallel, it may be the case that an open
upgrade may happen without the client knowing beforehand that this upgrade may happen without the client knowing beforehand that this
could happen. Because of this possibility, CLOSEs and could happen. Because of this possibility, CLOSEs and
OPEN_DOWNGRADEs, should generally be sent with a non-zero seqid in OPEN_DOWNGRADEs, should generally be sent with a non-zero seqid in
the stateid, to avoid the possibility that the status change the stateid, to avoid the possibility that the status change
associated with an open upgrade is inadvertently lost. associated with an open upgrade is inadvertently lost.
9.9. Reclaim of Open and Byte-range Locks 9.11. Reclaim of Open and Byte-range Locks
Special forms of the LOCK and OPEN operations are provided when it is Special forms of the LOCK and OPEN operations are provided when it is
necessary to re-establish byte-range locks or opens after a server necessary to re-establish byte-range locks or opens after a server
failure. failure.
o To reclaim existing opens, an OPEN operation is performed using a o To reclaim existing opens, an OPEN operation is performed using a
CLAIM_PREVIOUS. Because the client, in this type of situation, CLAIM_PREVIOUS. Because the client, in this type of situation,
will have already opened the file and have the filehandle of the will have already opened the file and have the filehandle of the
target file, this operation requires that the current filehandle target file, this operation requires that the current filehandle
be the target file, rather than a directory and no file name is be the target file, rather than a directory and no file name is
skipping to change at page 178, line 21 skipping to change at page 181, line 42
responsibilities when another client engages in sharing of a responsibilities when another client engages in sharing of a
delegated file. delegated file.
A delegation is passed from the server to the client, specifying the A delegation is passed from the server to the client, specifying the
object of the delegation and the type of delegation. There are object of the delegation and the type of delegation. There are
different types of delegations but each type contains a stateid to be different types of delegations but each type contains a stateid to be
used to represent the delegation when performing operations that used to represent the delegation when performing operations that
depend on the delegation. This stateid is similar to those depend on the delegation. This stateid is similar to those
associated with locks and share reservations but differs in that the associated with locks and share reservations but differs in that the
stateid for a delegation is associated with a client ID and may be stateid for a delegation is associated with a client ID and may be
used on behalf of all the open_owners for the given client. A used on behalf of all the open-owners for the given client. A
delegation is made to the client as a whole and not to any specific delegation is made to the client as a whole and not to any specific
process or thread of control within it. process or thread of control within it.
The backchannel is established by CREATE_SESSION and The backchannel is established by CREATE_SESSION and
BIND_CONN_TO_SESSION, and the client is required to maintain it. BIND_CONN_TO_SESSION, and the client is required to maintain it.
Because the backchannel may be down, even temporarily, correct Because the backchannel may be down, even temporarily, correct
protocol operation does not depend on them. Preliminary testing of protocol operation does not depend on them. Preliminary testing of
backchannel functionality by means of a CB_COMPOUND procedure with a backchannel functionality by means of a CB_COMPOUND procedure with a
single operation, CB_SEQUENCE, can be used to check the continuity of single operation, CB_SEQUENCE, can be used to check the continuity of
the backchannel. A server avoids delegating responsibilities until the backchannel. A server avoids delegating responsibilities until
skipping to change at page 187, line 48 skipping to change at page 191, line 22
There are two types of open delegations, read and write. A read open There are two types of open delegations, read and write. A read open
delegation allows a client to handle, on its own, requests to open a delegation allows a client to handle, on its own, requests to open a
file for reading that do not deny read access to others. Multiple file for reading that do not deny read access to others. Multiple
read open delegations may be outstanding simultaneously and do not read open delegations may be outstanding simultaneously and do not
conflict. A write open delegation allows the client to handle, on conflict. A write open delegation allows the client to handle, on
its own, all opens. Only one write open delegation may exist for a its own, all opens. Only one write open delegation may exist for a
given file at a given time and it is inconsistent with any read open given file at a given time and it is inconsistent with any read open
delegations. delegations.
When a client has a read open delegation, it is assured that neither When a client has a read open delegation, it is assured that neither
the contents, the attributes, nor the names of any links to the file the contents, the attributes (with the exception of time_access), nor
will change without its knowledge, so long as the delegation is held. the names of any links to the file will change without its knowledge,
When a client has a write open delegation, it may modify the file so long as the delegation is held. When a client has a write open
data locally since no other client will be accessing the file's data. delegation, it may modify the file data locally since no other client
The client holding a write delegation may only locally affect file will be accessing the file's data. The client holding a write
attributes which are intimately connected with the file data: size, delegation may only locally affect file attributes which are
time_modify, change. Changes to other attributes must be reflected intimately connected with the file data: size, change, time_access,
time_metadata, and time_modify. Changes to other attributes must be reflected
on the server. on the server.
When a client has an open delegation, it does not send OPENs or When a client has an open delegation, it does not send OPENs or
CLOSEs to the server but updates the appropriate status internally. CLOSEs to the server but updates the appropriate status internally.
For a read open delegation, opens that cannot be handled locally For a read open delegation, opens that cannot be handled locally
(opens for write or that deny read access) must be sent to the (opens for write or that deny read access) must be sent to the
server. server.
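The rule for which opens a delegation holder may handle locally can be sketched as a small predicate. This covers only the delegation-type test described above; the function name and the string values for the delegation type are assumptions, and the share reservation and permission checks that also apply are not modelled.

```python
OPEN4_SHARE_ACCESS_WRITE = 0x00000002
OPEN4_SHARE_DENY_READ = 0x00000001


def can_handle_locally(delegation_type, access, deny):
    """Return True if an open may be handled without contacting the server.

    delegation_type: "read", "write", or None (hypothetical encoding).
    access, deny:    OPEN4_SHARE_ACCESS_* and OPEN4_SHARE_DENY_* bits
                     requested by the application's open.
    """
    if delegation_type == "write":
        # A write open delegation lets the client handle all opens.
        return True
    if delegation_type == "read":
        # Opens for write, or opens that deny read access to others,
        # must be sent to the server.
        wants_write = bool(access & OPEN4_SHARE_ACCESS_WRITE)
        denies_read = bool(deny & OPEN4_SHARE_DENY_READ)
        return not wants_write and not denies_read
    return False
```

Under a read delegation, an open for read that denies write access passes this test, since only write access and read denial force a trip to the server.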
When an open delegation is made, the response to the OPEN contains an When an open delegation is made, the response to the OPEN contains an
open delegation structure which specifies the following: open delegation structure which specifies the following:
skipping to change at page 188, line 28 skipping to change at page 191, line 52
o space limitation information to control flushing of data on close o space limitation information to control flushing of data on close
(write open delegation only, see Section 10.4.1) (write open delegation only, see Section 10.4.1)
o an nfsace4 specifying read and write permissions o an nfsace4 specifying read and write permissions
o a stateid to represent the delegation for READ and WRITE o a stateid to represent the delegation for READ and WRITE
The delegation stateid is separate and distinct from the stateid for The delegation stateid is separate and distinct from the stateid for
the OPEN proper. The standard stateid, unlike the delegation the OPEN proper. The standard stateid, unlike the delegation
stateid, is associated with a particular lock_owner and will continue stateid, is associated with a particular lock-owner and will continue
to be valid after the delegation is recalled and the file remains to be valid after the delegation is recalled and the file remains
open. open.
When a request internal to the client is made to open a file and open When a request internal to the client is made to open a file and open
delegation is in effect, it will be accepted or rejected solely on delegation is in effect, it will be accepted or rejected solely on
the basis of the following conditions. Any requirement for other the basis of the following conditions. Any requirement for other
checks to be made by the delegate should result in open delegation checks to be made by the delegate should result in open delegation
being denied so that the checks can be made by the server itself. being denied so that the checks can be made by the server itself.
o The access and deny bits for the request and the file as described o The access and deny bits for the request and the file as described
in Section 9.5. in Section 9.7.
o The read and write permissions as determined below. o The read and write permissions as determined below.
The nfsace4 passed with delegation can be used to avoid frequent The nfsace4 passed with delegation can be used to avoid frequent
ACCESS calls. The permission check should be as follows: ACCESS calls. The permission check should be as follows:
o If the nfsace4 indicates that the open may be done, then it should o If the nfsace4 indicates that the open may be done, then it should
be granted without reference to the server. be granted without reference to the server.
o If the nfsace4 indicates that the open may not be done, then an o If the nfsace4 indicates that the open may not be done, then an
skipping to change at page 197, line 47 skipping to change at page 201, line 34
client via a CB_PUSH_DELEG operation. When this happens, open files client via a CB_PUSH_DELEG operation. When this happens, open files
for the same filehandle become subordinate to the new delegation at for the same filehandle become subordinate to the new delegation at
the point at which the delegation is delivered, just as if they had the point at which the delegation is delivered, just as if they had
been created using an OPEN of type CLAIM_DELEGATE_CUR. Similarly, been created using an OPEN of type CLAIM_DELEGATE_CUR. Similarly,
for existing byte-range locks subordinate to an open. for existing byte-range locks subordinate to an open.
10.5. Data Caching and Revocation 10.5. Data Caching and Revocation
When locks and delegations are revoked, the assumptions upon which When locks and delegations are revoked, the assumptions upon which
successful caching depend are no longer guaranteed. For any locks or successful caching depend are no longer guaranteed. For any locks or
share reservations that have been revoked, the corresponding owner share reservations that have been revoked, the corresponding state-
needs to be notified. This notification includes applications with a owner needs to be notified. This notification includes applications
file open that has a corresponding delegation which has been revoked. with a file open that has a corresponding delegation which has been
Cached data associated with the revocation must be removed from the revoked. Cached data associated with the revocation must be removed
client. In the case of modified data existing in the client's cache, from the client. In the case of modified data existing in the
that data must be removed from the client without it being written to client's cache, that data must be removed from the client without it
the server. As mentioned, the assumptions made by the client are no being written to the server. As mentioned, the assumptions made by
longer valid at the point when a lock or delegation has been revoked. the client are no longer valid at the point when a lock or delegation
For example, another client may have been granted a conflicting lock has been revoked. For example, another client may have been granted
after the revocation of the lock at the first client. Therefore, the a conflicting lock after the revocation of the lock at the first
data within the lock range may have been modified by the other client. Therefore, the data within the lock range may have been
client. Obviously, the first client is unable to guarantee to the modified by the other client. Obviously, the first client is unable
application what has occurred to the file in the case of revocation. to guarantee to the application what has occurred to the file in the
case of revocation.
Notification to a lock owner will in many cases consist of simply Notification to a state-owner will in many cases consist of simply
returning an error on the next and all subsequent READs/WRITEs to the returning an error on the next and all subsequent READs/WRITEs to the
open file or on the close. Where the methods available to a client open file or on the close. Where the methods available to a client
make such notification impossible because errors for certain make such notification impossible because errors for certain
operations may not be returned, more drastic action such as signals operations may not be returned, more drastic action such as signals
or process termination may be appropriate. The justification for or process termination may be appropriate. The justification for
this is that an invariant on which an application depends may be this is that an invariant on which an application depends may be
violated. Depending on how errors are typically treated for the violated. Depending on how errors are typically treated for the
client operating environment, further levels of notification client operating environment, further levels of notification
including logging, console messages, and GUI pop-ups may be including logging, console messages, and GUI pop-ups may be
appropriate. appropriate.
skipping to change at page 216, line 8 skipping to change at page 219, line 42
Referrals provide a way of placing a file system in a location within Referrals provide a way of placing a file system in a location within
the namespace essentially without respect to its physical location on the namespace essentially without respect to its physical location on
a given server. This allows a single server or a set of servers to a given server. This allows a single server or a set of servers to
present a multi-server namespace that encompasses file systems present a multi-server namespace that encompasses file systems
located on multiple servers. Some likely uses of this include located on multiple servers. Some likely uses of this include
establishment of site-wide or organization-wide namespaces, or even establishment of site-wide or organization-wide namespaces, or even
knitting such together into a truly global namespace. knitting such together into a truly global namespace.
Referrals occur when a client determines, upon first referencing a Referrals occur when a client determines, upon first referencing a
position in the current namespace, that it is part of a new file position in the current namespace, that it is part of a new file
system and that that file system is absent. When this occurs, system and that the file system is absent. When this occurs,
typically by receiving the error NFS4ERR_MOVED, the actual location typically by receiving the error NFS4ERR_MOVED, the actual location
or locations of the file system can be determined by fetching the or locations of the file system can be determined by fetching the
fs_locations or fs_locations_info attribute. fs_locations or fs_locations_info attribute.
The locations-related attribute may designate a single file system The locations-related attribute may designate a single file system
location or multiple file system locations, to be selected based on location or multiple file system locations, to be selected based on
the needs of the client. The server, in the fs_locations_info the needs of the client. The server, in the fs_locations_info
attribute may specify priorities to be associated with various file attribute may specify priorities to be associated with various file
system location choices. The server may assign different priorities system location choices. The server may assign different priorities
to different locations as reported to individual clients, in order to to different locations as reported to individual clients, in order to
skipping to change at page 221, line 30 skipping to change at page 225, line 17
When the conditions above hold, in either of the following two cases, When the conditions above hold, in either of the following two cases,
the client may use the two file system instances simultaneously. the client may use the two file system instances simultaneously.
o The fs_locations_info attribute does not contain separate per- o The fs_locations_info attribute does not contain separate per-
network-address entries for file systems instances at the distinct network-address entries for file systems instances at the distinct
network addresses. This includes the case in which the network addresses. This includes the case in which the
fs_locations_info attribute is unavailable. In this case, the fs_locations_info attribute is unavailable. In this case, the
fact that the two server addresses connect to the same server (as fact that the two server addresses connect to the same server (as
indicated by the two addresses sharing the same so_major_id indicated by the two addresses sharing the same so_major_id
value and subsequently confirmed as described in in value and subsequently confirmed as described in Section 2.10.4)
Section 2.10.4) justifies simultaneous use and there is no justifies simultaneous use and there is no fs_locations_info
fs_locations_info attribute information contradicting that. attribute information contradicting that.
o The fs_locations_info attribute indicates that two file system o The fs_locations_info attribute indicates that two file system
instances belong to the same _simultaneous-use_ class. instances belong to the same _simultaneous-use_ class.
In this case, the client may use both file system instances In this case, the client may use both file system instances
simultaneously, as representations of the same file system, whether simultaneously, as representations of the same file system, whether
that happens because the two network addresses connect to the same that happens because the two network addresses connect to the same
physical server or because different servers connect to clustered physical server or because different servers connect to clustered
file systems and export their data in common. When simultaneous use file systems and export their data in common. When simultaneous use
is in effect, any change made to one file system instance must be is in effect, any change made to one file system instance must be
skipping to change at page 252, line 41 skipping to change at page 256, line 18
In an environment in which multiple copies of the same basic set of In an environment in which multiple copies of the same basic set of
data are available, information regarding the particular source of data are available, information regarding the particular source of
such data and the relationships among different copies can be very such data and the relationships among different copies can be very
helpful in providing consistent data to applications. helpful in providing consistent data to applications.
enum fs4_status_type { enum fs4_status_type {
STATUS4_FIXED = 1, STATUS4_FIXED = 1,
STATUS4_UPDATED = 2, STATUS4_UPDATED = 2,
STATUS4_VERSIONED = 3, STATUS4_VERSIONED = 3,
STATUS4_WRITABLE = 4, STATUS4_WRITABLE = 4,
STATUS4_ABSENT = 5 STATUS4_REFERRAL = 5
}; };
struct fs4_status { struct fs4_status {
bool fss_absent;
fs4_status_type fss_type; fs4_status_type fss_type;
utf8str_cs fss_source; utf8str_cs fss_source;
utf8str_cs fss_current; utf8str_cs fss_current;
int32_t fss_age; int32_t fss_age;
nfstime4 fss_version; nfstime4 fss_version;
}; };
The boolean fsstat_absent indicates whether the file system is
currently absent. This value will be set if the file system was The boolean fss_absent indicates whether the file system is currently
previously present and becomes absent, or if the file system has absent. This value will be set if the file system was previously
never been present and the type is STATUS4_REFERRAL. When this present and becomes absent, or if the file system has never been
boolean is set and the type is not STATUS4_REFERRAL, the remaining present and the type is STATUS4_REFERRAL. When this boolean is set
information in the fs4_status reflects that last valid when the file and the type is not STATUS4_REFERRAL, the remaining information in
system was present. the fs4_status reflects that last valid when the file system was
present.
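The interaction between fss_absent and the type value described above can be sketched in C. This is only an illustrative reading of the newer draft's text, not part of the protocol definition: the struct fs4_status_view, the enum fs4_status_reading, and the function fs4_status_classify are hypothetical names, and the string and time fields of fs4_status are omitted for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* C mirror of the XDR enum from the newer draft. */
enum fs4_status_type {
    STATUS4_FIXED = 1,
    STATUS4_UPDATED = 2,
    STATUS4_VERSIONED = 3,
    STATUS4_WRITABLE = 4,
    STATUS4_REFERRAL = 5
};

/* Hypothetical, simplified in-memory view of fs4_status. */
struct fs4_status_view {
    bool fss_absent;
    enum fs4_status_type fss_type;
    int32_t fss_age;
};

/* Hypothetical classification of how a client might read the fields. */
enum fs4_status_reading {
    FS4_READING_CURRENT,    /* file system present; fields are current      */
    FS4_READING_LAST_VALID, /* absent now; fields are the last valid values */
    FS4_READING_REFERRAL    /* never present here; a pure referral          */
};

enum fs4_status_reading fs4_status_classify(const struct fs4_status_view *st)
{
    assert(st != NULL);
    if (!st->fss_absent)
        return FS4_READING_CURRENT;
    if (st->fss_type == STATUS4_REFERRAL)
        return FS4_READING_REFERRAL;
    /* Absent, but previously present: remaining fields are historical. */
    return FS4_READING_LAST_VALID;
}
```

A client could use such a classification to decide whether fss_age and fss_version describe a live image or only the state last observed before the file system became absent.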
The type value indicates the kind of file system image represented. The type value indicates the kind of file system image represented.
This is of particular importance when using the version values to This is of particular importance when using the version values to
determine appropriate succession of file system images. When determine appropriate succession of file system images. When
fsstat_absent is set, and the file system was previously present, the fss_absent is set, and the file system was previously present, the
type reflected is that when the file system was last present. Five types type reflected is that when the file system was last present. Five types
are distinguished: are distinguished:
o STATUS4_FIXED which indicates a read-only image in the sense that o STATUS4_FIXED which indicates a read-only image in the sense that
it will never change. The possibility is allowed that, as a it will never change. The possibility is allowed that, as a
result of migration or switch to a different image, changed data result of migration or switch to a different image, changed data
can be accessed, but within the confines of this instance, no can be accessed, but within the confines of this instance, no
change is allowed. The client can use this fact to cache change is allowed. The client can use this fact to cache
aggressively. aggressively.
skipping to change at page 257, line 29 skipping to change at page 261, line 6
The NFSv4.1 pNFS feature has been structured to allow for a variety The NFSv4.1 pNFS feature has been structured to allow for a variety
of storage protocols to be defined and used. As noted in the diagram of storage protocols to be defined and used. As noted in the diagram
above, the storage protocol is the method used by the client to store above, the storage protocol is the method used by the client to store
and retrieve data directly from the storage devices. The NFSv4.1 and retrieve data directly from the storage devices. The NFSv4.1
protocol directly defines one storage protocol, the NFSv4.1 storage protocol directly defines one storage protocol, the NFSv4.1 storage
type, and its use. type, and its use.
Examples of other storage protocols that could be used with NFSv4.1's Examples of other storage protocols that could be used with NFSv4.1's
pNFS are: pNFS are:
o Block/volume protocols such as iSCSI ([36]), and FCP ([37]). The o Block/volume protocols such as iSCSI ([38]), and FCP ([39]). The
block/volume protocol support can be independent of the addressing block/volume protocol support can be independent of the addressing
structure of the block/volume protocol used, allowing more than structure of the block/volume protocol used, allowing more than
one protocol to access the same file data and enabling one protocol to access the same file data and enabling
extensibility to other block/volume protocols. extensibility to other block/volume protocols.
o Object protocols such as OSD over iSCSI or Fibre Channel [38]. o Object protocols such as OSD over iSCSI or Fibre Channel [40].
o Other storage protocols, including PVFS and other file systems o Other storage protocols, including PVFS and other file systems
that are in use in HPC environments. that are in use in HPC environments.
It is possible that various storage protocols are available to both It is possible that various storage protocols are available to both
client and server and it may be possible that a client and server do client and server and it may be possible that a client and server do
not have a matching storage protocol available to them. Because of not have a matching storage protocol available to them. Because of
this, the pNFS server MUST support normal NFSv4.1 access to any file this, the pNFS server MUST support normal NFSv4.1 access to any file
accessible by the pNFS feature; this will allow for continued accessible by the pNFS feature; this will allow for continued
interoperability between an NFSv4.1 client and server. interoperability between an NFSv4.1 client and server.
skipping to change at page 259, line 26 skipping to change at page 263, line 5
devices that hold the data. A layout is said to belong to a specific devices that hold the data. A layout is said to belong to a specific
layout type (data type layouttype4, see Section 3.3.13). The layout layout type (data type layouttype4, see Section 3.3.13). The layout
type allows for variants to handle different storage protocols, such type allows for variants to handle different storage protocols, such
as those associated with block/volume [31], object [30], and file as those associated with block/volume [31], object [30], and file
(Section 13) layout types. A metadata server, along with its control (Section 13) layout types. A metadata server, along with its control
protocol, MUST support at least one layout type. A private sub-range protocol, MUST support at least one layout type. A private sub-range
of the layout type name space is also defined. Values from the of the layout type name space is also defined. Values from the
private layout type range MAY be used for internal testing or private layout type range MAY be used for internal testing or
experimentation. experimentation.
As an example, a file layout type could be an array of tuples (e.g., As an example, a layout of the file layout type could be an array of
deviceID, file_handle), along with a definition of how the data is tuples (e.g., deviceID, file_handle), along with a definition of how
stored across the devices (e.g., striping). A block/volume layout the data is stored across the devices (e.g., striping). A block/
might be an array of tuples that store <deviceID, block_number, block volume layout might be an array of tuples that store <deviceID,
count> along with information about block size and the associated block_number, block count> along with information about block size
file offset of the block number. An object layout might be an array and the associated file offset of the block number. An object layout
of tuples <deviceID, objectID> and an additional structure (i.e., the might be an array of tuples <deviceID, objectID> and an additional
aggregation map) that defines how the logical byte sequence of the structure (i.e., the aggregation map) that defines how the logical
file data is serialized into the different objects. Note that the byte sequence of the file data is serialized into the different
actual layouts are typically more complex than these simple objects. Note that the actual layouts are typically more complex
expository examples. than these simple expository examples.
Requests for pNFS-related operations will often specify a layout
type. Examples of such operations are GETDEVICEINFO and LAYOUTGET.
The response for these operations will include structures such as a
device_addr4 or a layout4, each of which includes a layout type
within it. The layout type sent by the server MUST always be the
same one requested by the client. When a server sends a response
that includes a different layout type, the client SHOULD ignore the
response and behave as if the server had returned an error response.
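The client-side check implied by the paragraph above might look like the following C sketch. It assumes that the client has already extracted the layout type from each device_addr4 or layout4 structure in the reply; the names reply_disposition and check_reply_layout_types are hypothetical and do not come from the protocol definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t layouttype4;

/* Hypothetical outcome of validating a GETDEVICEINFO or LAYOUTGET reply. */
enum reply_disposition {
    REPLY_ACCEPT,
    REPLY_TREAT_AS_ERROR /* ignore the reply, act as if an error came back */
};

/*
 * Compare the layout type the client requested against every layout type
 * carried in the reply. Any mismatch makes the whole reply unusable.
 */
enum reply_disposition
check_reply_layout_types(layouttype4 requested,
                         const layouttype4 *returned, size_t count)
{
    assert(returned != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (returned[i] != requested)
            return REPLY_TREAT_AS_ERROR;
    }
    return REPLY_ACCEPT;
}
```

Rejecting the whole reply on the first mismatch keeps the client from mixing layouts of a type it did not ask for with ones it did.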
12.2.8. Layout 12.2.8. Layout
A layout defines how a file's data is organized on one or more A layout defines how a file's data is organized on one or more
storage devices. There are many potential layout types; each of the storage devices. There are many potential layout types; each of the
layout types is differentiated by the storage protocol used to layout types is differentiated by the storage protocol used to
access data and in the aggregation scheme that lays out the file data access data and in the aggregation scheme that lays out the file data
on the underlying storage devices. A layout is precisely identified on the underlying storage devices. A layout is precisely identified
by the following tuple: <client ID, filehandle, layout type, iomode, by the following tuple: <client ID, filehandle, layout type, iomode,
range>; where filehandle refers to the filehandle of the file on the range>; where filehandle refers to the filehandle of the file on the
skipping to change at page 260, line 16 skipping to change at page 263, line 52
(i.e., the storage device/file mapping parameters differ). Note that (i.e., the storage device/file mapping parameters differ). Note that
differing iomodes do not lead to conflicting layouts. It is differing iomodes do not lead to conflicting layouts. It is
permissible for layouts with different iomodes, pertaining to the permissible for layouts with different iomodes, pertaining to the
same byte range, to be held by the same client. An example of this same byte range, to be held by the same client. An example of this
would be copy-on-write functionality for a block/volume layout type. would be copy-on-write functionality for a block/volume layout type.
12.2.9. Layout Iomode 12.2.9. Layout Iomode
The layout iomode (data type layoutiomode4, see Section 3.3.20) The layout iomode (data type layoutiomode4, see Section 3.3.20)
indicates to the metadata server the client's intent to perform indicates to the metadata server the client's intent to perform
either just READ operations (Section 18.22) or a mixture of I/O either just read operations or a mixture of I/O possibly containing
possibly containing WRITE (Section 18.32) and READ operations. For read and write operations. For certain layout types, it is useful
certain layout types, it is useful for a client to specify this for a client to specify this intent at LAYOUTGET (Section 18.43)
intent at LAYOUTGET (Section 18.43) time. For example, for block/volume time. For example, for block/volume based protocols, block allocation
based protocols, block allocation could occur when a READ/WRITE could occur when a READ/WRITE iomode is specified. A special
iomode is specified. A special LAYOUTIOMODE4_ANY iomode is defined LAYOUTIOMODE4_ANY iomode is defined and can only be used for
and can only be used for LAYOUTRETURN and CB_LAYOUTRECALL, not for LAYOUTRETURN and CB_LAYOUTRECALL, not for LAYOUTGET. It specifies
LAYOUTGET. It specifies that layouts pertaining to both READ and that layouts pertaining to both READ and READ/WRITE iomodes are being
READ/WRITE iomodes are being returned or recalled, respectively. returned or recalled, respectively.
A storage device may validate I/O with regard to the iomode; this is A storage device may validate I/O with regard to the iomode; this is
dependent upon storage device implementation and layout type. Thus, dependent upon storage device implementation and layout type. Thus,
if the client's layout iomode is inconsistent with the I/O being if the client's layout iomode is inconsistent with the I/O being
performed, the storage device may reject the client's I/O with an performed, the storage device may reject the client's I/O with an
error indicating a new layout with the correct I/O mode should be error indicating a new layout with the correct I/O mode should be
fetched. For example, if a client gets a layout with a READ iomode fetched. For example, if a client gets a layout with a READ iomode
and performs a WRITE to a storage device, the storage device is and performs a WRITE to a storage device, the storage device is
allowed to reject that WRITE. allowed to reject that WRITE.
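The iomode rules in this section can be summarized with a small C sketch. The numeric values of layoutiomode4 are assumed here for illustration, and io_kind, iomode_ok_for_layoutget, iomode_ok_for_return_or_recall, and strict_device_accepts are hypothetical names. Because a storage device is only permitted, not required, to validate I/O against the iomode, the last function models one possible strict device rather than mandated behavior.

```c
#include <assert.h>
#include <stdbool.h>

/* layoutiomode4 values, assumed for this sketch. */
enum layoutiomode4 {
    LAYOUTIOMODE4_READ = 1,
    LAYOUTIOMODE4_RW = 2,
    LAYOUTIOMODE4_ANY = 3
};

/* Hypothetical tag for the kind of I/O sent to a storage device. */
enum io_kind { IO_READ, IO_WRITE };

/* LAYOUTIOMODE4_ANY cannot be used with LAYOUTGET. */
bool iomode_ok_for_layoutget(enum layoutiomode4 mode)
{
    return mode == LAYOUTIOMODE4_READ || mode == LAYOUTIOMODE4_RW;
}

/* LAYOUTRETURN and CB_LAYOUTRECALL additionally accept LAYOUTIOMODE4_ANY. */
bool iomode_ok_for_return_or_recall(enum layoutiomode4 mode)
{
    return iomode_ok_for_layoutget(mode) || mode == LAYOUTIOMODE4_ANY;
}

/* One possible strict device: writes require a READ/WRITE layout. */
bool strict_device_accepts(enum io_kind io, enum layoutiomode4 held)
{
    assert(iomode_ok_for_layoutget(held)); /* a held layout is READ or RW */
    if (io == IO_WRITE)
        return held == LAYOUTIOMODE4_RW;
    return true;
}
```

A client whose WRITE is rejected by such a device would be expected to fetch a new layout with the READ/WRITE iomode before retrying.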
skipping to change at page 263, line 43 skipping to change at page 267, line 29
which a layout is held, does not necessarily conflict with the which a layout is held, does not necessarily conflict with the
holding of the layout that describes the file being modified. holding of the layout that describes the file being modified.
Therefore, it is the requirement of the storage protocol or layout Therefore, it is the requirement of the storage protocol or layout
type that determines the necessary behavior. For example, block/ type that determines the necessary behavior. For example, block/
volume layout types require that the layout's iomode agree with the volume layout types require that the layout's iomode agree with the
type of I/O being performed. type of I/O being performed.
Depending upon the layout type and storage protocol in use, storage Depending upon the layout type and storage protocol in use, storage
device access permissions may be granted by LAYOUTGET and may be device access permissions may be granted by LAYOUTGET and may be
encoded within the type-specific layout. For an example of storage encoded within the type-specific layout. For an example of storage
device access permissions see an object based protocol such as [38]. device access permissions see an object based protocol such as [40].
If access permissions are encoded within the layout, the metadata If access permissions are encoded within the layout, the metadata
server SHOULD recall the layout when those permissions become invalid server SHOULD recall the layout when those permissions become invalid
for any reason; for example when a file becomes unwritable or for any reason; for example when a file becomes unwritable or
inaccessible to a client. Note, clients are still required to inaccessible to a client. Note, clients are still required to
perform the appropriate access operations with open, lock and access perform the appropriate access operations with open, lock and access
as described above. The degree to which it is possible for the as described above. The degree to which it is possible for the
client to circumvent these access operations and the consequences of client to circumvent these access operations and the consequences of
doing so must be clearly specified by the individual layout type doing so must be clearly specified by the individual layout type
specifications. In addition, these specifications must be clear specifications. In addition, these specifications must be clear
about the requirements and non-requirements for the checking about the requirements and non-requirements for the checking
skipping to change at page 265, line 43 skipping to change at page 269, line 28
subsequent LAYOUTGET and LAYOUTRETURN response, and in each subsequent LAYOUTGET and LAYOUTRETURN response, and in each
CB_LAYOUTRECALL request. When the client fully processes the CB_LAYOUTRECALL request. When the client fully processes the
response to a LAYOUTGET or LAYOUTRETURN, or fully processes the response to a LAYOUTGET or LAYOUTRETURN, or fully processes the
arguments of a CB_LAYOUTRECALL, it MUST use the seqid of the stateid arguments of a CB_LAYOUTRECALL, it MUST use the seqid of the stateid
of the reply from LAYOUTGET and LAYOUTRETURN, or the seqid of the of the reply from LAYOUTGET and LAYOUTRETURN, or the seqid of the
stateid in the arguments of CB_LAYOUTRECALL, on subsequent calls to stateid in the arguments of CB_LAYOUTRECALL, on subsequent calls to
LAYOUTGET or LAYOUTRETURN. The client and server use the "seqid" of LAYOUTGET or LAYOUTRETURN. The client and server use the "seqid" of
the layout stateid for the following purposes: the layout stateid for the following purposes:
o Permit the client to send parallel LAYOUTGET operations on the o Permit the client to send parallel LAYOUTGET operations on the
same file. As with parallel opens (see Section 9.8) the use of same file. As with parallel opens (see Section 9.10) the use of
the sequence ID allows a client to avoid serializing LAYOUTGET the sequence ID allows a client to avoid serializing LAYOUTGET
operations. If LAYOUTGETs were serialized, especially non- operations. If LAYOUTGETs were serialized, especially non-
overlapping LAYOUTGETs, then non-overlapping I/Os to storage overlapping LAYOUTGETs, then non-overlapping I/Os to storage
devices would in turn be effectively serialized with each other. devices would in turn be effectively serialized with each other.
In the event parallel LAYOUTGET operations are sent with a non- In the event parallel LAYOUTGET operations are sent with a non-
layout stateid (because the client does not yet have a layout layout stateid (because the client does not yet have a layout
stateid), the successful responses MUST have the same "other" stateid), the successful responses MUST have the same "other"
field in the LAYOUTSTATEID, and each response with a unique field in the LAYOUTSTATEID, and each response with a unique
"seqid", where the lowest "seqid" is one, and the highest "seqid" "seqid", where the lowest "seqid" is one, and the highest "seqid"
is equal to the count of parallel LAYOUTGET operations invoked on is equal to the count of parallel LAYOUTGET operations invoked on
skipping to change at page 274, line 39 skipping to change at page 278, line 19
The recall process can be considered completed when the final The recall process can be considered completed when the final
LAYOUTRETURN operation for the recalled range is completed. The LAYOUTRETURN operation for the recalled range is completed. The
LAYOUTRETURN uses the layout stateid (with seqid) specified in LAYOUTRETURN uses the layout stateid (with seqid) specified in
CB_LAYOUTRECALL. CB_LAYOUTRECALL.
12.5.5.2.1.3. Server Considerations 12.5.5.2.1.3. Server Considerations
Consider a race from the metadata server's point of view. The Consider a race from the metadata server's point of view. The
metadata server has sent a CB_LAYOUTRECALL and receives an metadata server has sent a CB_LAYOUTRECALL and receives an
overlapping LAYOUTGET for the same file before the LAYOUTRETURN(s) overlapping LAYOUTGET for the same file before the LAYOUTRETURN(s)
that respond to the CB_LAYOUTRECALL. There are are three cases: that respond to the CB_LAYOUTRECALL. There are three cases:
1. The client sent the LAYOUTGET before processing the 1. The client sent the LAYOUTGET before processing the
CB_LAYOUTRECALL. The "seqid" in the layout stateid of LAYOUTGET CB_LAYOUTRECALL. The "seqid" in the layout stateid of LAYOUTGET
is two less than that of the "seqid" in CB_LAYOUTRECALL. The is two less than that of the "seqid" in CB_LAYOUTRECALL. The
server returns NFS4ERR_RECALLCONFLICT to the client, which server returns NFS4ERR_RECALLCONFLICT to the client, which
indicates to the client that there is a pending recall. indicates to the client that there is a pending recall.
2. The client sent the LAYOUTGET after processing the 2. The client sent the LAYOUTGET after processing the
CB_LAYOUTRECALL, but the LAYOUTGET arrived before the CB_LAYOUTRECALL, but the LAYOUTGET arrived before the
LAYOUTRETURN and the response to CB_LAYOUTRECALL that completed LAYOUTRETURN and the response to CB_LAYOUTRECALL that completed
skipping to change at page 283, line 30 skipping to change at page 287, line 7
does this, there is no need to wait for the original storage device. does this, there is no need to wait for the original storage device.
12.8. Metadata and Storage Device Roles 12.8. Metadata and Storage Device Roles
If the same physical hardware is used to implement both a metadata If the same physical hardware is used to implement both a metadata
server and storage device, then the same hardware entity is to be server and storage device, then the same hardware entity is to be
understood to be implementing two distinct roles and it is important understood to be implementing two distinct roles and it is important
that it be clearly understood on behalf of which role the hardware is that it be clearly understood on behalf of which role the hardware is
executing at any given time. executing at any given time.
Various sub-cases can be distinguished. Two sub-cases can be distinguished.
1. The storage device uses NFSv4.1 as the storage protocol. The 1. The storage device uses NFSv4.1 as the storage protocol, i.e., the
same physical hardware is used to implement both a metadata and same physical hardware is used to implement both a metadata and
data server. If an EXCHANGE_ID operation sent to the metadata data server. See Section 13.1 for a description of how multiple
server has EXCHGID4_FLAG_USE_PNFS_MDS set and roles are handled.
EXCHGID4_FLAG_USE_PNFS_DS not set, the role of all sessions
derived from the client ID is metadata server-only. If an
EXCHANGE_ID operation sent to the data server has
EXCHGID4_FLAG_USE_PNFS_DS set and EXCHGID4_FLAG_USE_PNFS_MDS not
set, the role of all sessions derived from the client ID is data
server only. These assertions are true regardless whether the
network addresses of the metadata server and data server are the
same or not.
The client will use the same client owner for both the metadata
server EXCHANGE_ID and the data server EXCHANGE_ID. Since the
client sends one with EXCHGID4_FLAG_USE_PNFS_MDS set, and the
other with EXCHGID4_FLAG_USE_PNFS_DS set, the server will need to
return unique client IDs, as well as server_owners, which will
eliminate ambiguity about dual roles the same physical entity
serves.
2. The metadata and data server each return EXCHANGE_ID results with
EXCHGID4_FLAG_USE_PNFS_DS and EXCHGID4_FLAG_USE_PNFS_MDS both
set, the server_owner and server_scope results are the same, and
the client IDs are the same, and if RPCSEC_GSS is used, the
server principals are the same. As noted in Section 2.10.4 the
two servers are the same, whether they have the same network
address or not. If the pNFS server is ambiguous in its
EXCHANGE_ID results as to what role a client ID may be used for,
yet still requires the NFSv4.1 request be directed in a manner
specific to a role (e.g. a READ request for a particular offset
directed to the metadata server role might use a different offset
if the READ was intended for the data server role, if the file is
using STRIPE4_DENSE packing, see Section 13.4.4), the pNFS server
may mark the the metadata filehandle differently from the data
filehandle so that operations addressed to the metadata server
can be distinguished from those directed to the data servers.
Marking the metadata and data server filehandles differently (and
this is RECOMMENDED) is possible because the former are derived
from OPEN operations, and the latter are derived from LAYOUTGET
operations.
Note, that it may be the case that while the metadata server and
the storage device are distinct from one client's point of view,
the roles may be reversed according to another client's point of
view. For example, in the cluster file system model a metadata
server to one client, may be a data server to another client. If
NFSv4.1 is being used as the storage protocol, then pNFS servers
need to mark filehandles according to their specific roles.
3. The storage device does not use NFSv4.1 as the storage protocol, 2. The storage device does not use NFSv4.1 as the storage protocol,
and the same physical hardware is used to implement both a and the same physical hardware is used to implement both a
metadata and storage device. Whether distinct network addresses metadata and storage device. Whether distinct network addresses
are used to access metadata server and storage device is are used to access metadata server and storage device is
immaterial, because it is always clear to the pNFS client and immaterial, because it is always clear to the pNFS client and
server, from the upper layer protocol being used (NFSv4.1 or non- server, from the upper layer protocol being used (NFSv4.1 or non-
NFSv4.1) what role the request to the common server network NFSv4.1) what role the request to the common server network
address is directed to. address is directed to.
12.9. Security Considerations for pNFS 12.9. Security Considerations for pNFS
skipping to change at page 286, line 41 skipping to change at page 289, line 23
+--------------------------------------------------------+ +--------------------------------------------------------+
As the above table implies, a server can have one or two roles. A As the above table implies, a server can have one or two roles. A
server can be both a metadata server and a data server or it can be server can be both a metadata server and a data server or it can be
both a data server and non-metadata server. In addition to returning both a data server and non-metadata server. In addition to returning
two roles in EXCHANGE_ID's results, and thus serving both roles via a two roles in EXCHANGE_ID's results, and thus serving both roles via a
common client ID, a server can serve two roles by returning a unique common client ID, a server can serve two roles by returning a unique
client ID and server owner for each role in each of two EXCHANGE_ID client ID and server owner for each role in each of two EXCHANGE_ID
results, with each result indicating each role. results, with each result indicating each role.
In the case of a server with concurrent pNFS roles that are served by
a common client ID, if the EXCHANGE_ID request from the client has
zero or a combination of the bits set in eia_flags, the server result
should set bits which represent the higher of the acceptable
combination of the server roles, with a preference to match the roles
requested by the client. Thus if a client request has
(EXCHGID4_FLAG_USE_NON_PNFS | EXCHGID4_FLAG_USE_PNFS_MDS |
EXCHGID4_FLAG_USE_PNFS_DS) flags set, and the server is both a
metadata server and a data server, serving both the roles by a common
client ID, the server SHOULD return with (EXCHGID4_FLAG_USE_PNFS_MDS
| EXCHGID4_FLAG_USE_PNFS_DS) set.
In the case of a server that has multiple concurrent pNFS roles, each
role served by a unique client ID, if the client specifies zero or a
combination of roles in the request, the server results SHOULD return
only one of the roles from the combination specified by the client
request. If the role specified by the server result does not match
the intended use by the client, the client should send the
EXCHANGE_ID specifying just the pNFS role of interest.
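The two role-selection cases above can be illustrated with a C sketch. Several details are assumptions rather than statements of the draft: the flag values are taken to be those of the published NFSv4.1 XDR, the functions choose_roles_common and choose_role_unique are hypothetical, "the higher of the acceptable combination" is read here as "the overlap between requested and supported roles, or all supported roles when there is no overlap", and the preference order used to pick a single role is invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Role flags; values assumed to match the published NFSv4.1 XDR. */
#define EXCHGID4_FLAG_USE_NON_PNFS 0x00010000u
#define EXCHGID4_FLAG_USE_PNFS_MDS 0x00020000u
#define EXCHGID4_FLAG_USE_PNFS_DS  0x00040000u
#define EXCHGID4_ROLE_MASK (EXCHGID4_FLAG_USE_NON_PNFS | \
                            EXCHGID4_FLAG_USE_PNFS_MDS | \
                            EXCHGID4_FLAG_USE_PNFS_DS)

/*
 * Common client ID case: return the roles the server offers, preferring
 * those the client asked for. "supported" is the server's own role set.
 */
uint32_t choose_roles_common(uint32_t eia_flags, uint32_t supported)
{
    assert(supported != 0 && (supported & ~EXCHGID4_ROLE_MASK) == 0);
    uint32_t overlap = eia_flags & supported;
    return overlap != 0 ? overlap : supported;
}

/*
 * Unique client ID per role: return exactly one role. The preference
 * order below (MDS, then DS, then non-pNFS) is a hypothetical policy.
 */
uint32_t choose_role_unique(uint32_t eia_flags, uint32_t supported)
{
    static const uint32_t preference[] = {
        EXCHGID4_FLAG_USE_PNFS_MDS,
        EXCHGID4_FLAG_USE_PNFS_DS,
        EXCHGID4_FLAG_USE_NON_PNFS
    };
    uint32_t candidates = choose_roles_common(eia_flags, supported);
    for (size_t i = 0; i < sizeof preference / sizeof preference[0]; i++) {
        if (candidates & preference[i])
            return preference[i];
    }
    return 0; /* not reached: candidates always holds a role bit */
}
```

With all three flags requested and a server that is both a metadata server and a data server, the first function yields the MDS and DS flags together, matching the example in the text; the second yields a single role, after which the client may resend EXCHANGE_ID naming only the role it wants.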
If a pNFS metadata client gets a layout that refers it to an NFSv4.1 If a pNFS metadata client gets a layout that refers it to an NFSv4.1
data server, it needs a client ID on that data server. If it does data server, it needs a client ID on that data server. If it does
not yet have a client ID from the server that had the not yet have a client ID from the server that had the
EXCHGID4_FLAG_USE_PNFS_DS flag set in the EXCHANGE_ID results, then EXCHGID4_FLAG_USE_PNFS_DS flag set in the EXCHANGE_ID results, then
the client must send an EXCHANGE_ID to the data server, using the the client must send an EXCHANGE_ID to the data server, using the
same co_ownerid as it sent to the metadata server, with the same co_ownerid as it sent to the metadata server, with the
EXCHGID4_FLAG_USE_PNFS_DS flag set in the arguments. If the server's EXCHGID4_FLAG_USE_PNFS_DS flag set in the arguments. If the server's
EXCHANGE_ID results have EXCHGID4_FLAG_USE_PNFS_DS set, then the EXCHANGE_ID results have EXCHGID4_FLAG_USE_PNFS_DS set, then the
client may use the client ID to create sessions that will exchange client may use the client ID to create sessions that will exchange
pNFS data operations. The client ID returned by the data server has pNFS data operations. The client ID returned by the data server has
skipping to change at page 287, line 21 skipping to change at page 290, line 23
because the sessionid in the preceding SEQUENCE operation is tied to because the sessionid in the preceding SEQUENCE operation is tied to
the client ID of the data server, the data server has no obvious way the client ID of the data server, the data server has no obvious way
to determine the metadata server from the COMPOUND procedure, and to determine the metadata server from the COMPOUND procedure, and
thus has no way to validate the stateid. One RECOMMENDED approach is thus has no way to validate the stateid. One RECOMMENDED approach is
for pNFS servers to encode metadata server routing and/or identity for pNFS servers to encode metadata server routing and/or identity
information in the data server filehandles as returned in the layout. information in the data server filehandles as returned in the layout.
If metadata server routing and/or identity information is encoded in If metadata server routing and/or identity information is encoded in
data server filehandles, when the metadata server identity or data server filehandles, when the metadata server identity or
location changes, the data server filehandles it gave out must become location changes, the data server filehandles it gave out must become
become invalid (stale), and so the metadata server must first recall invalid (stale), and so the metadata server must first recall the
the layouts. Invalidating a data server filehandle does not render layouts. Invalidating a data server filehandle does not render the
the NFS client's data cache invalid. The client's cache should map a NFS client's data cache invalid. The client's cache should map a
data server filehandle to a metadata server filehandle, and a data server filehandle to a metadata server filehandle, and a
metadata server filehandle to cached data. metadata server filehandle to cached data.
If a server is both a metadata server and a data server, the server
might need to distinguish operations on files that are directed to
the metadata server from those that are directed to the data server.
It is RECOMMENDED that the values of the filehandles returned by the
LAYOUTGET operation be different than the value of the filehandle
returned by the OPEN of the same file.
Another scenario is for the metadata server and the storage device to
be distinct from one client's point of view, and the roles reversed
from another client's point of view. For example, in the cluster
file system model a metadata server to one client may be a data
server to another client. If NFSv4.1 is being used as the storage
protocol, then pNFS servers need to encode the values of filehandles
according to their specific roles.
13.2. File Layout Definitions 13.2. File Layout Definitions
The following definitions apply to the LAYOUT4_NFSV4_1_FILES layout The following definitions apply to the LAYOUT4_NFSV4_1_FILES layout
type, and may be applicable to other layout types. type, and may be applicable to other layout types.
Unit. A unit is a fixed size quantity of data written to a data Unit. A unit is a fixed size quantity of data written to a data
server. server.
Pattern. A pattern is a method of distributing one or more equal Pattern. A pattern is a method of distributing one or more equal
sized units across a set of data servers. A pattern is iterated sized units across a set of data servers. A pattern is iterated
skipping to change at page 291, line 5 skipping to change at page 294, line 28
converting the client's logical I/O offset (e.g. the current converting the client's logical I/O offset (e.g. the current
offset in a POSIX file descriptor before the read() or write() offset in a POSIX file descriptor before the read() or write()
system call is sent) into the stripe unit number (see system call is sent) into the stripe unit number (see
Section 13.4.1). Section 13.4.1).
If dense packing is used, then nfl_pattern_offset is also needed If dense packing is used, then nfl_pattern_offset is also needed
to convert the client's logical I/O offset to an offset on the to convert the client's logical I/O offset to an offset on the
file on the data server corresponding to the stripe unit number file on the data server corresponding to the stripe unit number
(see Section 13.4.4). (see Section 13.4.4).
Note that nfl_pattern_offset is not always the same as lo_offset.
For example, via the LAYOUTGET operation, a client might request
a layout starting at offset 1000 of a file that has its striping
pattern start at offset 0.
5. nfl_fh_list: An array of data server filehandles for each list of 5. nfl_fh_list: An array of data server filehandles for each list of
data servers in each element of the nflda_multipath_ds_list data servers in each element of the nflda_multipath_ds_list
array. The number of elements in nfl_fh_list depends on whether array. The number of elements in nfl_fh_list depends on whether
sparse or dense packing is being used. sparse or dense packing is being used.
* If sparse packing is being used, the number of elements in * If sparse packing is being used, the number of elements in
nfl_fh_list MUST be one of three values: nfl_fh_list MUST be one of three values:
+ Zero. This means that filehandles used for each data + Zero. This means that filehandles used for each data
server are the same as the filehandle returned by the OPEN server are the same as the filehandle returned by the OPEN
skipping to change at page 298, line 5 skipping to change at page 301, line 29
2, 5, 8, 11, ... filled. The unfilled stripe units of each file will 2, 5, 8, 11, ... filled. The unfilled stripe units of each file will
be holes, hence the files in each data server are sparse. be holes, hence the files in each data server are sparse.
If sparse packing is being used and a client attempts I/O to one of If sparse packing is being used and a client attempts I/O to one of
the holes, then an error MUST be returned by the data server. Using the holes, then an error MUST be returned by the data server. Using
the above example, if data server 3 received a READ or WRITE request the above example, if data server 3 received a READ or WRITE request
for block 4, the data server would return NFS4ERR_PNFS_IO_HOLE. Thus for block 4, the data server would return NFS4ERR_PNFS_IO_HOLE. Thus
data servers need to understand the striping pattern in order to data servers need to understand the striping pattern in order to
support sparse packing. support sparse packing.
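The hole check in the example above can be sketched as follows. This is a minimal illustration, not normative text: it assumes three data servers numbered from 1, round-robin striping, and that "block 4" in the example means logical stripe unit 4. The function name sparse_io_status is hypothetical.

```python
NFS4_OK = 0
NFS4ERR_PNFS_IO_HOLE = 10075  # value taken from the error table in this document


def sparse_io_status(data_server, stripe_unit, num_servers=3):
    """Return the status a data server might give for I/O to a stripe unit.

    Assumption: round-robin striping, so the 1-based data server s holds
    exactly the logical stripe units u with u % num_servers == s - 1.
    Every other stripe unit in that server's file is a hole.
    """
    owner = stripe_unit % num_servers + 1
    return NFS4_OK if owner == data_server else NFS4ERR_PNFS_IO_HOLE
```

Under these assumptions, data server 3 accepts stripe units 2, 5, 8, 11 and rejects stripe unit 4, which matches the example in the text.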
If nfl_util & NFL4_UFLG_DENSE is one, this means that that dense If nfl_util & NFL4_UFLG_DENSE is one, this means that dense packing
packing is being used and the data server files have no holes. Dense is being used and the data server files have no holes. Dense packing
packing might be selected because the data server does not might be selected because the data server does not (efficiently)
(efficiently) support holey files, or because the data server cannot support holey files, or because the data server cannot recognize
recognize read-ahead unless there are no holes. If dense packing is read-ahead unless there are no holes. If dense packing is indicated
indicated in the layout, the data files must be packed. Using the in the layout, the data files must be packed. Using the example
example striping pattern and stripe unit size that was used for the striping pattern and stripe unit size that was used for the sparse
sparse packing example, the corresponding dense packing would have packing example, the corresponding dense packing would have all
all stripe units of all data files filled. Logical stripe units 0, stripe units of all data files filled. Logical stripe units 0, 3, 6,
3, 6, ... of the file would live on stripe units 0, 1, 2, ... of the ... of the file would live on stripe units 0, 1, 2, ... of the file
file of data server 1, logical stripe units 1, 4, 7, ... of the file of data server 1, logical stripe units 1, 4, 7, ... of the file would
would live on stripe units 0, 1, 2, ... of the file of data server 2, live on stripe units 0, 1, 2, ... of the file of data server 2, and
and logical stripe units 2, 5, 8, ... of the file would live on logical stripe units 2, 5, 8, ... of the file would live on stripe
stripe units 0, 1, 2, ... of the file of data server 3. units 0, 1, 2, ... of the file of data server 3.
Because dense packing does not leave holes on the data servers, the Because dense packing does not leave holes on the data servers, the
pNFS client is allowed to write to any offset of any data file of any pNFS client is allowed to write to any offset of any data file of any
data server in the stripe. Thus the the data servers need not know data server in the stripe. Thus the data servers need not know the
the file's striping pattern. file's striping pattern.
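The dense packing example above can be written as a small mapping from a logical stripe unit to the data server and to the stripe unit within that server's file. This is only a sketch: it assumes the same three data servers, numbered from 1 and filled round-robin, and the name dense_location is hypothetical.

```python
def dense_location(logical_unit, num_servers=3):
    """Map a logical stripe unit to (data server, stripe unit on that server).

    Assumption: round-robin striping over num_servers data servers that
    are numbered from 1, as in the example in the text.
    """
    server = logical_unit % num_servers + 1
    unit_on_server = logical_unit // num_servers
    return server, unit_on_server
```

With three servers, logical stripe units 0, 3, 6 land on stripe units 0, 1, 2 of data server 1, so no stripe unit of any data file is left unfilled.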
The calculation to determine the byte offset within the data file for The calculation to determine the byte offset within the data file for
dense data server layouts is: dense data server layouts is:
stripe_width = stripe_unit_size * N; stripe_width = stripe_unit_size * N;
where N = number of elements in nflda_stripe_indices. where N = number of elements in nflda_stripe_indices.
relative_offset = file_offset - nfl_pattern_offset; relative_offset = file_offset - nfl_pattern_offset;
data_file_offset = floor(relative_offset / stripe_width) data_file_offset = floor(relative_offset / stripe_width)
skipping to change at page 304, line 20 skipping to change at page 307, line 49
has the implication that stateids are globally valid on both the has the implication that stateids are globally valid on both the
metadata and data servers. This requires the metadata server to metadata and data servers. This requires the metadata server to
propagate changes in lock and open state to the data servers, so that propagate changes in lock and open state to the data servers, so that
the data servers can validate I/O accesses. This is discussed the data servers can validate I/O accesses. This is discussed
further in Section 13.9.2. Depending on when stateids are further in Section 13.9.2. Depending on when stateids are
propagated, the existence of a valid stateid on the data server may propagated, the existence of a valid stateid on the data server may
act as proof of a valid layout. act as proof of a valid layout.
Clients performing I/O operations need to select an appropriate Clients performing I/O operations need to select an appropriate
stateid based on the locks (including opens and delegations) held by stateid based on the locks (including opens and delegations) held by
the client and the various types of lock owners issuing the I/O the client and the various types of state-owners issuing the I/O
requests. The rules for doing so when referencing data servers are requests. The rules for doing so when referencing data servers are
somewhat different from those discussed in Section 8.2.5 which apply somewhat different from those discussed in Section 8.2.5 which apply
when accessing metadata servers. when accessing metadata servers.
The following rules, applied in order of decreasing priority, govern The following rules, applied in order of decreasing priority, govern
the selection of the appropriate stateid: the selection of the appropriate stateid:
o If the client holds a delegation for the file in question, the o If the client holds a delegation for the file in question, the
delegation stateid should be used. delegation stateid should be used.
o Otherwise, there must be an open stateid for the current o Otherwise, there must be an open stateid for the current open-
openowner, and that open stateid for the open file in question is owner, and that open stateid for the open file in question is
used, unless mandatory locking prevents that. See below. used, unless mandatory locking prevents that. See below.
o If the data server had previously responded with NFS4ERR_LOCKED to o If the data server had previously responded with NFS4ERR_LOCKED to
use of the open stateid, then the client should use the lock use of the open stateid, then the client should use the lock
stateid whenever one exists for that open file with the current stateid whenever one exists for that open file with the current
lockowner. lock-owner.
o Special stateids should never be used and if used the data server o Special stateids should never be used and if used the data server
MUST reject the I/O with a NFS4ERR_BAD_STATEID error. MUST reject the I/O with a NFS4ERR_BAD_STATEID error.
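The selection rules above can be sketched as a single function. This is an illustration under stated assumptions, not the protocol's definition: stateids are modeled as opaque strings, None means the client holds no such stateid, and the function name and the saw_locked_error flag are hypothetical.

```python
from typing import Optional


def select_stateid(delegation: Optional[str],
                   open_stateid: Optional[str],
                   lock_stateid: Optional[str],
                   saw_locked_error: bool) -> str:
    """Pick the stateid for I/O sent to a data server.

    saw_locked_error records whether the data server previously answered
    NFS4ERR_LOCKED when the open stateid was used for this open file.
    Special stateids are never returned.
    """
    # Highest priority: a delegation held for the file.
    if delegation is not None:
        return delegation
    # Otherwise an open stateid must exist for the current open-owner.
    if open_stateid is None:
        raise ValueError("no open stateid for the current open-owner")
    # Mandatory locking got in the way earlier: prefer the lock stateid.
    if saw_locked_error and lock_stateid is not None:
        return lock_stateid
    return open_stateid
```

Keeping the NFS4ERR_LOCKED history as an explicit input makes the third rule visible: the lock stateid is chosen only after the open stateid has been refused.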
13.9.2. Data Server State Propagation 13.9.2. Data Server State Propagation
Since the metadata server, which handles lock and open-mode state Since the metadata server, which handles lock and open-mode state
changes, as well as ACLs, may not be co-located with the data servers changes, as well as ACLs, may not be co-located with the data servers
where I/O accesses are validated, the server implementation MUST take where I/O accesses are validated, the server implementation MUST take
care of propagating changes of this state to the data servers. Once care of propagating changes of this state to the data servers. Once
skipping to change at page 316, line 9 skipping to change at page 319, line 43
| NFS4ERR_BADSESSION | 10052 | Section 15.1.11.1 | | NFS4ERR_BADSESSION | 10052 | Section 15.1.11.1 |
| NFS4ERR_BADSLOT | 10053 | Section 15.1.11.2 | | NFS4ERR_BADSLOT | 10053 | Section 15.1.11.2 |
| NFS4ERR_BADTYPE | 10007 | Section 15.1.4.1 | | NFS4ERR_BADTYPE | 10007 | Section 15.1.4.1 |
| NFS4ERR_BADXDR | 10036 | Section 15.1.1.1 | | NFS4ERR_BADXDR | 10036 | Section 15.1.1.1 |
| NFS4ERR_BAD_COOKIE | 10003 | Section 15.1.1.2 | | NFS4ERR_BAD_COOKIE | 10003 | Section 15.1.1.2 |
| NFS4ERR_BAD_HIGH_SLOT | 10077 | Section 15.1.11.3 | | NFS4ERR_BAD_HIGH_SLOT | 10077 | Section 15.1.11.3 |
| NFS4ERR_BAD_RANGE | 10042 | Section 15.1.8.1 | | NFS4ERR_BAD_RANGE | 10042 | Section 15.1.8.1 |
| NFS4ERR_BAD_SEQID | 10026 | Section 15.1.16.1 | | NFS4ERR_BAD_SEQID | 10026 | Section 15.1.16.1 |
| NFS4ERR_BAD_SESSION_DIGEST | 10051 | Section 15.1.12.2 | | NFS4ERR_BAD_SESSION_DIGEST | 10051 | Section 15.1.12.2 |
| NFS4ERR_BAD_STATEID | 10025 | Section 15.1.5.2 | | NFS4ERR_BAD_STATEID | 10025 | Section 15.1.5.2 |
| NFS4ERR_CB_PATH_DOWN | 10048 | Section 15.1.16.2 | | NFS4ERR_CB_PATH_DOWN | 10048 | Section 15.1.11.4 |
| NFS4ERR_CLID_INUSE | 10017 | Section 15.1.13.2 | | NFS4ERR_CLID_INUSE | 10017 | Section 15.1.13.2 |
| NFS4ERR_CLIENTID_BUSY | 10074 | Section 15.1.13.1 | | NFS4ERR_CLIENTID_BUSY | 10074 | Section 15.1.13.1 |
| NFS4ERR_COMPLETE_ALREADY | 10054 | Section 15.1.9.1 | | NFS4ERR_COMPLETE_ALREADY | 10054 | Section 15.1.9.1 |
| NFS4ERR_CONN_BINDING_NOT_ENFORCED | 10073 | Section 15.1.12.3 | | NFS4ERR_CONN_BINDING_NOT_ENFORCED | 10073 | Section 15.1.12.3 |
| NFS4ERR_CONN_NOT_BOUND_TO_SESSION | 10055 | Section 15.1.11.5 | | NFS4ERR_CONN_NOT_BOUND_TO_SESSION | 10055 | Section 15.1.11.6 |
| NFS4ERR_DEADLOCK | 10045 | Section 15.1.8.2 | | NFS4ERR_DEADLOCK | 10045 | Section 15.1.8.2 |
| NFS4ERR_DEADSESSION | 10078 | Section 15.1.11.4 | | NFS4ERR_DEADSESSION | 10078 | Section 15.1.11.5 |
| NFS4ERR_DELAY | 10008 | Section 15.1.1.3 | | NFS4ERR_DELAY | 10008 | Section 15.1.1.3 |
| NFS4ERR_DELEG_ALREADY_WANTED | 10056 | Section 15.1.14.1 | | NFS4ERR_DELEG_ALREADY_WANTED | 10056 | Section 15.1.14.1 |
| NFS4ERR_DENIED | 10010 | Section 15.1.8.3 | | NFS4ERR_DENIED | 10010 | Section 15.1.8.3 |
| NFS4ERR_DIRDELEG_UNAVAIL | 10084 | Section 15.1.14.2 | | NFS4ERR_DIRDELEG_UNAVAIL | 10084 | Section 15.1.14.2 |
| NFS4ERR_DQUOT | 69 | Section 15.1.4.2 | | NFS4ERR_DQUOT | 69 | Section 15.1.4.2 |
| NFS4ERR_ENCR_ALG_UNSUPP | 10079 | Section 15.1.13.3 | | NFS4ERR_ENCR_ALG_UNSUPP | 10079 | Section 15.1.13.3 |
| NFS4ERR_EXIST | 17 | Section 15.1.4.3 | | NFS4ERR_EXIST | 17 | Section 15.1.4.3 |
| NFS4ERR_EXPIRED | 10011 | Section 15.1.5.4 | | NFS4ERR_EXPIRED | 10011 | Section 15.1.5.4 |
| NFS4ERR_FBIG | 27 | Section 15.1.4.4 | | NFS4ERR_FBIG | 27 | Section 15.1.4.4 |
| NFS4ERR_FHEXPIRED | 10014 | Section 15.1.2.2 | | NFS4ERR_FHEXPIRED | 10014 | Section 15.1.2.2 |
| NFS4ERR_FILE_OPEN | 10046 | Section 15.1.4.5 | | NFS4ERR_FILE_OPEN | 10046 | Section 15.1.4.5 |
| NFS4ERR_GRACE | 10013 | Section 15.1.9.2 | | NFS4ERR_GRACE | 10013 | Section 15.1.9.2 |
| NFS4ERR_HASH_ALG_UNSUPP | 10072 | Section 15.1.13.4 | | NFS4ERR_HASH_ALG_UNSUPP | 10072 | Section 15.1.13.4 |
| NFS4ERR_INVAL | 22 | Section 15.1.1.4 | | NFS4ERR_INVAL | 22 | Section 15.1.1.4 |
| NFS4ERR_IO | 5 | Section 15.1.4.6 | | NFS4ERR_IO | 5 | Section 15.1.4.6 |
| NFS4ERR_ISDIR | 21 | Section 15.1.2.3 | | NFS4ERR_ISDIR | 21 | Section 15.1.2.3 |
| NFS4ERR_LAYOUTTRYLATER | 10058 | Section 15.1.10.3 | | NFS4ERR_LAYOUTTRYLATER | 10058 | Section 15.1.10.3 |
| NFS4ERR_LAYOUTUNAVAILABLE | 10059 | Section 15.1.10.4 | | NFS4ERR_LAYOUTUNAVAILABLE | 10059 | Section 15.1.10.4 |
| NFS4ERR_LEASE_MOVED | 10031 | Section 15.1.16.3 | | NFS4ERR_LEASE_MOVED | 10031 | Section 15.1.16.2 |
| NFS4ERR_LOCKED | 10012 | Section 15.1.8.4 | | NFS4ERR_LOCKED | 10012 | Section 15.1.8.4 |
| NFS4ERR_LOCKS_HELD | 10037 | Section 15.1.8.5 | | NFS4ERR_LOCKS_HELD | 10037 | Section 15.1.8.5 |
| NFS4ERR_LOCK_NOTSUPP | 10043 | Section 15.1.8.6 | | NFS4ERR_LOCK_NOTSUPP | 10043 | Section 15.1.8.6 |
| NFS4ERR_LOCK_RANGE | 10028 | Section 15.1.8.7 | | NFS4ERR_LOCK_RANGE | 10028 | Section 15.1.8.7 |
| NFS4ERR_MINOR_VERS_MISMATCH | 10021 | Section 15.1.3.2 | | NFS4ERR_MINOR_VERS_MISMATCH | 10021 | Section 15.1.3.2 |
| NFS4ERR_MLINK | 31 | Section 15.1.4.7 | | NFS4ERR_MLINK | 31 | Section 15.1.4.7 |
| NFS4ERR_MOVED | 10019 | Section 15.1.2.4 | | NFS4ERR_MOVED | 10019 | Section 15.1.2.4 |
| NFS4ERR_NAMETOOLONG | 63 | Section 15.1.7.3 | | NFS4ERR_NAMETOOLONG | 63 | Section 15.1.7.3 |
| NFS4ERR_NOENT | 2 | Section 15.1.4.8 | | NFS4ERR_NOENT | 2 | Section 15.1.4.8 |
| NFS4ERR_NOFILEHANDLE | 10020 | Section 15.1.2.5 | | NFS4ERR_NOFILEHANDLE | 10020 | Section 15.1.2.5 |
| NFS4ERR_NOMATCHING_LAYOUT | 10060 | Section 15.1.10.5 | | NFS4ERR_NOMATCHING_LAYOUT | 10060 | Section 15.1.10.5 |
| NFS4ERR_NOSPC | 28 | Section 15.1.4.9 | | NFS4ERR_NOSPC | 28 | Section 15.1.4.9 |
| NFS4ERR_NOTDIR | 20 | Section 15.1.2.6 | | NFS4ERR_NOTDIR | 20 | Section 15.1.2.6 |
| NFS4ERR_NOTEMPTY | 66 | Section 15.1.4.10 | | NFS4ERR_NOTEMPTY | 66 | Section 15.1.4.10 |
| NFS4ERR_NOTSUPP | 10004 | Section 15.1.1.5 | | NFS4ERR_NOTSUPP | 10004 | Section 15.1.1.5 |
| NFS4ERR_NOT_ONLY_OP | 10081 | Section 15.1.3.3 | | NFS4ERR_NOT_ONLY_OP | 10081 | Section 15.1.3.3 |
| NFS4ERR_NOT_SAME | 10027 | Section 15.1.15.3 | | NFS4ERR_NOT_SAME | 10027 | Section 15.1.15.3 |
| NFS4ERR_NO_GRACE | 10033 | Section 15.1.9.3 | | NFS4ERR_NO_GRACE | 10033 | Section 15.1.9.3 |
| NFS4ERR_NXIO | 6 | Section 15.1.16.4 | | NFS4ERR_NXIO | 6 | Section 15.1.16.3 |
| NFS4ERR_OLD_STATEID | 10024 | Section 15.1.5.5 | | NFS4ERR_OLD_STATEID | 10024 | Section 15.1.5.5 |
| NFS4ERR_OPENMODE | 10038 | Section 15.1.8.8 | | NFS4ERR_OPENMODE | 10038 | Section 15.1.8.8 |
| NFS4ERR_OP_ILLEGAL | 10044 | Section 15.1.3.4 | | NFS4ERR_OP_ILLEGAL | 10044 | Section 15.1.3.4 |
| NFS4ERR_OP_NOT_IN_SESSION | 10070 | Section 15.1.3.5 | | NFS4ERR_OP_NOT_IN_SESSION | 10071 | Section 15.1.3.5 |
| NFS4ERR_PERM | 1 | Section 15.1.6.2 | | NFS4ERR_PERM | 1 | Section 15.1.6.2 |
| NFS4ERR_PNFS_IO_HOLE | 10075 | Section 15.1.10.6 | | NFS4ERR_PNFS_IO_HOLE | 10075 | Section 15.1.10.6 |
| NFS4ERR_PNFS_NO_LAYOUT | 10080 | Section 15.1.10.7 | | NFS4ERR_PNFS_NO_LAYOUT | 10080 | Section 15.1.10.7 |
| NFS4ERR_RECALLCONFLICT | 10061 | Section 15.1.14.3 | | NFS4ERR_RECALLCONFLICT | 10061 | Section 15.1.14.3 |
| NFS4ERR_RECLAIM_BAD | 10034 | Section 15.1.9.4 | | NFS4ERR_RECLAIM_BAD | 10034 | Section 15.1.9.4 |
| NFS4ERR_RECLAIM_CONFLICT | 10035 | Section 15.1.9.5 | | NFS4ERR_RECLAIM_CONFLICT | 10035 | Section 15.1.9.5 |
| NFS4ERR_REJECT_DELEG | 10085 | Section 15.1.14.4 | | NFS4ERR_REJECT_DELEG | 10085 | Section 15.1.14.4 |
| NFS4ERR_REP_TOO_BIG | 10066 | Section 15.1.3.6 | | NFS4ERR_REP_TOO_BIG | 10066 | Section 15.1.3.6 |
| NFS4ERR_REP_TOO_BIG_TO_CACHE | 10067 | Section 15.1.3.7 | | NFS4ERR_REP_TOO_BIG_TO_CACHE | 10067 | Section 15.1.3.7 |
| NFS4ERR_REQ_TOO_BIG | 10065 | Section 15.1.3.8 | | NFS4ERR_REQ_TOO_BIG | 10065 | Section 15.1.3.8 |
| NFS4ERR_RESTOREFH | 10030 | Section 15.1.16.5 | | NFS4ERR_RESTOREFH | 10030 | Section 15.1.16.4 |
| NFS4ERR_RETRY_UNCACHED_REP | 10068 | Section 15.1.3.9 | | NFS4ERR_RETRY_UNCACHED_REP | 10068 | Section 15.1.3.9 |
| NFS4ERR_RETURNCONFLICT | 10086 | Section 15.1.10.8 | | NFS4ERR_RETURNCONFLICT | 10086 | Section 15.1.10.8 |
| NFS4ERR_ROFS | 30 | Section 15.1.4.11 | | NFS4ERR_ROFS | 30 | Section 15.1.4.11 |
| NFS4ERR_SAME | 10009 | Section 15.1.15.4 | | NFS4ERR_SAME | 10009 | Section 15.1.15.4 |
| NFS4ERR_SHARE_DENIED | 10015 | Section 15.1.8.9 | | NFS4ERR_SHARE_DENIED | 10015 | Section 15.1.8.9 |
| NFS4ERR_SEQUENCE_POS | 10064 | Section 15.1.3.10 | | NFS4ERR_SEQUENCE_POS | 10064 | Section 15.1.3.10 |
| NFS4ERR_SEQ_FALSE_RETRY | 10076 | Section 15.1.11.6 | | NFS4ERR_SEQ_FALSE_RETRY | 10076 | Section 15.1.11.7 |
| NFS4ERR_SEQ_MISORDERED | 10063 | Section 15.1.11.7 | | NFS4ERR_SEQ_MISORDERED | 10063 | Section 15.1.11.8 |
| NFS4ERR_SERVERFAULT | 10006 | Section 15.1.1.6 | | NFS4ERR_SERVERFAULT | 10006 | Section 15.1.1.6 |
| NFS4ERR_STALE | 70 | Section 15.1.2.7 | | NFS4ERR_STALE | 70 | Section 15.1.2.7 |
| NFS4ERR_STALE_CLIENTID | 10022 | Section 15.1.13.5 | | NFS4ERR_STALE_CLIENTID | 10022 | Section 15.1.13.5 |
| NFS4ERR_STALE_STATEID | 10023 | Section 15.1.16.6 | | NFS4ERR_STALE_STATEID | 10023 | Section 15.1.16.5 |
| NFS4ERR_SYMLINK | 10029 | Section 15.1.2.8 | | NFS4ERR_SYMLINK | 10029 | Section 15.1.2.8 |
| NFS4ERR_TOOSMALL | 10005 | Section 15.1.1.7 | | NFS4ERR_TOOSMALL | 10005 | Section 15.1.1.7 |
| NFS4ERR_TOO_MANY_OPS | 10070 | Section 15.1.3.11 | | NFS4ERR_TOO_MANY_OPS | 10070 | Section 15.1.3.11 |
| NFS4ERR_UNKNOWN_LAYOUTTYPE | 10062 | Section 15.1.10.9 | | NFS4ERR_UNKNOWN_LAYOUTTYPE | 10062 | Section 15.1.10.9 |
| NFS4ERR_UNSAFE_COMPOUND | 10069 | Section 15.1.3.12 | | NFS4ERR_UNSAFE_COMPOUND | 10069 | Section 15.1.3.12 |
| NFS4ERR_WRONGSEC | 10016 | Section 15.1.6.3 | | NFS4ERR_WRONGSEC | 10016 | Section 15.1.6.3 |
| NFS4ERR_WRONG_CRED | 10082 | Section 15.1.6.4 | | NFS4ERR_WRONG_CRED | 10082 | Section 15.1.6.4 |
| NFS4ERR_WRONG_TYPE | 10083 | Section 15.1.2.9 | | NFS4ERR_WRONG_TYPE | 10083 | Section 15.1.2.9 |
| NFS4ERR_XDEV | 18 | Section 15.1.4.12 | | NFS4ERR_XDEV | 18 | Section 15.1.4.12 |
+-----------------------------------+--------+-------------------+ +-----------------------------------+--------+-------------------+
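One change visible in the table is that NFS4ERR_OP_NOT_IN_SESSION moves from 10070 to 10071, while NFS4ERR_TOO_MANY_OPS stays at 10070. The sketch below shows how a shared code could be detected. It uses only a handful of rows copied from the table, and the helper name shared_codes is hypothetical.

```python
from collections import defaultdict


def shared_codes(table):
    """Return {code: sorted names} for every code used by more than one name."""
    by_code = defaultdict(list)
    for name, code in table.items():
        by_code[code].append(name)
    return {code: sorted(names)
            for code, names in by_code.items() if len(names) > 1}


# A few rows from the left (older) column of the table.
DRAFT_19 = {
    "NFS4ERR_UNSAFE_COMPOUND": 10069,
    "NFS4ERR_OP_NOT_IN_SESSION": 10070,
    "NFS4ERR_TOO_MANY_OPS": 10070,
    "NFS4ERR_HASH_ALG_UNSUPP": 10072,
}
# The same rows from the right (newer) column.
DRAFT_20 = dict(DRAFT_19, NFS4ERR_OP_NOT_IN_SESSION=10071)
```

Running shared_codes over the older rows reports 10070 as shared by two names, and over the newer rows it reports nothing.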
skipping to change at page 321, line 38 skipping to change at page 325, line 28
15.1.3.4. NFS4ERR_OP_ILLEGAL (Error Code 10044) 15.1.3.4. NFS4ERR_OP_ILLEGAL (Error Code 10044)
The operation code is not a valid one for the current Compound The operation code is not a valid one for the current Compound
procedure. The opcode in the result stream matched with this error procedure. The opcode in the result stream matched with this error
is the ILLEGAL value, although the value that appears in the request is the ILLEGAL value, although the value that appears in the request
stream may be different. Where an illegal value appears and the stream may be different. Where an illegal value appears and the
replier pre-parses all ops for a Compound procedure before doing any replier pre-parses all ops for a Compound procedure before doing any
operation execution, an RPC-level XDR error may be returned in this operation execution, an RPC-level XDR error may be returned in this
case. case.
15.1.3.5. NFS4ERR_OP_NOT_IN_SESSION (Error Code 10070) 15.1.3.5. NFS4ERR_OP_NOT_IN_SESSION (Error Code 10071)
Most forward operations and all callback operations are only valid Most forward operations and all callback operations are only valid
within the context of a session, so that the Compound request in within the context of a session, so that the Compound request in
question must begin with a Sequence operation. If an attempt is made question must begin with a Sequence operation. If an attempt is made
to execute these operations outside the context of a session, this to execute these operations outside the context of a session, this
error results. error results.
15.1.3.6. NFS4ERR_REP_TOO_BIG (Error Code 10066) 15.1.3.6. NFS4ERR_REP_TOO_BIG (Error Code 10066)
The reply to a Compound would exceed the channel's negotiated maximum The reply to a Compound would exceed the channel's negotiated maximum
skipping to change at page 327, line 9 skipping to change at page 330, line 42
15.1.8.2. NFS4ERR_DEADLOCK (Error Code 10045) 15.1.8.2. NFS4ERR_DEADLOCK (Error Code 10045)
The server has been able to determine a file locking deadlock The server has been able to determine a file locking deadlock
condition for a blocking lock request. condition for a blocking lock request.
15.1.8.3. NFS4ERR_DENIED (Error Code 10010) 15.1.8.3. NFS4ERR_DENIED (Error Code 10010)
An attempt to lock a file is denied. Since this may be a temporary An attempt to lock a file is denied. Since this may be a temporary
condition, the client is encouraged to retry the lock request until condition, the client is encouraged to retry the lock request until
the lock is accepted. See Section 9.4 for a discussion of retry. the lock is accepted. See Section 9.6 for a discussion of retry.
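A client-side retry loop for NFS4ERR_DENIED might look like the sketch below. Everything beyond the error code is an assumption made for illustration: send_lock stands for whatever the client uses to send the lock request and return its status, and the attempt limit and fixed delay are arbitrary choices rather than anything this document specifies.

```python
import time

NFS4_OK = 0
NFS4ERR_DENIED = 10010  # value taken from the error table in this document


def retry_lock(send_lock, max_attempts=5, delay_seconds=1.0, sleep=time.sleep):
    """Call send_lock() until it stops returning NFS4ERR_DENIED.

    send_lock is a hypothetical callable that sends one lock request and
    returns its status code. The last status seen is returned, which is
    still NFS4ERR_DENIED if every attempt was denied.
    """
    status = NFS4ERR_DENIED
    for attempt in range(max_attempts):
        status = send_lock()
        if status != NFS4ERR_DENIED:
            return status
        # Wait before the next attempt, but not after the final one.
        if attempt + 1 < max_attempts:
            sleep(delay_seconds)
    return status
```

Bounding the number of attempts is a design choice of this sketch; a client that wants to wait indefinitely would loop without a limit.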
15.1.8.4. NFS4ERR_LOCKED (Error Code 10012) 15.1.8.4. NFS4ERR_LOCKED (Error Code 10012)
A read or write operation was attempted on a file where there was a A read or write operation was attempted on a file where there was a
conflict between the I/O and an existing lock: conflict between the I/O and an existing lock:
o There is a share reservation inconsistent with the I/O being done. o There is a share reservation inconsistent with the I/O being done.
o The range to be read or written intersects an existing mandatory o The range to be read or written intersects an existing mandatory
byte range lock. byte range lock.
skipping to change at page 329, line 35 skipping to change at page 333, line 18
or the particular specified file. or the particular specified file.
15.1.10.5. NFS4ERR_NOMATCHING_LAYOUT (Error Code 10060) 15.1.10.5. NFS4ERR_NOMATCHING_LAYOUT (Error Code 10060)
Returned when layouts are recalled and the client has no layouts Returned when layouts are recalled and the client has no layouts
matching the specification of the layouts being recalled. matching the specification of the layouts being recalled.
15.1.10.6. NFS4ERR_PNFS_IO_HOLE (Error Code 10075) 15.1.10.6. NFS4ERR_PNFS_IO_HOLE (Error Code 10075)
The pNFS client has attempted to read from or write to an illegal The pNFS client has attempted to read from or write to an illegal
hole of a file of a data server that is using the STRIPE4_SPARSE hole of a file of a data server that is using sparse packing. See
stripe type. See Section 13.4.4. Section 13.4.4.
15.1.10.7. NFS4ERR_PNFS_NO_LAYOUT (Error Code 10080) 15.1.10.7. NFS4ERR_PNFS_NO_LAYOUT (Error Code 10080)
The pNFS client has attempted to read from or write to a file (using The pNFS client has attempted to read from or write to a file (using
a request to a data server) without holding a valid layout. This a request to a data server) without holding a valid layout. This
includes the case where the client had a layout, but the iomode does includes the case where the client had a layout, but the iomode does
not allow a WRITE. not allow a WRITE.
15.1.10.8. NFS4ERR_RETURNCONFLICT (Error Code 10086) 15.1.10.8. NFS4ERR_RETURNCONFLICT (Error Code 10086)
skipping to change at page 330, line 31 skipping to change at page 334, line 11
The requester sent a Sequence operation that attempted to use a slot The requester sent a Sequence operation that attempted to use a slot
the replier does not have in its slot table. It is possible the slot the replier does not have in its slot table. It is possible the slot
may have been retired. may have been retired.
15.1.11.3. NFS4ERR_BAD_HIGH_SLOT (Error Code 10077) 15.1.11.3. NFS4ERR_BAD_HIGH_SLOT (Error Code 10077)
The highest_slot argument in a Sequence operation exceeds the The highest_slot argument in a Sequence operation exceeds the
replier's enforced highest_slotid. replier's enforced highest_slotid.
15.1.11.4. NFS4ERR_DEADSESSION (Error Code 10078) 15.1.11.4. NFS4ERR_CB_PATH_DOWN (Error Code 10048)
There is a problem contacting the client via the callback path. The
function of this error has been mostly superseded by the use of
status flags in the reply to the SEQUENCE operation (see
Section 18.46).
15.1.11.5. NFS4ERR_DEADSESSION (Error Code 10078)
The specified session is a persistent session which is dead and does The specified session is a persistent session which is dead and does
not accept new requests or perform new operations on existing not accept new requests or perform new operations on existing
requests (in the case in which a request was partially executed requests (in the case in which a request was partially executed
before server restart). before server restart).
15.1.11.5. NFS4ERR_CONN_NOT_BOUND_TO_SESSION (Error Code 10055) 15.1.11.6. NFS4ERR_CONN_NOT_BOUND_TO_SESSION (Error Code 10055)
A Sequence operation was sent on a connection that has not been A Sequence operation was sent on a connection that has not been
associated with the specified session, in an environment where the associated with the specified session, in an environment where the
associated client ID specified that connection binding be enforced. associated client ID specified that connection binding be enforced.
15.1.11.6. NFS4ERR_SEQ_FALSE_RETRY (Error Code 10076) 15.1.11.7. NFS4ERR_SEQ_FALSE_RETRY (Error Code 10076)
The requester sent a Sequence operation with a slot id and sequence The requester sent a Sequence operation with a slot id and sequence
id that are in the reply cache, but the replier has detected that the id that are in the reply cache, but the replier has detected that the
retried request is not the same as the original request. retried request is not the same as the original request.
15.1.11.7. NFS4ERR_SEQ_MISORDERED (Error Code 10063) 15.1.11.8. NFS4ERR_SEQ_MISORDERED (Error Code 10063)
The requester sent a Sequence operation with an invalid sequence id. The requester sent a Sequence operation with an invalid sequence id.
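The slot-related errors in this subsection can be tied together with a small sketch of a replier-side check. It is illustrative only and rests on assumptions: the slot table is modeled as a dictionary from slot id to a (sequence id, request digest, cached reply) tuple, a new request is taken to carry the cached sequence id plus one, sequence id wraparound is ignored, and the function name is hypothetical.

```python
NFS4_OK = 0
NFS4ERR_BADSLOT = 10053
NFS4ERR_SEQ_MISORDERED = 10063
NFS4ERR_SEQ_FALSE_RETRY = 10076


def check_sequence(slot_table, slot_id, seq_id, request_digest):
    """Classify a Sequence operation against the reply cache.

    Returns (status, cached_reply). cached_reply is set only when the
    request is a true retry of the request cached in the slot.
    """
    if slot_id not in slot_table:
        return NFS4ERR_BADSLOT, None
    cached_seq, cached_digest, cached_reply = slot_table[slot_id]
    if seq_id == cached_seq:
        # Same slot and sequence id: it must be the same request.
        if request_digest != cached_digest:
            return NFS4ERR_SEQ_FALSE_RETRY, None
        return NFS4_OK, cached_reply
    if seq_id == cached_seq + 1:
        # Next expected sequence id: a new request for this slot.
        return NFS4_OK, None
    return NFS4ERR_SEQ_MISORDERED, None
```

Comparing a digest of the request rather than the full request is one possible way to detect a false retry; the text above only requires that the replier notice the difference.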
15.1.12. Session Management Errors 15.1.12. Session Management Errors
This section deals with errors associated with requests used in This section deals with errors associated with requests used in
session management. session management.
15.1.12.1. NFS4ERR_BACK_CHAN_BUSY (Error Code 10057) 15.1.12.1. NFS4ERR_BACK_CHAN_BUSY (Error Code 10057)
skipping to change at page 334, line 5 skipping to change at page 337, line 37
o There has been a restructuring of some errors for NFSv4.1 which o There has been a restructuring of some errors for NFSv4.1 which
resulted in the elimination of certain of the errors. resulted in the elimination of certain of the errors.
15.1.16.1. NFS4ERR_BAD_SEQID (Error Code 10026) 15.1.16.1. NFS4ERR_BAD_SEQID (Error Code 10026)
The sequence number in a locking request is neither the next expected The sequence number in a locking request is neither the next expected
number nor the last number processed. These sequence id's are ignored number nor the last number processed. These sequence id's are ignored
in NFSv4.1. in NFSv4.1.
15.1.16.2. NFS4ERR_CB_PATH_DOWN (Error Code 10048) 15.1.16.2. NFS4ERR_LEASE_MOVED (Error Code 10031)
There is a problem contacting the client via the callback path
15.1.16.3. NFS4ERR_LEASE_MOVED (Error Code 10031)
A lease being renewed is associated with a file system that has been A lease being renewed is associated with a file system that has been
migrated to a new server. migrated to a new server.
15.1.16.4. NFS4ERR_NXIO (Error Code 6) 15.1.16.3. NFS4ERR_NXIO (Error Code 6)
I/O error. No such device or address. I/O error. No such device or address.
15.1.16.5. NFS4ERR_RESTOREFH (Error Code 10030) 15.1.16.4. NFS4ERR_RESTOREFH (Error Code 10030)
The RESTOREFH operation does not have a saved filehandle (identified The RESTOREFH operation does not have a saved filehandle (identified
by SAVEFH) to operate upon. by SAVEFH) to operate upon.
15.1.16.6. NFS4ERR_STALE_STATEID (Error Code 10023) 15.1.16.5. NFS4ERR_STALE_STATEID (Error Code 10023)
A stateid generated by an earlier server instance was used. A stateid generated by an earlier server instance was used.
15.2. Operations and their valid errors 15.2. Operations and their valid errors
This section contains a table which gives the valid error returns for This section contains a table which gives the valid error returns for
each protocol operation. The error code NFS4_OK (indicating no each protocol operation. The error code NFS4_OK (indicating no
error) is not listed but should be understood to be returnable by all error) is not listed but should be understood to be returnable by all
operations with two important exceptions: operations with two important exceptions:
skipping to change at page 337, line 26 skipping to change at page 340, line 43
| | NFS4ERR_WRONG_CRED | | | NFS4ERR_WRONG_CRED |
| DESTROY_CLIENTID | NFS4ERR_BADXDR, NFS4ERR_CLIENTID_BUSY, | | DESTROY_CLIENTID | NFS4ERR_BADXDR, NFS4ERR_CLIENTID_BUSY, |
| | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, |
| | NFS4ERR_NOT_ONLY_OP, NFS4ERR_REP_TOO_BIG, | | | NFS4ERR_NOT_ONLY_OP, NFS4ERR_REP_TOO_BIG, |
| | NFS4ERR_REP_TOO_BIG_TO_CACHE, | | | NFS4ERR_REP_TOO_BIG_TO_CACHE, |
| | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, | | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, |
| | NFS4ERR_STALE_CLIENTID, | | | NFS4ERR_STALE_CLIENTID, |
| | NFS4ERR_TOO_MANY_OPS, NFS4ERR_WRONG_CRED | | | NFS4ERR_TOO_MANY_OPS, NFS4ERR_WRONG_CRED |
| DESTROY_SESSION | NFS4ERR_BACK_CHAN_BUSY, | | DESTROY_SESSION | NFS4ERR_BACK_CHAN_BUSY, |
| | NFS4ERR_BADSESSION, NFS4ERR_BADXDR, | | | NFS4ERR_BADSESSION, NFS4ERR_BADXDR, |
| | NFS4ERR_CB_PATH_DOWN, |
| | NFS4ERR_CONN_NOT_BOUND_TO_SESSION, | | | NFS4ERR_CONN_NOT_BOUND_TO_SESSION, |
| | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, |
| | NFS4ERR_NOT_ONLY_OP, NFS4ERR_REP_TOO_BIG, | | | NFS4ERR_NOT_ONLY_OP, NFS4ERR_REP_TOO_BIG, |
| | NFS4ERR_REP_TOO_BIG_TO_CACHE, | | | NFS4ERR_REP_TOO_BIG_TO_CACHE, |
| | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, | | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, |
| | NFS4ERR_STALE_CLIENTID, | | | NFS4ERR_STALE_CLIENTID, |
| | NFS4ERR_TOO_MANY_OPS, NFS4ERR_WRONG_CRED | | | NFS4ERR_TOO_MANY_OPS, NFS4ERR_WRONG_CRED |
| EXCHANGE_ID | NFS4ERR_BADCHAR, NFS4ERR_BADXDR, | | EXCHANGE_ID | NFS4ERR_BADCHAR, NFS4ERR_BADXDR, |
| | NFS4ERR_CLID_INUSE, NFS4ERR_DEADSESSION, | | | NFS4ERR_CLID_INUSE, NFS4ERR_DEADSESSION, |
| | NFS4ERR_DELAY, NFS4ERR_ENCR_ALG_UNSUPP, | | | NFS4ERR_DELAY, NFS4ERR_ENCR_ALG_UNSUPP, |
skipping to change at page 354, line 45 skipping to change at page 358, line 45
| | SEQUENCE | | | SEQUENCE |
| NFS4ERR_BAD_RANGE | LOCK, LOCKT, LOCKU | | NFS4ERR_BAD_RANGE | LOCK, LOCKT, LOCKU |
| NFS4ERR_BAD_SESSION_DIGEST | BIND_CONN_TO_SESSION, SET_SSV | | NFS4ERR_BAD_SESSION_DIGEST | BIND_CONN_TO_SESSION, SET_SSV |
| NFS4ERR_BAD_STATEID | CB_LAYOUTRECALL, CB_NOTIFY, | | NFS4ERR_BAD_STATEID | CB_LAYOUTRECALL, CB_NOTIFY, |
| | CB_NOTIFY_LOCK, CB_RECALL, | | | CB_NOTIFY_LOCK, CB_RECALL, |
| | CLOSE, DELEGRETURN, | | | CLOSE, DELEGRETURN, |
| | FREE_STATEID, LAYOUTGET, | | | FREE_STATEID, LAYOUTGET, |
| | LAYOUTRETURN, LOCK, LOCKU, | | | LAYOUTRETURN, LOCK, LOCKU, |
| | OPEN, OPEN_DOWNGRADE, READ, | | | OPEN, OPEN_DOWNGRADE, READ, |
| | SETATTR, WRITE | | | SETATTR, WRITE |
| NFS4ERR_CB_PATH_DOWN | DESTROY_SESSION |
| NFS4ERR_CLID_INUSE | EXCHANGE_ID | | NFS4ERR_CLID_INUSE | EXCHANGE_ID |
| NFS4ERR_CLIENTID_BUSY | DESTROY_CLIENTID | | NFS4ERR_CLIENTID_BUSY | DESTROY_CLIENTID |
| NFS4ERR_COMPLETE_ALREADY | RECLAIM_COMPLETE | | NFS4ERR_COMPLETE_ALREADY | RECLAIM_COMPLETE |
| NFS4ERR_CONN_BINDING_NOT_ENFORCED | BIND_CONN_TO_SESSION, SET_SSV | | NFS4ERR_CONN_BINDING_NOT_ENFORCED | BIND_CONN_TO_SESSION, SET_SSV |
| NFS4ERR_CONN_NOT_BOUND_TO_SESSION | CB_SEQUENCE, DESTROY_SESSION, | | NFS4ERR_CONN_NOT_BOUND_TO_SESSION | CB_SEQUENCE, DESTROY_SESSION, |
| | SEQUENCE | | | SEQUENCE |
| NFS4ERR_DEADLOCK | LOCK | | NFS4ERR_DEADLOCK | LOCK |
| NFS4ERR_DEADSESSION | ACCESS, BACKCHANNEL_CTL, | | NFS4ERR_DEADSESSION | ACCESS, BACKCHANNEL_CTL, |
| | BIND_CONN_TO_SESSION, CLOSE, | | | BIND_CONN_TO_SESSION, CLOSE, |
| | COMMIT, CREATE, | | | COMMIT, CREATE, |
skipping to change at page 372, line 51 skipping to change at page 376, line 51
case OP_SEQUENCE: SEQUENCE4res opsequence; case OP_SEQUENCE: SEQUENCE4res opsequence;
case OP_SET_SSV: SET_SSV4res opset_ssv; case OP_SET_SSV: SET_SSV4res opset_ssv;
case OP_TEST_STATEID: TEST_STATEID4res optest_stateid; case OP_TEST_STATEID: TEST_STATEID4res optest_stateid;
case OP_WANT_DELEGATION: case OP_WANT_DELEGATION:
WANT_DELEGATION4res WANT_DELEGATION4res
opwant_delegation; opwant_delegation;
case OP_DESTROY_CLIENTID: case OP_DESTROY_CLIENTID:
DESTROY_CLIENTID4res DESTROY_CLIENTID4res
opwant_destroy_clientid; opdestroy_clientid;
case OP_RECLAIM_COMPLETE: case OP_RECLAIM_COMPLETE:
RECLAIM_COMPLETE4res RECLAIM_COMPLETE4res
opreclaim_complete; opreclaim_complete;
/* Operations not new to NFSv4.1 */ /* Operations not new to NFSv4.1 */
case OP_ILLEGAL: ILLEGAL4res opillegal; case OP_ILLEGAL: ILLEGAL4res opillegal;
}; };
struct COMPOUND4res { struct COMPOUND4res {
nfsstat4 status; nfsstat4 status;
skipping to change at page 384, line 29 skipping to change at page 388, line 29
applicable, the GSS mechanism, combination that sent the OPEN request applicable, the GSS mechanism, combination that sent the OPEN request
also be the one to CLOSE the file. This might not be possible if also be the one to CLOSE the file. This might not be possible if
credentials for the principal are no longer available. The server credentials for the principal are no longer available. The server
MAY allow the machine credential or SSV credential (see MAY allow the machine credential or SSV credential (see
Section 18.35) to send CLOSE. Section 18.35) to send CLOSE.
18.2.4. IMPLEMENTATION 18.2.4. IMPLEMENTATION
Even though CLOSE returns a stateid, this stateid is not useful to Even though CLOSE returns a stateid, this stateid is not useful to
the client and should be treated as deprecated. CLOSE "shuts down" the client and should be treated as deprecated. CLOSE "shuts down"
the state associated with all OPENs for the file by a single the state associated with all OPENs for the file by a single open-
open_owner. As noted above, CLOSE will either release all file owner. As noted above, CLOSE will either release all file locking
locking state or return an error. Therefore, the stateid returned by state or return an error. Therefore, the stateid returned by CLOSE
CLOSE is not useful for operations that follow. To help find any is not useful for operations that follow. To help find any uses of
uses of this stateid by clients, the server SHOULD return the invalid this stateid by clients, the server SHOULD return the invalid special
special stateid (the "other" value is zero and the "seqid" field is stateid (the "other" value is zero and the "seqid" field is
NFS4_MAXFILELEN). NFS4_MAXFILELEN).
A CLOSE operation may make delegations grantable where they were not A CLOSE operation may make delegations grantable where they were not
previously. Servers may choose to respond immediately if there are previously. Servers may choose to respond immediately if there are
pending delegation want requests or may respond to the situation at a pending delegation want requests or may respond to the situation at a
later time. later time.
18.3. Operation 5: COMMIT - Commit Cached Data 18.3. Operation 5: COMMIT - Commit Cached Data
18.3.1. ARGUMENTS 18.3.1. ARGUMENTS
skipping to change at page 388, line 33 skipping to change at page 392, line 33
file in a directory with a given name. The OPEN operation MUST be file in a directory with a given name. The OPEN operation MUST be
used to create a regular file or a named attribute. used to create a regular file or a named attribute.
The directory must be an object of type NF4DIR. If the current The directory must be an object of type NF4DIR. If the current
filehandle is an attribute directory (type NF4ATTRDIR), the error filehandle is an attribute directory (type NF4ATTRDIR), the error
NFS4ERR_WRONG_TYPE is returned. If the current file handle designates NFS4ERR_WRONG_TYPE is returned. If the current file handle designates
any other type of object, the error NFS4ERR_NOTDIR results. any other type of object, the error NFS4ERR_NOTDIR results.
The objname specifies the name for the new object. The objtype The objname specifies the name for the new object. The objtype
determines the type of object to be created: directory, symlink, etc. determines the type of object to be created: directory, symlink, etc.
If the typename is that of an ordinary file, a named attribute, or a If the object type specified is that of an ordinary file, a named
named attribute directory, the error NFS4ERR_WRONG_TYPE results. attribute, or a named attribute directory, the error NFS4ERR_BADTYPE
results.
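The type checks described above for CREATE can be sketched as follows. This is a minimal illustration rather than a normative algorithm: the function name create_type_check and the order in which the checks run are assumptions, and the error for a disallowed object type follows the -20 wording (NFS4ERR_BADTYPE), where -19 used NFS4ERR_WRONG_TYPE. The enum values mirror the nfs_ftype4 numbering.

```python
from enum import Enum


class FType(Enum):
    """File types, numbered as in nfs_ftype4."""
    NF4REG = 1
    NF4DIR = 2
    NF4BLK = 3
    NF4CHR = 4
    NF4LNK = 5
    NF4SOCK = 6
    NF4FIFO = 7
    NF4ATTRDIR = 8
    NF4NAMEDATTR = 9


# Object types that CREATE refuses; regular files and named
# attributes are created with OPEN instead.
NOT_CREATABLE = {FType.NF4REG, FType.NF4NAMEDATTR, FType.NF4ATTRDIR}


def create_type_check(current_fh_type, objtype):
    """Hypothetical helper: return the status name for CREATE's type checks.

    The ordering of the checks is an assumption made for illustration.
    """
    if current_fh_type is FType.NF4ATTRDIR:
        return "NFS4ERR_WRONG_TYPE"
    if current_fh_type is not FType.NF4DIR:
        return "NFS4ERR_NOTDIR"
    if objtype in NOT_CREATABLE:
        return "NFS4ERR_BADTYPE"
    return "NFS4_OK"  # type checks passed; other errors may still apply
```

For example, create_type_check(FType.NF4DIR, FType.NF4LNK) passes the type checks, while requesting FType.NF4REG in the same directory does not.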
If an object of the same name already exists in the directory, the If an object of the same name already exists in the directory, the
server will return the error NFS4ERR_EXIST. server will return the error NFS4ERR_EXIST.
For the directory where the new file object was created, the server For the directory where the new file object was created, the server
returns change_info4 information in cinfo. With the atomic field of returns change_info4 information in cinfo. With the atomic field of
the change_info4 data type, the server will indicate if the before the change_info4 data type, the server will indicate if the before
and after change attributes were obtained atomically with respect to and after change attributes were obtained atomically with respect to
the file object creation. the file object creation.
skipping to change at page 399, line 5 skipping to change at page 403, line 5
32 bit servers are servers that support locking for byte offsets that 32 bit servers are servers that support locking for byte offsets that
fit within 32 bits (i.e. less than or equal to 0xFFFFFFFF). If the fit within 32 bits (i.e. less than or equal to 0xFFFFFFFF). If the
client specifies a range that overlaps one or more bytes beyond client specifies a range that overlaps one or more bytes beyond
offset 0xFFFFFFFF, but does not end at the maximum 64 bit offset offset 0xFFFFFFFF, but does not end at the maximum 64 bit offset
(i.e. 0xFFFFFFFFFFFFFFFF), such a 32-bit server MUST return the error (i.e. 0xFFFFFFFFFFFFFFFF), such a 32-bit server MUST return the error
NFS4ERR_BAD_RANGE. NFS4ERR_BAD_RANGE.
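The 32-bit range rule above can be sketched as a small predicate. This is an illustration under stated assumptions, not server code: the name must_return_bad_range is hypothetical, a length of all ones is treated as "lock to end of file" (a convention defined elsewhere in the specification, not in this paragraph), and zero-length or overflowing ranges are rejected up front because they are invalid for reasons outside this rule.

```python
UINT32_MAX = 0xFFFFFFFF
UINT64_MAX = 0xFFFFFFFFFFFFFFFF


def must_return_bad_range(offset, length):
    """Hypothetical check for a server that only supports 32-bit offsets.

    Returns True when the range reaches past offset 0xFFFFFFFF without
    ending at the maximum 64-bit offset.
    """
    if length == 0:
        raise ValueError("zero-length range is invalid")
    if length == UINT64_MAX:
        # Assumption: all-ones length means "to end of file", which is
        # treated here as ending at the maximum 64-bit offset.
        last = UINT64_MAX
    else:
        last = offset + length - 1
        if last > UINT64_MAX:
            raise ValueError("offset + length overflows 64 bits")
    return last > UINT32_MAX and last != UINT64_MAX
```

A range that lies entirely within the first 4 GiB, or one that extends to the maximum 64-bit offset, is accepted; anything that crosses 0xFFFFFFFF and stops short of the maximum is refused.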
If the server returns NFS4ERR_DENIED, owner, offset, and length of a If the server returns NFS4ERR_DENIED, owner, offset, and length of a
conflicting lock are returned. conflicting lock are returned.
The locker argument specifies the lock_owner that is associated with The locker argument specifies the lock-owner that is associated with
the LOCK request. The locker4 structure is a switched union that the LOCK request. The locker4 structure is a switched union that
indicates whether the client has already created record locking state indicates whether the client has already created record locking state
associated with the current open file and lock owner. In the case in associated with the current open file and lock-owner. In the case in
which it has, the argument is just a stateid for the set of locks which it has, the argument is just a stateid for the set of locks
associated with that open file and lock owner, together with a associated with that open file and lock-owner, together with a
lock_seqid value which MAY be any value and MUST be ignored by the lock_seqid value which MAY be any value and MUST be ignored by the
server. In the case where no such state has been established, or the server. In the case where no such state has been established, or the
client does not have the stateid available, the argument contains the client does not have the stateid available, the argument contains the
stateid of the open file with which this lock is to be associated, stateid of the open file with which this lock is to be associated,
together with the lock_owner which which the lock is to be together with the lock-owner with which the lock is to be associated.