draft-ietf-nfsv4-rfc5661sesqui-msns-03.txt   draft-ietf-nfsv4-rfc5661sesqui-msns-04.txt 
NFSv4 D. Noveck, Ed. NFSv4 D. Noveck, Ed.
Internet-Draft NetApp Internet-Draft NetApp
Obsoletes: 5661 (if approved) C. Lever Obsoletes: 5661 (if approved) C. Lever
Intended status: Standards Track ORACLE Intended status: Standards Track ORACLE
Expires: April 22, 2020 October 20, 2019 Expires: July 31, 2020 January 28, 2020
Network File System (NFS) Version 4 Minor Version 1 Protocol Network File System (NFS) Version 4 Minor Version 1 Protocol
draft-ietf-nfsv4-rfc5661sesqui-msns-03 draft-ietf-nfsv4-rfc5661sesqui-msns-04
Abstract Abstract
This document describes the Network File System (NFS) version 4 minor This document describes the Network File System (NFS) version 4 minor
version 1, including features retained from the base protocol (NFS version 1, including features retained from the base protocol (NFS
version 4 minor version 0, which is specified in RFC 7530) and version 4 minor version 0, which is specified in RFC 7530) and
protocol extensions made subsequently. The later minor version has protocol extensions made subsequently. The later minor version has
no dependencies on NFS version 4 minor version 0, and is considered a no dependencies on NFS version 4 minor version 0, and is considered a
separate protocol. separate protocol.
This document obsoletes RFC5661. It substantialy revises the This document obsoletes RFC5661. It substantially revises the
treatment of features relating to multi-server namesapce superseding treatment of features relating to multi-server namespace, superseding
the description of those features appearing in RFC5661. the description of those features appearing in RFC5661.
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on April 22, 2020. This Internet-Draft will expire on July 31, 2020.
Copyright Notice Copyright Notice
Copyright (c) 2019 IETF Trust and the persons identified as the Copyright (c) 2020 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
skipping to change at page 2, line 28 skipping to change at page 2, line 28
outside the IETF Standards Process, and derivative works of it may outside the IETF Standards Process, and derivative works of it may
not be created outside the IETF Standards Process, except to format not be created outside the IETF Standards Process, except to format
it for publication as an RFC or to translate it into languages other it for publication as an RFC or to translate it into languages other
than English. than English.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 7 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 7
1.1. Introduction to this Update . . . . . . . . . . . . . . . 7 1.1. Introduction to this Update . . . . . . . . . . . . . . . 7
1.2. The NFS Version 4 Minor Version 1 Protocol . . . . . . . 9 1.2. The NFS Version 4 Minor Version 1 Protocol . . . . . . . 9
1.3. Requirements Language . . . . . . . . . . . . . . . . . . 9 1.3. Requirements Language . . . . . . . . . . . . . . . . . . 10
1.4. Scope of This Document . . . . . . . . . . . . . . . . . 9 1.4. Scope of This Document . . . . . . . . . . . . . . . . . 10
1.5. NFSv4 Goals . . . . . . . . . . . . . . . . . . . . . . . 10 1.5. NFSv4 Goals . . . . . . . . . . . . . . . . . . . . . . . 10
1.6. NFSv4.1 Goals . . . . . . . . . . . . . . . . . . . . . . 10 1.6. NFSv4.1 Goals . . . . . . . . . . . . . . . . . . . . . . 11
1.7. General Definitions . . . . . . . . . . . . . . . . . . . 11 1.7. General Definitions . . . . . . . . . . . . . . . . . . . 11
1.8. Overview of NFSv4.1 Features . . . . . . . . . . . . . . 13 1.8. Overview of NFSv4.1 Features . . . . . . . . . . . . . . 14
1.9. Differences from NFSv4.0 . . . . . . . . . . . . . . . . 17 1.9. Differences from NFSv4.0 . . . . . . . . . . . . . . . . 18
2. Core Infrastructure . . . . . . . . . . . . . . . . . . . . . 19 2. Core Infrastructure . . . . . . . . . . . . . . . . . . . . . 19
2.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 19 2.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 19
2.2. RPC and XDR . . . . . . . . . . . . . . . . . . . . . . . 19 2.2. RPC and XDR . . . . . . . . . . . . . . . . . . . . . . . 19
2.3. COMPOUND and CB_COMPOUND . . . . . . . . . . . . . . . . 22 2.3. COMPOUND and CB_COMPOUND . . . . . . . . . . . . . . . . 22
2.4. Client Identifiers and Client Owners . . . . . . . . . . 23 2.4. Client Identifiers and Client Owners . . . . . . . . . . 23
2.5. Server Owners . . . . . . . . . . . . . . . . . . . . . . 28 2.5. Server Owners . . . . . . . . . . . . . . . . . . . . . . 29
2.6. Security Service Negotiation . . . . . . . . . . . . . . 29 2.6. Security Service Negotiation . . . . . . . . . . . . . . 29
2.7. Minor Versioning . . . . . . . . . . . . . . . . . . . . 34 2.7. Minor Versioning . . . . . . . . . . . . . . . . . . . . 35
2.8. Non-RPC-Based Security Services . . . . . . . . . . . . . 37 2.8. Non-RPC-Based Security Services . . . . . . . . . . . . . 37
2.9. Transport Layers . . . . . . . . . . . . . . . . . . . . 38 2.9. Transport Layers . . . . . . . . . . . . . . . . . . . . 38
2.10. Session . . . . . . . . . . . . . . . . . . . . . . . . . 40 2.10. Session . . . . . . . . . . . . . . . . . . . . . . . . . 40
3. Protocol Constants and Data Types . . . . . . . . . . . . . . 86 3. Protocol Constants and Data Types . . . . . . . . . . . . . . 87
3.1. Basic Constants . . . . . . . . . . . . . . . . . . . . . 87 3.1. Basic Constants . . . . . . . . . . . . . . . . . . . . . 87
3.2. Basic Data Types . . . . . . . . . . . . . . . . . . . . 87 3.2. Basic Data Types . . . . . . . . . . . . . . . . . . . . 88
3.3. Structured Data Types . . . . . . . . . . . . . . . . . . 89 3.3. Structured Data Types . . . . . . . . . . . . . . . . . . 90
4. Filehandles . . . . . . . . . . . . . . . . . . . . . . . . . 98 4. Filehandles . . . . . . . . . . . . . . . . . . . . . . . . . 98
4.1. Obtaining the First Filehandle . . . . . . . . . . . . . 98 4.1. Obtaining the First Filehandle . . . . . . . . . . . . . 99
4.2. Filehandle Types . . . . . . . . . . . . . . . . . . . . 99 4.2. Filehandle Types . . . . . . . . . . . . . . . . . . . . 100
4.3. One Method of Constructing a Volatile Filehandle . . . . 102 4.3. One Method of Constructing a Volatile Filehandle . . . . 102
4.4. Client Recovery from Filehandle Expiration . . . . . . . 102 4.4. Client Recovery from Filehandle Expiration . . . . . . . 103
5. File Attributes . . . . . . . . . . . . . . . . . . . . . . . 103 5. File Attributes . . . . . . . . . . . . . . . . . . . . . . . 104
5.1. REQUIRED Attributes . . . . . . . . . . . . . . . . . . . 104 5.1. REQUIRED Attributes . . . . . . . . . . . . . . . . . . . 105
5.2. RECOMMENDED Attributes . . . . . . . . . . . . . . . . . 104 5.2. RECOMMENDED Attributes . . . . . . . . . . . . . . . . . 105
5.3. Named Attributes . . . . . . . . . . . . . . . . . . . . 105 5.3. Named Attributes . . . . . . . . . . . . . . . . . . . . 106
5.4. Classification of Attributes . . . . . . . . . . . . . . 106 5.4. Classification of Attributes . . . . . . . . . . . . . . 107
5.5. Set-Only and Get-Only Attributes . . . . . . . . . . . . 107 5.5. Set-Only and Get-Only Attributes . . . . . . . . . . . . 108
5.6. REQUIRED Attributes - List and Definition References . . 107 5.6. REQUIRED Attributes - List and Definition References . . 108
5.7. RECOMMENDED Attributes - List and Definition References . 108 5.7. RECOMMENDED Attributes - List and Definition References . 109
5.8. Attribute Definitions . . . . . . . . . . . . . . 110 5.8. Attribute Definitions . . . . . . . . . . . . . . 111
5.9. Interpreting owner and owner_group . . . . . . . . . . . 119 5.9. Interpreting owner and owner_group . . . . . . . . . . . 120
5.10. Character Case Attributes . . . . . . . . . . . . . . . . 121 5.10. Character Case Attributes . . . . . . . . . . . . . . . . 122
5.11. Directory Notification Attributes . . . . . . . . . . . . 121 5.11. Directory Notification Attributes . . . . . . . . . . . . 122
5.12. pNFS Attribute Definitions . . . . . . . . . . . . . . . 122 5.12. pNFS Attribute Definitions . . . . . . . . . . . . . . . 123
5.13. Retention Attributes . . . . . . . . . . . . . . . . . . 123 5.13. Retention Attributes . . . . . . . . . . . . . . . . . . 124
6. Access Control Attributes . . . . . . . . . . . . . . . . . . 126 6. Access Control Attributes . . . . . . . . . . . . . . . . . . 127
6.1. Goals . . . . . . . . . . . . . . . . . . . . . . . . . . 126 6.1. Goals . . . . . . . . . . . . . . . . . . . . . . . . . . 127
6.2. File Attributes Discussion . . . . . . . . . . . . . . . 127 6.2. File Attributes Discussion . . . . . . . . . . . . . . . 128
6.3. Common Methods . . . . . . . . . . . . . . . . . . . . . 144 6.3. Common Methods . . . . . . . . . . . . . . . . . . . . . 145
6.4. Requirements . . . . . . . . . . . . . . . . . . . . . . 146 6.4. Requirements . . . . . . . . . . . . . . . . . . . . . . 147
7. Single-Server Namespace . . . . . . . . . . . . . . . . . . . 153 7. Single-Server Namespace . . . . . . . . . . . . . . . . . . . 154
7.1. Server Exports . . . . . . . . . . . . . . . . . . . . . 153 7.1. Server Exports . . . . . . . . . . . . . . . . . . . . . 154
7.2. Browsing Exports . . . . . . . . . . . . . . . . . . . . 153 7.2. Browsing Exports . . . . . . . . . . . . . . . . . . . . 154
7.3. Server Pseudo File System . . . . . . . . . . . . . . . . 154 7.3. Server Pseudo File System . . . . . . . . . . . . . . . . 155
7.4. Multiple Roots . . . . . . . . . . . . . . . . . . . . . 154 7.4. Multiple Roots . . . . . . . . . . . . . . . . . . . . . 155
7.5. Filehandle Volatility . . . . . . . . . . . . . . . . . . 155 7.5. Filehandle Volatility . . . . . . . . . . . . . . . . . . 156
7.6. Exported Root . . . . . . . . . . . . . . . . . . . . . . 155 7.6. Exported Root . . . . . . . . . . . . . . . . . . . . . . 156
7.7. Mount Point Crossing . . . . . . . . . . . . . . . . . . 155 7.7. Mount Point Crossing . . . . . . . . . . . . . . . . . . 156
7.8. Security Policy and Namespace Presentation . . . . . . . 156 7.8. Security Policy and Namespace Presentation . . . . . . . 157
8. State Management . . . . . . . . . . . . . . . . . . . . . . 157 8. State Management . . . . . . . . . . . . . . . . . . . . . . 158
8.1. Client and Session ID . . . . . . . . . . . . . . . . . . 158 8.1. Client and Session ID . . . . . . . . . . . . . . . . . . 159
8.2. Stateid Definition . . . . . . . . . . . . . . . . . . . 158 8.2. Stateid Definition . . . . . . . . . . . . . . . . . . . 159
8.3. Lease Renewal . . . . . . . . . . . . . . . . . . . . . . 167 8.3. Lease Renewal . . . . . . . . . . . . . . . . . . . . . . 168
8.4. Crash Recovery . . . . . . . . . . . . . . . . . . . . . 169 8.4. Crash Recovery . . . . . . . . . . . . . . . . . . . . . 170
8.5. Server Revocation of Locks . . . . . . . . . . . . . . . 180 8.5. Server Revocation of Locks . . . . . . . . . . . . . . . 181
8.6. Short and Long Leases . . . . . . . . . . . . . . . . . . 181 8.6. Short and Long Leases . . . . . . . . . . . . . . . . . . 182
8.7. Clocks, Propagation Delay, and Calculating Lease 8.7. Clocks, Propagation Delay, and Calculating Lease
Expiration . . . . . . . . . . . . . . . . . . . . . . . 182 Expiration . . . . . . . . . . . . . . . . . . . . . . . 183
8.8. Obsolete Locking Infrastructure from NFSv4.0 . . . . . . 182 8.8. Obsolete Locking Infrastructure from NFSv4.0 . . . . . . 183
9. File Locking and Share Reservations . . . . . . . . . . . . . 183 9. File Locking and Share Reservations . . . . . . . . . . . . . 184
9.1. Opens and Byte-Range Locks . . . . . . . . . . . . . . . 183 9.1. Opens and Byte-Range Locks . . . . . . . . . . . . . . . 184
9.2. Lock Ranges . . . . . . . . . . . . . . . . . . . . . . . 187 9.2. Lock Ranges . . . . . . . . . . . . . . . . . . . . . . . 188
9.3. Upgrading and Downgrading Locks . . . . . . . . . . . . . 188 9.3. Upgrading and Downgrading Locks . . . . . . . . . . . . . 189
9.4. Stateid Seqid Values and Byte-Range Locks . . . . . . . . 188 9.4. Stateid Seqid Values and Byte-Range Locks . . . . . . . . 189
9.5. Issues with Multiple Open-Owners . . . . . . . . . . . . 188 9.5. Issues with Multiple Open-Owners . . . . . . . . . . . . 189
9.6. Blocking Locks . . . . . . . . . . . . . . . . . . . . . 189 9.6. Blocking Locks . . . . . . . . . . . . . . . . . . . . . 190
9.7. Share Reservations . . . . . . . . . . . . . . . . . . . 190 9.7. Share Reservations . . . . . . . . . . . . . . . . . . . 191
9.8. OPEN/CLOSE Operations . . . . . . . . . . . . . . . . . . 191 9.8. OPEN/CLOSE Operations . . . . . . . . . . . . . . . . . . 192
9.9. Open Upgrade and Downgrade . . . . . . . . . . . . . . . 192 9.9. Open Upgrade and Downgrade . . . . . . . . . . . . . . . 193
9.10. Parallel OPENs . . . . . . . . . . . . . . . . . . . . . 193 9.10. Parallel OPENs . . . . . . . . . . . . . . . . . . . . . 194
9.11. Reclaim of Open and Byte-Range Locks . . . . . . . . . . 193 9.11. Reclaim of Open and Byte-Range Locks . . . . . . . . . . 194
10. Client-Side Caching . . . . . . . . . . . . . . . . . . . . . 194 10. Client-Side Caching . . . . . . . . . . . . . . . . . . . . . 195
10.1. Performance Challenges for Client-Side Caching . . . . . 194 10.1. Performance Challenges for Client-Side Caching . . . . . 195
10.2. Delegation and Callbacks . . . . . . . . . . . . . . . . 195 10.2. Delegation and Callbacks . . . . . . . . . . . . . . . . 196
10.3. Data Caching . . . . . . . . . . . . . . . . . . . . . . 200 10.3. Data Caching . . . . . . . . . . . . . . . . . . . . . . 201
10.4. Open Delegation . . . . . . . . . . . . . . . . . . . . 204 10.4. Open Delegation . . . . . . . . . . . . . . . . . . . . 205
10.5. Data Caching and Revocation . . . . . . . . . . . . . . 215 10.5. Data Caching and Revocation . . . . . . . . . . . . . . 216
10.6. Attribute Caching . . . . . . . . . . . . . . . . . . . 217 10.6. Attribute Caching . . . . . . . . . . . . . . . . . . . 218
10.7. Data and Metadata Caching and Memory Mapped Files . . . 219 10.7. Data and Metadata Caching and Memory Mapped Files . . . 220
10.8. Name and Directory Caching without Directory Delegations 221 10.8. Name and Directory Caching without Directory Delegations 222
10.9. Directory Delegations . . . . . . . . . . . . . . . . . 223 10.9. Directory Delegations . . . . . . . . . . . . . . . . . 224
11. Multi-Server Namespace . . . . . . . . . . . . . . . . . . . 227 11. Multi-Server Namespace . . . . . . . . . . . . . . . . . . . 228
11.1. Terminology . . . . . . . . . . . . . . . . . . . . . . 227 11.1. Terminology . . . . . . . . . . . . . . . . . . . . . . 228
11.2. File System Location Attributes . . . . . . . . . . . . 230 11.2. File System Location Attributes . . . . . . . . . . . . 232
11.3. File System Presence or Absence . . . . . . . . . . . . 231 11.3. File System Presence or Absence . . . . . . . . . . . . 233
11.4. Getting Attributes for an Absent File System . . . . . . 232 11.4. Getting Attributes for an Absent File System . . . . . . 234
11.5. Uses of File System Location Information . . . . . . . . 234 11.5. Uses of File System Location Information . . . . . . . . 236
11.6. Users and Groups in a Multi-server Namespace . . . . . . 242 11.6. Trunking without File System Location Information . . . 246
11.7. Additional Client-Side Considerations . . . . . . . . . 243 11.7. Users and Groups in a Multi-server Namespace . . . . . . 246
11.8. Overview of File Access Transitions . . . . . . . . . . 244 11.8. Additional Client-Side Considerations . . . . . . . . . 248
11.9. Effecting Network Endpoint Transitions . . . . . . . . . 244 11.9. Overview of File Access Transitions . . . . . . . . . . 248
11.10. Effecting File System Transitions . . . . . . . . . . . 245 11.10. Effecting Network Endpoint Transitions . . . . . . . . . 249
11.11. Transferring State upon Migration . . . . . . . . . . . 253 11.11. Effecting File System Transitions . . . . . . . . . . . 250
11.12. Client Responsibilities when Access is Transitioned . . 255 11.12. Transferring State upon Migration . . . . . . . . . . . 260
11.13. Server Responsibilities Upon Migration . . . . . . . . . 264 11.13. Client Responsibilities when Access is Transitioned . . 261
11.14. Effecting File System Referrals . . . . . . . . . . . . 270 11.14. Server Responsibilities Upon Migration . . . . . . . . . 271
11.15. The Attribute fs_locations . . . . . . . . . . . . . . . 277 11.15. Effecting File System Referrals . . . . . . . . . . . . 277
11.16. The Attribute fs_locations_info . . . . . . . . . . . . 280 11.16. The Attribute fs_locations . . . . . . . . . . . . . . . 284
11.17. The Attribute fs_status . . . . . . . . . . . . . . . . 294 11.17. The Attribute fs_locations_info . . . . . . . . . . . . 287
12. Parallel NFS (pNFS) . . . . . . . . . . . . . . . . . . . . . 297 11.18. The Attribute fs_status . . . . . . . . . . . . . . . . 300
12.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 297 12. Parallel NFS (pNFS) . . . . . . . . . . . . . . . . . . . . . 304
12.2. pNFS Definitions . . . . . . . . . . . . . . . . . . . . 299 12.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 304
12.3. pNFS Operations . . . . . . . . . . . . . . . . . . . . 304 12.2. pNFS Definitions . . . . . . . . . . . . . . . . . . . . 305
12.4. pNFS Attributes . . . . . . . . . . . . . . . . . . . . 305 12.3. pNFS Operations . . . . . . . . . . . . . . . . . . . . 311
12.5. Layout Semantics . . . . . . . . . . . . . . . . . . . . 305 12.4. pNFS Attributes . . . . . . . . . . . . . . . . . . . . 312
12.6. pNFS Mechanics . . . . . . . . . . . . . . . . . . . . . 320 12.5. Layout Semantics . . . . . . . . . . . . . . . . . . . . 312
12.7. Recovery . . . . . . . . . . . . . . . . . . . . . . . . 322 12.6. pNFS Mechanics . . . . . . . . . . . . . . . . . . . . . 327
12.8. Metadata and Storage Device Roles . . . . . . . . . . . 327 12.7. Recovery . . . . . . . . . . . . . . . . . . . . . . . . 329
12.9. Security Considerations for pNFS . . . . . . . . . . . . 327 12.8. Metadata and Storage Device Roles . . . . . . . . . . . 334
13. NFSv4.1 as a Storage Protocol in pNFS: the File Layout Type . 329 12.9. Security Considerations for pNFS . . . . . . . . . . . . 334
13.1. Client ID and Session Considerations . . . . . . . . . . 329 13. NFSv4.1 as a Storage Protocol in pNFS: the File Layout Type . 336
13.2. File Layout Definitions . . . . . . . . . . . . . . . . 332 13.1. Client ID and Session Considerations . . . . . . . . . . 336
13.3. File Layout Data Types . . . . . . . . . . . . . . . . . 332 13.2. File Layout Definitions . . . . . . . . . . . . . . . . 339
13.4. Interpreting the File Layout . . . . . . . . . . . . . . 336 13.3. File Layout Data Types . . . . . . . . . . . . . . . . . 339
13.5. Data Server Multipathing . . . . . . . . . . . . . . . . 344 13.4. Interpreting the File Layout . . . . . . . . . . . . . . 343
13.6. Operations Sent to NFSv4.1 Data Servers . . . . . . . . 345 13.5. Data Server Multipathing . . . . . . . . . . . . . . . . 351
13.7. COMMIT through Metadata Server . . . . . . . . . . . . . 347 13.6. Operations Sent to NFSv4.1 Data Servers . . . . . . . . 352
13.8. The Layout Iomode . . . . . . . . . . . . . . . . . . . 348 13.7. COMMIT through Metadata Server . . . . . . . . . . . . . 354
13.9. Metadata and Data Server State Coordination . . . . . . 349 13.8. The Layout Iomode . . . . . . . . . . . . . . . . . . . 355
13.10. Data Server Component File Size . . . . . . . . . . . . 352 13.9. Metadata and Data Server State Coordination . . . . . . 356
13.11. Layout Revocation and Fencing . . . . . . . . . . . . . 352 13.10. Data Server Component File Size . . . . . . . . . . . . 359
13.12. Security Considerations for the File Layout Type . . . . 353 13.11. Layout Revocation and Fencing . . . . . . . . . . . . . 359
14. Internationalization . . . . . . . . . . . . . . . . . . . . 354 13.12. Security Considerations for the File Layout Type . . . . 360
14.1. Stringprep Profile for the utf8str_cs Type . . . . . . . 355 14. Internationalization . . . . . . . . . . . . . . . . . . . . 361
14.2. Stringprep Profile for the utf8str_cis Type . . . . . . 357 14.1. Stringprep Profile for the utf8str_cs Type . . . . . . . 362
14.3. Stringprep Profile for the utf8str_mixed Type . . . . . 358 14.2. Stringprep Profile for the utf8str_cis Type . . . . . . 364
14.4. UTF-8 Capabilities . . . . . . . . . . . . . . . . . . . 360 14.3. Stringprep Profile for the utf8str_mixed Type . . . . . 365
14.5. UTF-8 Related Errors . . . . . . . . . . . . . . . . . . 360 14.4. UTF-8 Capabilities . . . . . . . . . . . . . . . . . . . 367
15. Error Values . . . . . . . . . . . . . . . . . . . . . . . . 361 14.5. UTF-8 Related Errors . . . . . . . . . . . . . . . . . . 367
15.1. Error Definitions . . . . . . . . . . . . . . . . . . . 361 15. Error Values . . . . . . . . . . . . . . . . . . . . . . . . 368
15.2. Operations and Their Valid Errors . . . . . . . . . . . 382 15.1. Error Definitions . . . . . . . . . . . . . . . . . . . 368
15.3. Callback Operations and Their Valid Errors . . . . . . . 398 15.2. Operations and Their Valid Errors . . . . . . . . . . . 390
15.4. Errors and the Operations That Use Them . . . . . . . . 401 15.3. Callback Operations and Their Valid Errors . . . . . . . 406
16. NFSv4.1 Procedures . . . . . . . . . . . . . . . . . . . . . 415 15.4. Errors and the Operations That Use Them . . . . . . . . 409
16.1. Procedure 0: NULL - No Operation . . . . . . . . . . . . 415 16. NFSv4.1 Procedures . . . . . . . . . . . . . . . . . . . . . 423
16.2. Procedure 1: COMPOUND - Compound Operations . . . . . . 416 16.1. Procedure 0: NULL - No Operation . . . . . . . . . . . . 423
17. Operations: REQUIRED, RECOMMENDED, or OPTIONAL . . . . . . . 427 16.2. Procedure 1: COMPOUND - Compound Operations . . . . . . 424
18. NFSv4.1 Operations . . . . . . . . . . . . . . . . . . . . . 430 17. Operations: REQUIRED, RECOMMENDED, or OPTIONAL . . . . . . . 435
18.1. Operation 3: ACCESS - Check Access Rights . . . . . . . 430 18. NFSv4.1 Operations . . . . . . . . . . . . . . . . . . . . . 438
18.2. Operation 4: CLOSE - Close File . . . . . . . . . . . . 436 18.1. Operation 3: ACCESS - Check Access Rights . . . . . . . 438
18.3. Operation 5: COMMIT - Commit Cached Data . . . . . . . . 437 18.2. Operation 4: CLOSE - Close File . . . . . . . . . . . . 444
18.4. Operation 6: CREATE - Create a Non-Regular File Object . 440 18.3. Operation 5: COMMIT - Commit Cached Data . . . . . . . . 445
18.4. Operation 6: CREATE - Create a Non-Regular File Object . 448
18.5. Operation 7: DELEGPURGE - Purge Delegations Awaiting 18.5. Operation 7: DELEGPURGE - Purge Delegations Awaiting
Recovery . . . . . . . . . . . . . . . . . . . . . . . . 443 Recovery . . . . . . . . . . . . . . . . . . . . . . . . 451
18.6. Operation 8: DELEGRETURN - Return Delegation . . . . . . 444 18.6. Operation 8: DELEGRETURN - Return Delegation . . . . . . 452
18.7. Operation 9: GETATTR - Get Attributes . . . . . . . . . 444 18.7. Operation 9: GETATTR - Get Attributes . . . . . . . . . 452
18.8. Operation 10: GETFH - Get Current Filehandle . . . . . . 446 18.8. Operation 10: GETFH - Get Current Filehandle . . . . . . 454
18.9. Operation 11: LINK - Create Link to a File . . . . . . . 447 18.9. Operation 11: LINK - Create Link to a File . . . . . . . 455
18.10. Operation 12: LOCK - Create Lock . . . . . . . . . . . . 450 18.10. Operation 12: LOCK - Create Lock . . . . . . . . . . . . 458
18.11. Operation 13: LOCKT - Test for Lock . . . . . . . . . . 455 18.11. Operation 13: LOCKT - Test for Lock . . . . . . . . . . 463
18.12. Operation 14: LOCKU - Unlock File . . . . . . . . . . . 456 18.12. Operation 14: LOCKU - Unlock File . . . . . . . . . . . 464
18.13. Operation 15: LOOKUP - Lookup Filename . . . . . . . . . 458 18.13. Operation 15: LOOKUP - Lookup Filename . . . . . . . . . 466
18.14. Operation 16: LOOKUPP - Lookup Parent Directory . . . . 460 18.14. Operation 16: LOOKUPP - Lookup Parent Directory . . . . 468
18.15. Operation 17: NVERIFY - Verify Difference in Attributes 461 18.15. Operation 17: NVERIFY - Verify Difference in Attributes 469
18.16. Operation 18: OPEN - Open a Regular File . . . . . . . . 462 18.16. Operation 18: OPEN - Open a Regular File . . . . . . . . 470
18.17. Operation 19: OPENATTR - Open Named Attribute Directory 482 18.17. Operation 19: OPENATTR - Open Named Attribute Directory 490
18.18. Operation 21: OPEN_DOWNGRADE - Reduce Open File Access . 484 18.18. Operation 21: OPEN_DOWNGRADE - Reduce Open File Access . 492
18.19. Operation 22: PUTFH - Set Current Filehandle . . . . . . 485 18.19. Operation 22: PUTFH - Set Current Filehandle . . . . . . 493
18.20. Operation 23: PUTPUBFH - Set Public Filehandle . . . . 486 18.20. Operation 23: PUTPUBFH - Set Public Filehandle . . . . 494
18.21. Operation 24: PUTROOTFH - Set Root Filehandle . . . . . 488 18.21. Operation 24: PUTROOTFH - Set Root Filehandle . . . . . 496
18.22. Operation 25: READ - Read from File . . . . . . . . . . 489 18.22. Operation 25: READ - Read from File . . . . . . . . . . 497
18.23. Operation 26: READDIR - Read Directory . . . . . . . . . 491 18.23. Operation 26: READDIR - Read Directory . . . . . . . . . 499
18.24. Operation 27: READLINK - Read Symbolic Link . . . . . . 495 18.24. Operation 27: READLINK - Read Symbolic Link . . . . . . 503
18.25. Operation 28: REMOVE - Remove File System Object . . . . 496 18.25. Operation 28: REMOVE - Remove File System Object . . . . 504
18.26. Operation 29: RENAME - Rename Directory Entry . . . . . 499 18.26. Operation 29: RENAME - Rename Directory Entry . . . . . 507
18.27. Operation 31: RESTOREFH - Restore Saved Filehandle . . . 502 18.27. Operation 31: RESTOREFH - Restore Saved Filehandle . . . 510
18.28. Operation 32: SAVEFH - Save Current Filehandle . . . . . 503 18.28. Operation 32: SAVEFH - Save Current Filehandle . . . . . 511
18.29. Operation 33: SECINFO - Obtain Available Security . . . 504 18.29. Operation 33: SECINFO - Obtain Available Security . . . 512
18.30. Operation 34: SETATTR - Set Attributes . . . . . . . . . 508 18.30. Operation 34: SETATTR - Set Attributes . . . . . . . . . 516
18.31. Operation 37: VERIFY - Verify Same Attributes . . . . . 511 18.31. Operation 37: VERIFY - Verify Same Attributes . . . . . 519
18.32. Operation 38: WRITE - Write to File . . . . . . . . . . 512 18.32. Operation 38: WRITE - Write to File . . . . . . . . . . 520
18.33. Operation 40: BACKCHANNEL_CTL - Backchannel Control . . 517 18.33. Operation 40: BACKCHANNEL_CTL - Backchannel Control . . 525
18.34. Operation 41: BIND_CONN_TO_SESSION - Associate 18.34. Operation 41: BIND_CONN_TO_SESSION - Associate
Connection with Session . . . . . . . . . . . . . . . . 518 Connection with Session . . . . . . . . . . . . . . . . 526
18.35. Operation 42: EXCHANGE_ID - Instantiate Client ID . . . 521 18.35. Operation 42: EXCHANGE_ID - Instantiate Client ID . . . 529
18.36. Operation 43: CREATE_SESSION - Create New Session and 18.36. Operation 43: CREATE_SESSION - Create New Session and
Confirm Client ID . . . . . . . . . . . . . . . . . . . 539 Confirm Client ID . . . . . . . . . . . . . . . . . . . 548
18.37. Operation 44: DESTROY_SESSION - Destroy a Session . . . 550 18.37. Operation 44: DESTROY_SESSION - Destroy a Session . . . 558
18.38. Operation 45: FREE_STATEID - Free Stateid with No Locks 551 18.38. Operation 45: FREE_STATEID - Free Stateid with No Locks 560
18.39. Operation 46: GET_DIR_DELEGATION - Get a Directory 18.39. Operation 46: GET_DIR_DELEGATION - Get a Directory
Delegation . . . . . . . . . . . . . . . . . . . . . . . 552 Delegation . . . . . . . . . . . . . . . . . . . . . . . 561
18.40. Operation 47: GETDEVICEINFO - Get Device Information . . 556 18.40. Operation 47: GETDEVICEINFO - Get Device Information . . 565
18.41. Operation 48: GETDEVICELIST - Get All Device Mappings 18.41. Operation 48: GETDEVICELIST - Get All Device Mappings
for a File System . . . . . . . . . . . . . . . . . . . 559 for a File System . . . . . . . . . . . . . . . . . . . 568
18.42. Operation 49: LAYOUTCOMMIT - Commit Writes Made Using a 18.42. Operation 49: LAYOUTCOMMIT - Commit Writes Made Using a
Layout . . . . . . . . . . . . . . . . . . . . . . . . . 561 Layout . . . . . . . . . . . . . . . . . . . . . . . . . 569
18.43. Operation 50: LAYOUTGET - Get Layout Information . . . . 565 18.43. Operation 50: LAYOUTGET - Get Layout Information . . . . 573
18.44. Operation 51: LAYOUTRETURN - Release Layout Information 574 18.44. Operation 51: LAYOUTRETURN - Release Layout Information 583
18.45. Operation 52: SECINFO_NO_NAME - Get Security on Unnamed 18.45. Operation 52: SECINFO_NO_NAME - Get Security on Unnamed
Object . . . . . . . . . . . . . . . . . . . . . . . . . 579 Object . . . . . . . . . . . . . . . . . . . . . . . . . 588
18.46. Operation 53: SEQUENCE - Supply Per-Procedure Sequencing 18.46. Operation 53: SEQUENCE - Supply Per-Procedure Sequencing
and Control . . . . . . . . . . . . . . . . . . . . . . 580 and Control . . . . . . . . . . . . . . . . . . . . . . 589
18.47. Operation 54: SET_SSV - Update SSV for a Client ID . . . 586 18.47. Operation 54: SET_SSV - Update SSV for a Client ID . . . 595
18.48. Operation 55: TEST_STATEID - Test Stateids for Validity 588 18.48. Operation 55: TEST_STATEID - Test Stateids for Validity 597
18.49. Operation 56: WANT_DELEGATION - Request Delegation . . . 590 18.49. Operation 56: WANT_DELEGATION - Request Delegation . . . 599
18.50. Operation 57: DESTROY_CLIENTID - Destroy a Client ID . . 594 18.50. Operation 57: DESTROY_CLIENTID - Destroy a Client ID . . 603
18.51. Operation 58: RECLAIM_COMPLETE - Indicates Reclaims 18.51. Operation 58: RECLAIM_COMPLETE - Indicates Reclaims
Finished . . . . . . . . . . . . . . . . . . . . . . . . 595 Finished . . . . . . . . . . . . . . . . . . . . . . . . 604
18.52. Operation 10044: ILLEGAL - Illegal Operation . . . . . . 598 18.52. Operation 10044: ILLEGAL - Illegal Operation . . . . . . 607
19. NFSv4.1 Callback Procedures . . . . . . . . . . . . . . . . . 599 19. NFSv4.1 Callback Procedures . . . . . . . . . . . . . . . . . 608
19.1. Procedure 0: CB_NULL - No Operation . . . . . . . . . . 599 19.1. Procedure 0: CB_NULL - No Operation . . . . . . . . . . 608
19.2. Procedure 1: CB_COMPOUND - Compound Operations . . . . . 599 19.2. Procedure 1: CB_COMPOUND - Compound Operations . . . . . 608
20. NFSv4.1 Callback Operations . . . . . . . . . . . . . . . . . 604 20. NFSv4.1 Callback Operations . . . . . . . . . . . . . . . . . 613
20.1. Operation 3: CB_GETATTR - Get Attributes . . . . . . . . 604 20.1. Operation 3: CB_GETATTR - Get Attributes . . . . . . . . 613
20.2. Operation 4: CB_RECALL - Recall a Delegation . . . . . . 605 20.2. Operation 4: CB_RECALL - Recall a Delegation . . . . . . 614
20.3. Operation 5: CB_LAYOUTRECALL - Recall Layout from Client 606 20.3. Operation 5: CB_LAYOUTRECALL - Recall Layout from Client 615
20.4. Operation 6: CB_NOTIFY - Notify Client of Directory 20.4. Operation 6: CB_NOTIFY - Notify Client of Directory
Changes . . . . . . . . . . . . . . . . . . . . . . . . 609 Changes . . . . . . . . . . . . . . . . . . . . . . . . 618
20.5. Operation 7: CB_PUSH_DELEG - Offer Previously Requested 20.5. Operation 7: CB_PUSH_DELEG - Offer Previously Requested
Delegation to Client . . . . . . . . . . . . . . . . . . 613 Delegation to Client . . . . . . . . . . . . . . . . . . 622
20.6. Operation 8: CB_RECALL_ANY - Keep Any N Recallable 20.6. Operation 8: CB_RECALL_ANY - Keep Any N Recallable
Objects . . . . . . . . . . . . . . . . . . . . . . . . 614 Objects . . . . . . . . . . . . . . . . . . . . . . . . 623
20.7. Operation 9: CB_RECALLABLE_OBJ_AVAIL - Signal Resources 20.7. Operation 9: CB_RECALLABLE_OBJ_AVAIL - Signal Resources
for Recallable Objects . . . . . . . . . . . . . . . . . 617 for Recallable Objects . . . . . . . . . . . . . . . . . 626
20.8. Operation 10: CB_RECALL_SLOT - Change Flow Control 20.8. Operation 10: CB_RECALL_SLOT - Change Flow Control
Limits . . . . . . . . . . . . . . . . . . . . . . . . . 618 Limits . . . . . . . . . . . . . . . . . . . . . . . . . 627
20.9. Operation 11: CB_SEQUENCE - Supply Backchannel 20.9. Operation 11: CB_SEQUENCE - Supply Backchannel
Sequencing and Control . . . . . . . . . . . . . . . . . 619 Sequencing and Control . . . . . . . . . . . . . . . . . 628
20.10. Operation 12: CB_WANTS_CANCELLED - Cancel Pending 20.10. Operation 12: CB_WANTS_CANCELLED - Cancel Pending
Delegation Wants . . . . . . . . . . . . . . . . . . . . 622 Delegation Wants . . . . . . . . . . . . . . . . . . . . 631
20.11. Operation 13: CB_NOTIFY_LOCK - Notify Client of Possible 20.11. Operation 13: CB_NOTIFY_LOCK - Notify Client of Possible
Lock Availability . . . . . . . . . . . . . . . . . . . 623 Lock Availability . . . . . . . . . . . . . . . . . . . 632
20.12. Operation 14: CB_NOTIFY_DEVICEID - Notify Client of 20.12. Operation 14: CB_NOTIFY_DEVICEID - Notify Client of
Device ID Changes . . . . . . . . . . . . . . . . . . . 624 Device ID Changes . . . . . . . . . . . . . . . . . . . 633
20.13. Operation 10044: CB_ILLEGAL - Illegal Callback Operation 626 20.13. Operation 10044: CB_ILLEGAL - Illegal Callback Operation 635
21. Security Considerations . . . . . . . . . . . . . . . . . . . 627 21. Security Considerations . . . . . . . . . . . . . . . . . . . 636
22. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 631 22. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 640
22.1. IANA Actions Neeeded . . . . . . . . . . . . . . . . . . 631 22.1. IANA Actions Needed . . . . . . . . . . . . . . . . . . 641
22.2. Named Attribute Definitions . . . . . . . . . . . . . . 631 22.2. Named Attribute Definitions . . . . . . . . . . . . . . 641
22.3. Device ID Notifications . . . . . . . . . . . . . . . . 632 22.3. Device ID Notifications . . . . . . . . . . . . . . . . 642
22.4. Object Recall Types . . . . . . . . . . . . . . . . . . 634 22.4. Object Recall Types . . . . . . . . . . . . . . . . . . 644
22.5. Layout Types . . . . . . . . . . . . . . . . . . . . . . 636 22.5. Layout Types . . . . . . . . . . . . . . . . . . . . . . 645
22.6. Path Variable Definitions . . . . . . . . . . . . . . . 638 22.6. Path Variable Definitions . . . . . . . . . . . . . . . 648
23. References . . . . . . . . . . . . . . . . . . . . . . . . . 642 23. References . . . . . . . . . . . . . . . . . . . . . . . . . 652
23.1. Normative References . . . . . . . . . . . . . . . . . . 642 23.1. Normative References . . . . . . . . . . . . . . . . . . 652
23.2. Informative References . . . . . . . . . . . . . . . . . 645 23.2. Informative References . . . . . . . . . . . . . . . . . 655
Appendix A. Need for this Update . . . . . . . . . . . . . . . . 648 Appendix A. Need for this Update . . . . . . . . . . . . . . . . 659
Appendix B. Changes in this Update . . . . . . . . . . . . . . . 650 Appendix B. Changes in this Update . . . . . . . . . . . . . . . 661
B.1. Revisions Made to Section 11 of [RFC5661] . . . . . . . . 650 B.1. Revisions Made to Section 11 of RFC5661 . . . . . . . . . 661
B.2. Revisions Made to Operations in [RFC5661] . . . . . . . . 653 B.2. Revisions Made to Operations in RFC5661 . . . . . . . . . 664
B.3. Revisions Made to Error Definitions in [RFC5661] . . . . 656 B.3. Revisions Made to Error Definitions in RFC5661 . . . . . 666
B.4. Other Revisions Made to [RFC5661] . . . . . . . . . . . . 656 B.4. Other Revisions Made to RFC5661 . . . . . . . . . . . . . 667
Appendix C. Security Issues that Need to be Addressed . . . . . 657 Appendix C. Security Issues that Need to be Addressed . . . . . 668
Appendix D. Acknowledgments . . . . . . . . . . . . . . . . . . 659 Appendix D. Acknowledgments . . . . . . . . . . . . . . . . . . 670
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 662 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 673
1. Introduction 1. Introduction
1.1. Introduction to this Update 1.1. Introduction to this Update
Two important features previously defined in minor version 0 but
never fully addressed in minor version 1 are trunking, the
simultaneous use of multiple connections between a client and server,
potentially to different network addresses, and transparent state
migration, which allows a file system to be transferred between
servers in a way that provides to the client the ability to maintain
its existing locking state across the transfer.
The revised description of the NFS version 4 minor version 1 The revised description of the NFS version 4 minor version 1
(NFSv4.1) protocol presented in this update is necessary to enable (NFSv4.1) protocol presented in this update is necessary to enable
full use of trunking in connection with multi-server namespace full use of these features together with other multi-server namespace
features and to enable the use of transparent state migration in features. This document is in the form of an updated description of
connection with NFSv4.1. This document is in the form of an updated the NFSv4.1 protocol previously defined in RFC5661 [65]. RFC5661 is
description of the NFS 4.1 protocol previously defined in RFC5661 obsoleted by this document. However, the update has a limited scope
[62]. RFC5661 is obsoleted by this document. However, the update and is focused on enabling full use of trunking and transparent state
has a limited scope and is focused on enabling full use of trunking migration. The need for these changes is discussed in Appendix A.
and transparent state migration. The need for these changes is Appendix B describes the specific changes made to arrive at the
discussed in Appendix A. Appendix B describes the specific changes current text.
made to arrive at the current text.
This limited scope update is applied to the main NFSv4.1 RFC with the This limited-scope update replaces the current NFSv4.1 RFC with the
intention of providing an authoritative complete specification, the intention of providing an authoritative and complete specification,
motivation for which is discussed in [I.D-roach-bis-documents], the motivation for which is discussed in [35], addressing the issues
addressing the issues within the scope of the update. However, it within the scope of the update. However, it will not address issues
will not address issues that are known but outside of this limited that are known but outside of this limited scope as could be expected by
scope as could be expected by a full update of the protocol. Below are a full update of the protocol. Below are some areas which are known
some areas which are known to need addressing in a future update of to need addressing in a future update of the protocol.
the protocol.
o Work would have to be done with regard to RFC8178 [63] which o Work needs to be done with regard to RFC8178 [66] which
establishes NFSv4-wide versioning rules. As RFC5661 is curretly establishes NFSv4-wide versioning rules. As RFC5661 is currently
inconsistent with this document, changes are needed in order to inconsistent with that document, changes are needed in order to
arrive at a situation in which there would be no need for RFC8178 arrive at a situation in which there would be no need for RFC8178
to update the NFSv4.1 specfication. to update the NFSv4.1 specification.
o Work would have to be done with regard to RFC8434 [66], which o Work needs to be done with regard to RFC8434 [69], which
establishes the requirements for pNFS layout types, which are not establishes the requirements for pNFS layout types, which are not
clearly defined in RFC5661. When that work is done and the clearly defined in RFC5661. When that work is done and the
resulting documents approved, the new NFSv4.1 specfication resulting documents approved, the new NFSv4.1 specification
document will provide a clear set of requirements for layout types document will provide a clear set of requirements for layout types
and a description of the file layout type that conforms to those and a description of the file layout type that conforms to those
requirements. Other layout types will have their own specfication requirements. Other layout types will have their own
documents that conform to those requirements as well. specification documents that conform to those requirements as
well.
o Work would have to be done to address many erratas relevant to RFC o Work needs to be done to address many errata reports relevant to
5661, other than errata 2006 [60], which is addressed in this RFC 5661, other than errata report 2006 [63], which is addressed
document. That errata was not deferrable because of the in this document. Addressing that report was not deferrable
interaction of the changes suggested in that errata and handling because of the interaction of the changes suggested there and the
of state and session migration. The erratas that have been newly described handling of state and session migration.
deferred include changes originally suggested by a particular
errata, which change consensus decisions made in RFC 5661, which The errata reports that have been deferred and that will need to
need to be changed to ensure compatibility with existing be addressed in a later document include reports currently
implementations that do not follow the handling delineated in RFC assigned a range of statuses in the errata reporting system
5661. Note that it is expected that such erratas will remain including reports marked Accepted and those marked Hold For
Document Update because the change was too minor to address
immediately.
In addition, there is a set of other reports, including at least
one in state Rejected, which will need to be addressed in a later
document. This will involve making changes to consensus decisions
reflected in RFC 5661, in situations in which the working group has
decided that the treatment in RFC 5661 is incorrect, and needs to
be revised to reflect the working group's new consensus and ensure
compatibility with existing implementations that do not follow the
handling described in RFC 5661.
Note that it is expected that all such errata reports will remain
relevant to implementers and the authors of an eventual relevant to implementers and the authors of an eventual
rfc5661bis, despite the fact that this document, when approved, rfc5661bis, despite the fact that this document, when approved,
will obsolete RFC 5661. will obsolete RFC 5661 [65].
o There is a need for a new approach to the description of o There is a need for a new approach to the description of
internationalization since the current internationalization internationalization since the current internationalization
section (Section 14) has never been implemented and does not meet section (Section 14) has never been implemented and does not meet
the needs of the NFSv4 protocol. Possible solutions are to create the needs of the NFSv4 protocol. Possible solutions are to create
a new internationalization section modeled on that in [64] or to a new internationalization section modeled on that in [67] or to
create a new document describing internationalization for all create a new document describing internationalization for all
NFSv4 minor versions and reference that document in the RFCs NFSv4 minor versions and reference that document in the RFCs
defining both NFSv4.0 and NFSv4.1. defining both NFSv4.0 and NFSv4.1.
o There is a need for a revised treatment of security in NFSv4.1. o There is a need for a revised treatment of security in NFSv4.1.
The issues with the existing treatment are discussed in The issues with the existing treatment are discussed in
Appendix C. Appendix C.
Until the above work is done, there will not be a consistent set of Until the above work is done, there will not be a consistent set of
documents providing a description of the NFSv4.1 protocol and any documents providing a description of the NFSv4.1 protocol and any
full description would involve documents updating other documents full description would involve documents updating other documents
within the specification, just as RFC 8434 [66] and RFC 8178 [63] do within the specification. The updates applied by RFC8434 [69] and
today. RFC8178 [66] to RFC5661 also apply to this specification, and will
apply to any subsequent v4.1 specification until that work is done.
1.2. The NFS Version 4 Minor Version 1 Protocol 1.2. The NFS Version 4 Minor Version 1 Protocol
The NFS version 4 minor version 1 (NFSv4.1) protocol is the second The NFS version 4 minor version 1 (NFSv4.1) protocol is the second
minor version of the NFS version 4 (NFSv4) protocol. The first minor minor version of the NFS version 4 (NFSv4) protocol. The first minor
version, NFSv4.0, is now described in RFC 7530 [64]. It generally version, NFSv4.0, is now described in RFC 7530 [67]. It generally
follows the guidelines for minor versioning that are listed in follows the guidelines for minor versioning that are listed in
Section 10 of RFC 3530. However, it diverges from guidelines 11 ("a Section 10 of RFC 3530. However, it diverges from guidelines 11 ("a
client and server that support minor version X must support minor client and server that support minor version X must support minor
versions 0 through X-1") and 12 ("no new features may be introduced versions 0 through X-1") and 12 ("no new features may be introduced
as mandatory in a minor version"). These divergences are due to the as mandatory in a minor version"). These divergences are due to the
introduction of the sessions model for managing non-idempotent introduction of the sessions model for managing non-idempotent
operations and the RECLAIM_COMPLETE operation. These two new operations and the RECLAIM_COMPLETE operation. These two new
features are infrastructural in nature and simplify implementation of features are infrastructural in nature and simplify implementation of
existing and other new features. Making them anything but REQUIRED existing and other new features. Making them anything but REQUIRED
would add undue complexity to protocol definition and implementation. would add undue complexity to protocol definition and implementation.
skipping to change at page 10, line 15 skipping to change at page 10, line 35
o describe the NFSv4.0 protocol, except where needed to contrast o describe the NFSv4.0 protocol, except where needed to contrast
with NFSv4.1. with NFSv4.1.
o modify the specification of the NFSv4.0 protocol. o modify the specification of the NFSv4.0 protocol.
o clarify the NFSv4.0 protocol. o clarify the NFSv4.0 protocol.
1.5. NFSv4 Goals 1.5. NFSv4 Goals
The NFSv4 protocol is a further revision of the NFS protocol defined The NFSv4 protocol is a further revision of the NFS protocol defined
already by NFSv3 [34]. It retains the essential characteristics of already by NFSv3 [37]. It retains the essential characteristics of
previous versions: easy recovery; independence of transport previous versions: easy recovery; independence of transport
protocols, operating systems, and file systems; simplicity; and good protocols, operating systems, and file systems; simplicity; and good
performance. NFSv4 has the following goals: performance. NFSv4 has the following goals:
o Improved access and good performance on the Internet o Improved access and good performance on the Internet
The protocol is designed to transit firewalls easily, perform well The protocol is designed to transit firewalls easily, perform well
where latency is high and bandwidth is low, and scale to very where latency is high and bandwidth is low, and scale to very
large numbers of clients per server. large numbers of clients per server.
skipping to change at page 15, line 43 skipping to change at page 16, line 21
filehandles. filehandles.
1.8.3.2. File Attributes 1.8.3.2. File Attributes
The NFSv4.1 protocol has a rich and extensible file object attribute The NFSv4.1 protocol has a rich and extensible file object attribute
structure, which is divided into REQUIRED, RECOMMENDED, and named structure, which is divided into REQUIRED, RECOMMENDED, and named
attributes (see Section 5). attributes (see Section 5).
Several (but not all) of the REQUIRED attributes are derived from the Several (but not all) of the REQUIRED attributes are derived from the
attributes of NFSv3 (see the definition of the fattr3 data type in attributes of NFSv3 (see the definition of the fattr3 data type in
[34]). An example of a REQUIRED attribute is the file object's type [37]). An example of a REQUIRED attribute is the file object's type
(Section 5.8.1.2) so that regular files can be distinguished from (Section 5.8.1.2) so that regular files can be distinguished from
directories (also known as folders in some operating environments) directories (also known as folders in some operating environments)
and other types of objects. REQUIRED attributes are discussed in and other types of objects. REQUIRED attributes are discussed in
Section 5.1. Section 5.1.
Three examples of RECOMMENDED attributes are acl, sacl, and dacl. Three examples of RECOMMENDED attributes are acl, sacl, and dacl.
These attributes define an Access Control List (ACL) on a file object These attributes define an Access Control List (ACL) on a file object
(Section 6). An ACL provides directory and file access control (Section 6). An ACL provides directory and file access control
beyond the model used in NFSv3. The ACL definition allows for beyond the model used in NFSv3. The ACL definition allows for
specification of specific sets of permissions for individual users specification of specific sets of permissions for individual users
skipping to change at page 18, line 50 skipping to change at page 19, line 29
o Data retention (Section 5.13). o Data retention (Section 5.13).
o Identification of the implementation of the NFS client and server o Identification of the implementation of the NFS client and server
(Section 18.35). (Section 18.35).
o Support for notification of the availability of byte-range locks o Support for notification of the availability of byte-range locks
(see the new OPEN4_RESULT_MAY_NOTIFY_LOCK reply flag in (see the new OPEN4_RESULT_MAY_NOTIFY_LOCK reply flag in
Section 18.16 and see Section 20.11). Section 18.16 and see Section 20.11).
o In NFSv4.1, LIPKEY and SPKM-3 are not required security mechanisms o In NFSv4.1, LIPKEY and SPKM-3 are not required security mechanisms
[35]. [38].
2. Core Infrastructure 2. Core Infrastructure
2.1. Introduction 2.1. Introduction
NFSv4.1 relies on core infrastructure common to nearly every NFSv4.1 relies on core infrastructure common to nearly every
operation. This core infrastructure is described in the remainder of operation. This core infrastructure is described in the remainder of
this section. this section.
2.2. RPC and XDR 2.2. RPC and XDR
skipping to change at page 21, line 30 skipping to change at page 21, line 52
------------------------------------------------------------------ ------------------------------------------------------------------
390003 krb5 1.2.840.113554.1.2.2 rpc_gss_svc_none yes yes 390003 krb5 1.2.840.113554.1.2.2 rpc_gss_svc_none yes yes
390004 krb5i 1.2.840.113554.1.2.2 rpc_gss_svc_integrity yes yes 390004 krb5i 1.2.840.113554.1.2.2 rpc_gss_svc_integrity yes yes
390005 krb5p 1.2.840.113554.1.2.2 rpc_gss_svc_privacy no yes 390005 krb5p 1.2.840.113554.1.2.2 rpc_gss_svc_privacy no yes
Note that the number and name of the pseudo flavor are presented here Note that the number and name of the pseudo flavor are presented here
as a mapping aid to the implementor. Because the NFSv4.1 protocol as a mapping aid to the implementor. Because the NFSv4.1 protocol
includes a method to negotiate security and it understands the GSS- includes a method to negotiate security and it understands the GSS-
API mechanism, the pseudo flavor is not needed. The pseudo flavor is API mechanism, the pseudo flavor is not needed. The pseudo flavor is
needed for the NFSv3 since the security negotiation is done via the needed for the NFSv3 since the security negotiation is done via the
MOUNT protocol as described in [36]. MOUNT protocol as described in [39].
At the time NFSv4.1 was specified, the Advanced Encryption Standard At the time NFSv4.1 was specified, the Advanced Encryption Standard
(AES) with HMAC-SHA1 was a REQUIRED algorithm set for Kerberos V5. (AES) with HMAC-SHA1 was a REQUIRED algorithm set for Kerberos V5.
In contrast, when NFSv4.0 was specified, weaker algorithm sets were In contrast, when NFSv4.0 was specified, weaker algorithm sets were
REQUIRED for Kerberos V5, and were REQUIRED in the NFSv4.0 REQUIRED for Kerberos V5, and were REQUIRED in the NFSv4.0
specification, because the Kerberos V5 specification at the time did specification, because the Kerberos V5 specification at the time did
not specify stronger algorithms. The NFSv4.1 specification does not not specify stronger algorithms. The NFSv4.1 specification does not
specify REQUIRED algorithms for Kerberos V5, and instead, the specify REQUIRED algorithms for Kerberos V5, and instead, the
implementor is expected to track the evolution of the Kerberos V5 implementor is expected to track the evolution of the Kerberos V5
standard if and when stronger algorithms are specified. standard if and when stronger algorithms are specified.
skipping to change at page 23, line 14 skipping to change at page 23, line 35
defined in an analogous fashion to that of COMPOUND with its own set defined in an analogous fashion to that of COMPOUND with its own set
of callback operations. of callback operations.
The addition of new server and callback operations within the The addition of new server and callback operations within the
COMPOUND and CB_COMPOUND request framework provides a means of COMPOUND and CB_COMPOUND request framework provides a means of
extending the protocol in subsequent minor versions. extending the protocol in subsequent minor versions.
Except for a small number of operations needed for session creation, Except for a small number of operations needed for session creation,
server requests and callback requests are performed within the server requests and callback requests are performed within the
context of a session. Sessions provide a client context for every context of a session. Sessions provide a client context for every
request and support robust reply protection for non-idempotent request and support robust replay protection for non-idempotent
requests. requests.
2.4. Client Identifiers and Client Owners 2.4. Client Identifiers and Client Owners
For each operation that obtains or depends on locking state, the For each operation that obtains or depends on locking state, the
specific client needs to be identifiable by the server. specific client needs to be identifiable by the server.
Each distinct client instance is represented by a client ID. A Each distinct client instance is represented by a client ID. A
client ID is a 64-bit identifier representing a specific client at a client ID is a 64-bit identifier representing a specific client at a
given time. The client ID is changed whenever the client re- given time. The client ID is changed whenever the client re-
skipping to change at page 24, line 17 skipping to change at page 24, line 37
it makes conflicting lock requests. it makes conflicting lock requests.
Client identification is encapsulated in the following client owner Client identification is encapsulated in the following client owner
data type: data type:
struct client_owner4 { struct client_owner4 {
verifier4 co_verifier; verifier4 co_verifier;
opaque co_ownerid<NFS4_OPAQUE_LIMIT>; opaque co_ownerid<NFS4_OPAQUE_LIMIT>;
}; };
The first field, co_verifier, is a client incarnation verifier. The The first field, co_verifier, is a client incarnation verifier,
server will start the process of canceling the client's leased state allowing the server to distinguish successive incarnations (e.g.
if co_verifier is different than what the server has previously reboots) of the same client. The server will start the process of
recorded for the identified client (as specified in the co_ownerid canceling the client's leased state if co_verifier is different than
field). what the server has previously recorded for the identified client (as
specified in the co_ownerid field).
The second field, co_ownerid, is a variable length string that The second field, co_ownerid, is a variable length string that
uniquely defines the client so that subsequent instances of the same uniquely defines the client so that subsequent instances of the same
client bear the same co_ownerid with a different verifier. client bear the same co_ownerid with a different verifier.
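The interplay of the two client_owner4 fields can be illustrated with a minimal sketch. It is not part of the specification: the ClientTable class, its observe method, and the returned labels are hypothetical names chosen for illustration, and the sketch deliberately ignores the principal checks and confirmation handling that a real EXCHANGE_ID implementation would need.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ClientOwner4:
    """Mirrors the two fields of the client_owner4 structure."""
    co_verifier: bytes  # client incarnation verifier
    co_ownerid: bytes   # opaque string identifying the client


@dataclass
class ClientTable:
    """Hypothetical server-side record: co_ownerid -> last co_verifier seen."""
    recorded: dict = field(default_factory=dict)

    def observe(self, owner: ClientOwner4) -> str:
        previous = self.recorded.get(owner.co_ownerid)
        # Remember the most recent verifier for this co_ownerid.
        self.recorded[owner.co_ownerid] = owner.co_verifier
        if previous is None:
            return "new-client"
        if previous != owner.co_verifier:
            # Same co_ownerid with a different verifier: a new incarnation.
            # This is where a server would start canceling the old leased state.
            return "new-incarnation"
        return "same-incarnation"
```

In this sketch, a client that reboots presents the same co_ownerid with a fresh co_verifier and is therefore reported as a new incarnation, while a retransmitted or repeated request from the same incarnation leaves the recorded state untouched.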
There are several considerations for how the client generates the There are several considerations for how the client generates the
co_ownerid string: co_ownerid string:
o The string should be unique so that multiple clients do not o The string should be unique so that multiple clients do not
present the same string. The consequences of two clients present the same string. The consequences of two clients
skipping to change at page 24, line 48 skipping to change at page 25, line 20
the same string. The implementor is cautioned against an approach the same string. The implementor is cautioned against an approach
that requires the string to be recorded in a local file because that requires the string to be recorded in a local file because
this precludes the use of the implementation in an environment this precludes the use of the implementation in an environment
where there is no local disk and all file access is from an where there is no local disk and all file access is from an
NFSv4.1 server. NFSv4.1 server.
o The string should be the same for each server network address that o The string should be the same for each server network address that
the client accesses. This way, if a server has multiple the client accesses. This way, if a server has multiple
interfaces, the client can trunk traffic over multiple network interfaces, the client can trunk traffic over multiple network
paths as described in Section 2.10.5. (Note: the precise opposite paths as described in Section 2.10.5. (Note: the precise opposite
was advised in the NFSv4.0 specification [33].) was advised in the NFSv4.0 specification [36].)
o The algorithm for generating the string should not assume that the o The algorithm for generating the string should not assume that the
client's network address will not change, unless the client client's network address will not change, unless the client
implementation knows it is using statically assigned network implementation knows it is using statically assigned network
addresses. This includes changes between client incarnations and addresses. This includes changes between client incarnations and
even changes while the client is still running in its current even changes while the client is still running in its current
incarnation. Thus, with dynamic address assignment, if the client incarnation. Thus, with dynamic address assignment, if the client
includes just the client's network address in the co_ownerid includes just the client's network address in the co_ownerid
string, there is a real risk that after the client gives up the string, there is a real risk that after the client gives up the
network address, another client, using a similar algorithm for network address, another client, using a similar algorithm for
skipping to change at page 27, line 5 skipping to change at page 27, line 26
To facilitate upgrade from NFSv4.0 to NFSv4.1, a server may compare a To facilitate upgrade from NFSv4.0 to NFSv4.1, a server may compare a
value of data type client_owner4 in an EXCHANGE_ID with a value of value of data type client_owner4 in an EXCHANGE_ID with a value of
data type nfs_client_id4 that was established using the SETCLIENTID data type nfs_client_id4 that was established using the SETCLIENTID
operation of NFSv4.0. A server that does so will allow an upgraded operation of NFSv4.0. A server that does so will allow an upgraded
client to avoid waiting until the lease (i.e., the lease established client to avoid waiting until the lease (i.e., the lease established
by the NFSv4.0 instance client) expires. This requires that the by the NFSv4.0 instance client) expires. This requires that the
value of data type client_owner4 be constructed the same way as the value of data type client_owner4 be constructed the same way as the
value of data type nfs_client_id4. If the latter's contents included value of data type nfs_client_id4. If the latter's contents included
the server's network address (per the recommendations of the NFSv4.0 the server's network address (per the recommendations of the NFSv4.0
specification [33]), and the NFSv4.1 client does not wish to use a specification [36]), and the NFSv4.1 client does not wish to use a
client ID that prevents trunking, it should send two EXCHANGE_ID client ID that prevents trunking, it should send two EXCHANGE_ID
operations. The first EXCHANGE_ID will have a client_owner4 equal to operations. The first EXCHANGE_ID will have a client_owner4 equal to
the nfs_client_id4. This will clear the state created by the NFSv4.0 the nfs_client_id4. This will clear the state created by the NFSv4.0
client. The second EXCHANGE_ID will not have the server's network client. The second EXCHANGE_ID will not have the server's network
address. The state created for the second EXCHANGE_ID will not have address. The state created for the second EXCHANGE_ID will not have
to wait for lease expiration, because there will be no state to to wait for lease expiration, because there will be no state to
expire. expire.
2.4.2. Server Release of Client ID 2.4.2. Server Release of Client ID
skipping to change at page 29, line 15 skipping to change at page 29, line 36
fields are the same in two EXCHANGE_ID results, the connections that fields are the same in two EXCHANGE_ID results, the connections that
each EXCHANGE_ID were sent over can be assumed to address the same each EXCHANGE_ID were sent over can be assumed to address the same
server (as defined in Section 1.7). If the so_minor_id fields are server (as defined in Section 1.7). If the so_minor_id fields are
also the same, then not only do both connections connect to the same also the same, then not only do both connections connect to the same
server, but the session can be shared across both connections. The server, but the session can be shared across both connections. The
reader is cautioned that multiple servers may deliberately or reader is cautioned that multiple servers may deliberately or
accidentally claim to have the same so_major_id or so_major_id/ accidentally claim to have the same so_major_id or so_major_id/
so_minor_id; the reader should examine Sections 2.10.5 and 18.35 in so_minor_id; the reader should examine Sections 2.10.5 and 18.35 in
order to avoid acting on falsely matching server owner values. order to avoid acting on falsely matching server owner values.
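The comparison described above can be sketched as follows. This is a minimal illustration, not text from either draft: the names ServerOwner, OwnerMatch, and compare_server_owners are hypothetical, and the comparison of so_major_id is assumed from the elided start of the paragraph. The result is only the server's claim; as the text cautions, it still has to be verified per Sections 2.10.5 and 18.35 before the client acts on it.

```python
from dataclasses import dataclass
from enum import Enum


class OwnerMatch(Enum):
    """Hypothetical labels for the outcome of comparing two server owners."""
    DIFFERENT_SERVER = "connections address different servers"
    SAME_SERVER = "same server, but a session cannot be shared"
    SAME_SERVER_SHARED_SESSION = "same server, session can be shared"


@dataclass(frozen=True)
class ServerOwner:
    """Illustrative stand-in for the server owner in an EXCHANGE_ID result."""
    so_major_id: bytes
    so_minor_id: int


def compare_server_owners(a: ServerOwner, b: ServerOwner) -> OwnerMatch:
    """Classify two EXCHANGE_ID results by their claimed server owner values."""
    if a.so_major_id != b.so_major_id:
        return OwnerMatch.DIFFERENT_SERVER
    if a.so_minor_id != b.so_minor_id:
        return OwnerMatch.SAME_SERVER
    return OwnerMatch.SAME_SERVER_SHARED_SESSION
```

A client would call compare_server_owners once per pair of connections and treat any match as unverified until the checks in Section 2.10.5 succeed.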
The considerations for generating a so_major_id are similar to that The considerations for generating an so_major_id are similar to that
for generating a co_ownerid string (see Section 2.4). The for generating a co_ownerid string (see Section 2.4). The
consequences of two servers generating conflicting so_major_id values consequences of two servers generating conflicting so_major_id values
are less dire than they are for co_ownerid conflicts because the are less dire than they are for co_ownerid conflicts because the
client can use RPCSEC_GSS to compare the authenticity of each server client can use RPCSEC_GSS to compare the authenticity of each server
(see Section 2.10.5). (see Section 2.10.5).
2.6. Security Service Negotiation 2.6. Security Service Negotiation
With the NFSv4.1 server potentially offering multiple security With the NFSv4.1 server potentially offering multiple security
mechanisms, the client needs a method to determine or negotiate which mechanisms, the client needs a method to determine or negotiate which
skipping to change at page 34, line 51 skipping to change at page 35, line 17
2.7. Minor Versioning 2.7. Minor Versioning
To address the requirement of an NFS protocol that can evolve as the To address the requirement of an NFS protocol that can evolve as the
need arises, the NFSv4.1 protocol contains the rules and framework to need arises, the NFSv4.1 protocol contains the rules and framework to
allow for future minor changes or versioning. allow for future minor changes or versioning.
The base assumption with respect to minor versioning is that any The base assumption with respect to minor versioning is that any
future accepted minor version will be documented in one or more future accepted minor version will be documented in one or more
Standards Track RFCs. Minor version 0 of the NFSv4 protocol is Standards Track RFCs. Minor version 0 of the NFSv4 protocol is
represented by [33], and minor version 1 is represented by this RFC. represented by [36], and minor version 1 is represented by this RFC.
The COMPOUND and CB_COMPOUND procedures support the encoding of the The COMPOUND and CB_COMPOUND procedures support the encoding of the
minor version being requested by the client. minor version being requested by the client.
The following items represent the basic rules for the development of The following items represent the basic rules for the development of
minor versions. Note that a future minor version may modify or add minor versions. Note that a future minor version may modify or add
to the following rules as part of the minor version definition. to the following rules as part of the minor version definition.
1. Procedures are not added or deleted. 1. Procedures are not added or deleted.
To maintain the general RPC model, NFSv4 minor versions will not To maintain the general RPC model, NFSv4 minor versions will not
skipping to change at page 38, line 14 skipping to change at page 38, line 24
2.9. Transport Layers 2.9. Transport Layers
2.9.1. REQUIRED and RECOMMENDED Properties of Transports 2.9.1. REQUIRED and RECOMMENDED Properties of Transports
NFSv4.1 works over Remote Direct Memory Access (RDMA) and non-RDMA- NFSv4.1 works over Remote Direct Memory Access (RDMA) and non-RDMA-
based transports with the following attributes: based transports with the following attributes:
o The transport supports reliable delivery of data, which NFSv4.1 o The transport supports reliable delivery of data, which NFSv4.1
requires but neither NFSv4.1 nor RPC has facilities for ensuring requires but neither NFSv4.1 nor RPC has facilities for ensuring
[37]. [40].
o The transport delivers data in the order it was sent. Ordered o The transport delivers data in the order it was sent. Ordered
delivery simplifies detection of transmit errors, and simplifies delivery simplifies detection of transmit errors, and simplifies
the sending of arbitrary sized requests and responses via the the sending of arbitrary sized requests and responses via the
record marking protocol [3]. record marking protocol [3].
Where an NFSv4.1 implementation supports operation over the IP Where an NFSv4.1 implementation supports operation over the IP
network protocol, any transport used between NFS and IP MUST be among network protocol, any transport used between NFS and IP MUST be among
the IETF-approved congestion control transport protocols. At the the IETF-approved congestion control transport protocols. At the
time this document was written, the only two transports that had the time this document was written, the only two transports that had the
skipping to change at page 40, line 20 skipping to change at page 40, line 31
contents must not be blindly used when replies are sent from it, contents must not be blindly used when replies are sent from it,
and credit information appropriate to the channel must be and credit information appropriate to the channel must be
refreshed by the RPC layer. refreshed by the RPC layer.
In addition, as described in Section 2.10.6.2, while a session is In addition, as described in Section 2.10.6.2, while a session is
active, the NFSv4.1 requester MUST NOT stop waiting for a reply. active, the NFSv4.1 requester MUST NOT stop waiting for a reply.
2.9.3. Ports 2.9.3. Ports
Historically, NFSv3 servers have listened over TCP port 2049. The Historically, NFSv3 servers have listened over TCP port 2049. The
registered port 2049 [38] for the NFS protocol should be the default registered port 2049 [41] for the NFS protocol should be the default
configuration. NFSv4.1 clients SHOULD NOT use the RPC binding configuration. NFSv4.1 clients SHOULD NOT use the RPC binding
protocols as described in [39]. protocols as described in [42].
2.10. Session 2.10. Session
NFSv4.1 clients and servers MUST support and MUST use the session NFSv4.1 clients and servers MUST support and MUST use the session
feature as described in this section. feature as described in this section.
2.10.1. Motivation and Overview 2.10.1. Motivation and Overview
Previous versions and minor versions of NFS have suffered from the Previous versions and minor versions of NFS have suffered from the
following: following:
skipping to change at page 41, line 19 skipping to change at page 41, line 30
sending callback requests, thus solving the firewall issue sending callback requests, thus solving the firewall issue
(Section 18.34). Races between responses from client requests and (Section 18.34). Races between responses from client requests and
callbacks caused by the requests are detected via the session's callbacks caused by the requests are detected via the session's
sequencing properties that are a consequence of EOS sequencing properties that are a consequence of EOS
(Section 2.10.6.3). (Section 2.10.6.3).
o The NFSv4.1 client can associate an arbitrary number of o The NFSv4.1 client can associate an arbitrary number of
connections with the session, and thus provide trunking connections with the session, and thus provide trunking
(Section 2.10.5). (Section 2.10.5).
o The NFSv4.1 client and server produces a session key independent o The NFSv4.1 client and server produce a session key independent of
of client and server machine credentials which can be used to client and server machine credentials which can be used to compute
compute a digest for protecting critical session management a digest for protecting critical session management operations
operations (Section 2.10.8.3). (Section 2.10.8.3).
o The NFSv4.1 client can also create secure RPCSEC_GSS contexts for o The NFSv4.1 client can also create secure RPCSEC_GSS contexts for
use by the session's backchannel that do not require the server to use by the session's backchannel that do not require the server to
authenticate to a client machine principal (Section 2.10.8.2). authenticate to a client machine principal (Section 2.10.8.2).
A session is a dynamically created, long-lived server object created A session is a dynamically created, long-lived server object created
by a client and used over time from one or more transport by a client and used over time from one or more transport
connections. Its function is to maintain the server's state relative connections. Its function is to maintain the server's state relative
to the connection(s) belonging to a client instance. This state is to the connection(s) belonging to a client instance. This state is
entirely independent of the connection itself, and indeed the state entirely independent of the connection itself, and indeed the state
skipping to change at page 43, line 42 skipping to change at page 43, line 50
backchannel. Because there are at most two channels per session, and backchannel. Because there are at most two channels per session, and
because each channel has a distinct purpose, channels are not because each channel has a distinct purpose, channels are not
assigned identifiers. assigned identifiers.
The fore channel is used for ordinary requests from the client to the The fore channel is used for ordinary requests from the client to the
server, and carries COMPOUND requests and responses. A session server, and carries COMPOUND requests and responses. A session
always has a fore channel. always has a fore channel.
The backchannel is used for callback requests from server to client, The backchannel is used for callback requests from server to client,
and carries CB_COMPOUND requests and responses. Whether or not there and carries CB_COMPOUND requests and responses. Whether or not there
is a backchannel is a decision made by the client; however, many is a backchannel is decided by the client; however, many features of
features of NFSv4.1 require a backchannel. NFSv4.1 servers MUST NFSv4.1 require a backchannel. NFSv4.1 servers MUST support
support backchannels. backchannels.
Each session has resources for each channel, including separate reply Each session has resources for each channel, including separate reply
caches (see Section 2.10.6.1). Note that even the backchannel caches (see Section 2.10.6.1). Note that even the backchannel
requires a reply cache (or, at least, a slot table in order to detect requires a reply cache (or, at least, a slot table in order to detect
retries) because some callback operations are nonidempotent. retries) because some callback operations are nonidempotent.
2.10.3.1. Association of Connections, Channels, and Sessions 2.10.3.1. Association of Connections, Channels, and Sessions
Each channel is associated with zero or more transport connections Each channel is associated with zero or more transport connections
(whether of the same transport protocol or different transport (whether of the same transport protocol or different transport
skipping to change at page 44, line 48 skipping to change at page 45, line 4
It is permissible for a connection of one type of transport to be It is permissible for a connection of one type of transport to be
associated with the fore channel, and a connection of a different associated with the fore channel, and a connection of a different
type to be associated with the backchannel. type to be associated with the backchannel.
2.10.4. Server Scope 2.10.4. Server Scope
Servers each specify a server scope value in the form of an opaque Servers each specify a server scope value in the form of an opaque
string eir_server_scope returned as part of the results of an string eir_server_scope returned as part of the results of an
EXCHANGE_ID operation. The purpose of the server scope is to allow a EXCHANGE_ID operation. The purpose of the server scope is to allow a
group of servers to indicate to clients that a set of servers sharing group of servers to indicate to clients that a set of servers sharing
the same server scope value has arranged to use compatible values of the same server scope value has arranged to use distinct values of
otherwise opaque identifiers. Thus, the identifiers generated by two opaque identifiers so that the two servers never assign the same
servers within that set can be assumed compatible so that, in some value to two distinct objects. Thus, the identifiers generated by
cases, identifiers generated by one server in that set may be two servers within that set can be assumed compatible so that, in
presented to another server of the same scope. certain important cases, identifiers generated by one server in that
set may be presented to another server of the same scope.
The use of such compatible values does not imply that a value The use of such compatible values does not imply that a value
generated by one server will always be accepted by another. In most generated by one server will always be accepted by another. In most
cases, it will not. However, a server will not accept a value cases, it will not. However, a server will not inadvertently accept
generated by another inadvertently. When it does accept it, it will a value generated by another server. When it does accept it, it will
be because it is recognized as valid and carrying the same meaning as be because it is recognized as valid and carrying the same meaning as
on another server of the same scope. on another server of the same scope.
When servers are of the same server scope, this compatibility of When servers are of the same server scope, this compatibility of
values applies to the following identifiers: values applies to the following identifiers:
o Filehandle values. A filehandle value accepted by two servers of o Filehandle values. A filehandle value accepted by two servers of
the same server scope denotes the same object. A WRITE operation the same server scope denotes the same object. A WRITE operation
sent to one server is reflected immediately in a READ sent to the sent to one server is reflected immediately in a READ sent to the
other. other.
skipping to change at page 48, line 38 skipping to change at page 48, line 46
may be different on subsequent EXCHANGE_ID requests made to the same may be different on subsequent EXCHANGE_ID requests made to the same
network address. network address.
In most cases such reconfiguration events will be disruptive and In most cases such reconfiguration events will be disruptive and
indicate that an IP address formerly connected to one server is now indicate that an IP address formerly connected to one server is now
connected to an entirely different one. connected to an entirely different one.
Some guidelines on client handling of such situations follow: Some guidelines on client handling of such situations follow:
o When eir_server_scope changes, the client has no assurance that o When eir_server_scope changes, the client has no assurance that
any id's it obtained previously (e.g. file handles, state ids, any id's it obtained previously (e.g. file handles) can be validly
client ids) can be validly used on the new server, and, even if used on the new server, and, even if the new server accepts them,
the new server accepts them, there is no assurance that this is there is no assurance that this is not due to accident. Thus, it
not due to accident. Thus, it is best to treat all such state as is best to treat all such state as lost/stale although a client
lost/stale although a client may assume that the probability of may assume that the probability of inadvertent acceptance is low
inadvertent acceptance is low and treat this situation as within and treat this situation as within the next case.
the next case.
o When eir_server_scope remains the same and o When eir_server_scope remains the same and
eir_server_owner.so_major_id changes, the client can use the eir_server_owner.so_major_id changes, the client can use the
filehandles it has, consider its locking state lost, and attempt filehandles it has, consider its locking state lost, and attempt
to reclaim or otherwise re-obtain its locks. It may find that its to reclaim or otherwise re-obtain its locks. It might find that
file handle IS now stale but if NFS4ERR_STALE is not received, it its file handle is now stale. However, if NFS4ERR_STALE is not
can proceed to reclaim or otherwise re-obtain its open locking returned, it can proceed to reclaim or otherwise re-obtain its
state. open locking state.
o When eir_server_scope and eir_server_owner.so_major_id remain the o When eir_server_scope and eir_server_owner.so_major_id remain the
same, the client has to use the now-current values of same, the client has to use the now-current values of
eir_server_owner.so_minor_id in deciding on appropriate forms of eir_server_owner.so_minor_id in deciding on appropriate forms of
trunking. This may result in connections being dropped or new trunking. This may result in connections being dropped or new
sessions being created. sessions being created.
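The three guidelines above amount to a decision ordered by which identifier changed. The sketch below illustrates that ordering under stated assumptions: ServerIdentity, Recovery, and plan_recovery are hypothetical names, and the sketch takes the conservative option for a scope change (treating all state as lost) rather than the optional fall-through to the next case that the first guideline permits.

```python
from dataclasses import dataclass
from enum import Enum


class Recovery(Enum):
    """Hypothetical labels for the three cases in the guidelines."""
    TREAT_ALL_STATE_AS_LOST = 1
    KEEP_FILEHANDLES_RECLAIM_LOCKS = 2
    REEVALUATE_TRUNKING = 3


@dataclass(frozen=True)
class ServerIdentity:
    """Illustrative subset of the values returned by EXCHANGE_ID."""
    eir_server_scope: bytes
    so_major_id: bytes
    so_minor_id: int


def plan_recovery(old: ServerIdentity, new: ServerIdentity) -> Recovery:
    """Pick a recovery approach after EXCHANGE_ID results change."""
    if new.eir_server_scope != old.eir_server_scope:
        # Previously obtained identifiers carry no assurance on the new server.
        return Recovery.TREAT_ALL_STATE_AS_LOST
    if new.so_major_id != old.so_major_id:
        # Filehandles stay usable unless NFS4ERR_STALE is returned;
        # locking state is considered lost and must be re-obtained.
        return Recovery.KEEP_FILEHANDLES_RECLAIM_LOCKS
    # Scope and major ID unchanged: decide on trunking from the
    # now-current so_minor_id value.
    return Recovery.REEVALUATE_TRUNKING
```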
2.10.5.1. Verifying Claims of Matching Server Identity 2.10.5.1. Verifying Claims of Matching Server Identity
When the server responds using two different connections claim When the server responds using two different connections claiming
matching or partially matching eir_server_owner, eir_server_scope, matching or partially matching eir_server_owner, eir_server_scope,
and eir_clientid values, the client does not have to trust the and eir_clientid values, the client does not have to trust the
servers' claims. The client may verify these claims before trunking servers' claims. The client may verify these claims before trunking
traffic in the following ways: traffic in the following ways:
o For session trunking, clients SHOULD reliably verify if o For session trunking, clients SHOULD reliably verify if
connections between different network paths are in fact associated connections between different network paths are in fact associated
with the same NFSv4.1 server and usable on the same session, and with the same NFSv4.1 server and usable on the same session, and
servers MUST allow clients to perform reliable verification. When servers MUST allow clients to perform reliable verification. When
a client ID is created, the client SHOULD specify that a client ID is created, the client SHOULD specify that
skipping to change at page 54, line 35 skipping to change at page 54, line 43
o If an operation is being used that does not start with SEQUENCE or o If an operation is being used that does not start with SEQUENCE or
CB_SEQUENCE (e.g., BIND_CONN_TO_SESSION), then the RPC XID is CB_SEQUENCE (e.g., BIND_CONN_TO_SESSION), then the RPC XID is
needed for correct operation to match the reply to the request. needed for correct operation to match the reply to the request.
o The SEQUENCE or CB_SEQUENCE operation may generate an error. If o The SEQUENCE or CB_SEQUENCE operation may generate an error. If
so, the embedded slot ID, sequence ID, and session ID (if present) so, the embedded slot ID, sequence ID, and session ID (if present)
in the request will not be in the reply, and the requester has in the request will not be in the reply, and the requester has
only the XID to match the reply to the request. only the XID to match the reply to the request.
Given that well-formulated XIDs continue to be required, this begs Given that well-formulated XIDs continue to be required, this raises
the question: why do SEQUENCE and CB_SEQUENCE replies have a session the question: why do SEQUENCE and CB_SEQUENCE replies have a session
ID, slot ID, and sequence ID? Having the session ID in the reply ID, slot ID, and sequence ID? Having the session ID in the reply
means that the requester does not have to use the XID to look up the means that the requester does not have to use the XID to look up the
session ID, which would be necessary if the connection were session ID, which would be necessary if the connection were
associated with multiple sessions. Having the slot ID and sequence associated with multiple sessions. Having the slot ID and sequence
ID in the reply means that the requester does not have to use the XID ID in the reply means that the requester does not have to use the XID
to look up the slot ID and sequence ID. Furthermore, since the XID to look up the slot ID and sequence ID. Furthermore, since the XID
is only 32 bits, it is too small to guarantee the re-association of a is only 32 bits, it is too small to guarantee the re-association of a
reply with its request [40]; having session ID, slot ID, and sequence reply with its request [43]; having session ID, slot ID, and sequence
ID in the reply allows the client to validate that the reply in fact ID in the reply allows the client to validate that the reply in fact
belongs to the matched request. belongs to the matched request.
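As an illustration of the validation described above, a requester might first match a reply by XID and then confirm the session ID, slot ID, and sequence ID carried in the reply. The names SequenceIds and reply_matches_request, and the dictionary of pending requests keyed by XID, are assumptions made for this sketch and do not come from the specification.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class SequenceIds:
    """Identifiers carried in a SEQUENCE or CB_SEQUENCE request and reply."""
    session_id: bytes
    slot_id: int
    sequence_id: int


def reply_matches_request(pending: Dict[int, SequenceIds],
                          xid: int,
                          reply_ids: SequenceIds) -> bool:
    """Return True only if the reply's identifiers agree with the request
    that was found by looking up the 32-bit XID."""
    expected = pending.get(xid)
    if expected is None:
        # No outstanding request with this XID.
        return False
    # The XID alone is too small to guarantee the association, so the
    # identifiers in the reply are compared as well.
    return expected == reply_ids
```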
The SEQUENCE (and CB_SEQUENCE) operation also carries a The SEQUENCE (and CB_SEQUENCE) operation also carries a
"highest_slotid" value, which carries additional requester slot usage "highest_slotid" value, which carries additional requester slot usage
information. The requester MUST always indicate the slot ID information. The requester MUST always indicate the slot ID
representing the outstanding request with the highest-numbered slot representing the outstanding request with the highest-numbered slot
value. The requester should in all cases provide the most value. The requester should in all cases provide the most
conservative value possible, although it can be increased somewhat conservative value possible, although it can be increased somewhat
above the actual instantaneous usage to maintain some minimum or above the actual instantaneous usage to maintain some minimum or
skipping to change at page 57, line 34 skipping to change at page 57, line 41
cache entry for the slot whenever an error is returned from SEQUENCE cache entry for the slot whenever an error is returned from SEQUENCE
or CB_SEQUENCE. or CB_SEQUENCE.
2.10.6.1.3. Optional Reply Caching 2.10.6.1.3. Optional Reply Caching
On a per-request basis, the requester can choose to direct the On a per-request basis, the requester can choose to direct the
replier to cache the reply to all operations after the first replier to cache the reply to all operations after the first
operation (SEQUENCE or CB_SEQUENCE) via the sa_cachethis or operation (SEQUENCE or CB_SEQUENCE) via the sa_cachethis or
csa_cachethis fields of the arguments to SEQUENCE or CB_SEQUENCE. csa_cachethis fields of the arguments to SEQUENCE or CB_SEQUENCE.
The reason it would not direct the replier to cache the entire reply The reason it would not direct the replier to cache the entire reply
is that the request is composed of all idempotent operations [37]. is that the request is composed of all idempotent operations [40].
Caching the reply may offer little benefit. If the reply is too Caching the reply may offer little benefit. If the reply is too
large (see Section 2.10.6.4), it may not be cacheable anyway. Even large (see Section 2.10.6.4), it may not be cacheable anyway. Even
if the reply to an idempotent request is small enough to cache, if the reply to an idempotent request is small enough to cache,
unnecessarily caching the reply slows down the server and increases unnecessarily caching the reply slows down the server and increases
RPC latency. RPC latency.
Whether or not the requester requests the reply to be cached has no Whether or not the requester requests the reply to be cached has no
effect on the slot processing. If the results of SEQUENCE or effect on the slot processing. If the result of SEQUENCE or
CB_SEQUENCE are NFS4_OK, then the slot's sequence ID MUST be CB_SEQUENCE is NFS4_OK, then the slot's sequence ID MUST be
incremented by one. If a requester does not direct the replier to incremented by one. If a requester does not direct the replier to
cache the reply, the replier MUST do one of the following: cache the reply, the replier MUST do one of the following:
o The replier can cache the entire original reply. Even though o The replier can cache the entire original reply. Even though
sa_cachethis or csa_cachethis is FALSE, the replier is always free sa_cachethis or csa_cachethis is FALSE, the replier is always free
to cache. It may choose this approach in order to simplify to cache. It may choose this approach in order to simplify
implementation. implementation.
o The replier enters into its reply cache a reply consisting of the o The replier enters into its reply cache a reply consisting of the
original results to the SEQUENCE or CB_SEQUENCE operation, and original results to the SEQUENCE or CB_SEQUENCE operation, and
skipping to change at page 60, line 38 skipping to change at page 60, line 45
requester does not know what sequence ID to use for the slot on its requester does not know what sequence ID to use for the slot on its
next request. For example, suppose a requester sends a request with next request. For example, suppose a requester sends a request with
sequence ID 1, and does not wait for the response. The next time it sequence ID 1, and does not wait for the response. The next time it
uses the slot, it sends the new request with sequence ID 2. If the uses the slot, it sends the new request with sequence ID 2. If the
replier has not seen the request with sequence ID 1, then the replier replier has not seen the request with sequence ID 1, then the replier
is not expecting sequence ID 2, and rejects the requester's new is not expecting sequence ID 2, and rejects the requester's new
request with NFS4ERR_SEQ_MISORDERED (as the result from SEQUENCE or request with NFS4ERR_SEQ_MISORDERED (as the result from SEQUENCE or
CB_SEQUENCE). CB_SEQUENCE).
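The example above can be traced with a small sketch of a replier's per-slot check. It rests on assumptions not stated in this excerpt: that a request repeating the slot's current sequence ID is treated as a retry, and that the sequence ID is a 32-bit unsigned value that wraps. SlotCheck and check_slot are hypothetical names.

```python
from enum import Enum

SEQID_MODULUS = 2 ** 32  # assumption: sequence IDs are 32-bit and wrap


class SlotCheck(Enum):
    """Hypothetical outcomes of the replier's sequence ID check."""
    NEW_REQUEST = "execute, then advance the slot's sequence ID by one"
    RETRY = "answer from the reply cache"
    MISORDERED = "NFS4ERR_SEQ_MISORDERED"


def check_slot(last_seqid: int, request_seqid: int) -> SlotCheck:
    """Compare a request's sequence ID with the last one seen on the slot."""
    if request_seqid == (last_seqid + 1) % SEQID_MODULUS:
        return SlotCheck.NEW_REQUEST
    if request_seqid == last_seqid:
        return SlotCheck.RETRY
    return SlotCheck.MISORDERED
```

In the scenario from the text, the replier never saw sequence ID 1, so its slot still holds the prior value (0 in this sketch) and check_slot(0, 2) yields MISORDERED, whereas check_slot(0, 1) would have been accepted as a new request.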
RDMA fabrics do not guarantee that the memory handles (Steering Tags) RDMA fabrics do not guarantee that the memory handles (Steering Tags)
within each RPC/RDMA "chunk" [31] are valid on a scope outside that within each RPC/RDMA "chunk" [32] are valid on a scope outside that
of a single connection. Therefore, handles used by the direct of a single connection. Therefore, handles used by the direct
operations become invalid after connection loss. The server must operations become invalid after connection loss. The server must
ensure that any RDMA operations that must be replayed from the reply ensure that any RDMA operations that must be replayed from the reply
cache use the newly provided handle(s) from the most recent request. cache use the newly provided handle(s) from the most recent request.
A retry might be sent while the original request is still in progress A retry might be sent while the original request is still in progress
on the replier. The replier SHOULD deal with the issue by returning on the replier. The replier SHOULD deal with the issue by returning
NFS4ERR_DELAY as the reply to SEQUENCE or CB_SEQUENCE operation, but NFS4ERR_DELAY as the reply to SEQUENCE or CB_SEQUENCE operation, but
implementations MAY return NFS4ERR_SEQ_MISORDERED. Since errors from implementations MAY return NFS4ERR_SEQ_MISORDERED. Since errors from
SEQUENCE and CB_SEQUENCE are never recorded in the reply cache, this SEQUENCE and CB_SEQUENCE are never recorded in the reply cache, this
skipping to change at page 63, line 13 skipping to change at page 63, line 22
RENAME, and the tenth operation is a READ for one million bytes, the RENAME, and the tenth operation is a READ for one million bytes, the
server may return NFS4ERR_REP_TOO_BIG_TO_CACHE on the tenth server may return NFS4ERR_REP_TOO_BIG_TO_CACHE on the tenth
operation. Since the server executed several operations, especially operation. Since the server executed several operations, especially
the non-idempotent RENAME, the client's request to cache the reply the non-idempotent RENAME, the client's request to cache the reply
needs to be honored in order for the correct operation of exactly needs to be honored in order for the correct operation of exactly
once semantics. If the client retries the request, the server will once semantics. If the client retries the request, the server will
have cached a reply that contains results for ten of the eleven have cached a reply that contains results for ten of the eleven
requested operations, with the tenth operation having a status of requested operations, with the tenth operation having a status of
NFS4ERR_REP_TOO_BIG_TO_CACHE. NFS4ERR_REP_TOO_BIG_TO_CACHE.
A client needs to take care that when sending operations that change A client needs to take care that, when sending operations that change
the current filehandle (except for PUTFH, PUTPUBFH, PUTROOTFH, and the current filehandle (except for PUTFH, PUTPUBFH, PUTROOTFH, and
RESTOREFH), it not exceed the maximum reply buffer before the GETFH RESTOREFH), it does not exceed the maximum reply buffer before the
operation. Otherwise, the client will have to retry the operation GETFH operation. Otherwise, the client will have to retry the
that changed the current filehandle, in order to obtain the desired operation that changed the current filehandle, in order to obtain the
filehandle. For the OPEN operation (see Section 18.16), retry is not desired filehandle. For the OPEN operation (see Section 18.16),
always available as an option. The following guidelines for the retry is not always available as an option. The following guidelines
handling of filehandle-changing operations are advised: for the handling of filehandle-changing operations are advised:
o Within the same COMPOUND procedure, a client SHOULD send GETFH o Within the same COMPOUND procedure, a client SHOULD send GETFH
immediately after a current filehandle-changing operation. A immediately after a current filehandle-changing operation. A
client MUST send GETFH after a current filehandle-changing client MUST send GETFH after a current filehandle-changing
operation that is also non-idempotent (e.g., the OPEN operation), operation that is also non-idempotent (e.g., the OPEN operation),
unless the operation is RESTOREFH. RESTOREFH is an exception, unless the operation is RESTOREFH. RESTOREFH is an exception,
because even though it is non-idempotent, the filehandle RESTOREFH because even though it is non-idempotent, the filehandle RESTOREFH
produced originated from an operation that is either idempotent produced originated from an operation that is either idempotent
(e.g., PUTFH, LOOKUP), or non-idempotent (e.g., OPEN, CREATE). If (e.g., PUTFH, LOOKUP), or non-idempotent (e.g., OPEN, CREATE). If
the origin is non-idempotent, then because the client MUST send the origin is non-idempotent, then because the client MUST send
skipping to change at page 65, line 29 skipping to change at page 65, line 39
view the problem is as a single transaction consisting of each view the problem is as a single transaction consisting of each
operation in the COMPOUND followed by storing the result in operation in the COMPOUND followed by storing the result in
persistent storage, then finally a transaction commit. If there is a persistent storage, then finally a transaction commit. If there is a
failure before the transaction is committed, then the server rolls failure before the transaction is committed, then the server rolls
back the transaction. If the server itself fails, then when it back the transaction. If the server itself fails, then when it
restarts, its recovery logic could roll back the transaction before restarts, its recovery logic could roll back the transaction before
starting the NFSv4.1 server. starting the NFSv4.1 server.
While the description of the implementation for atomic execution of While the description of the implementation for atomic execution of
the request and caching of the reply is beyond the scope of this the request and caching of the reply is beyond the scope of this
document, an example implementation for NFSv2 [41] is described in document, an example implementation for NFSv2 [44] is described in
[42]. [45].
2.10.7. RDMA Considerations 2.10.7. RDMA Considerations
A complete discussion of the operation of RPC-based protocols over A complete discussion of the operation of RPC-based protocols over
RDMA transports is in [31]. A discussion of the operation of NFSv4, RDMA transports is in [32]. A discussion of the operation of NFSv4,
including NFSv4.1, over RDMA is in [32]. Where RDMA is considered, including NFSv4.1, over RDMA is in [33]. Where RDMA is considered,
this specification assumes the use of such a layering; it addresses this specification assumes the use of such a layering; it addresses
only the upper-layer issues relevant to making best use of RPC/RDMA. only the upper-layer issues relevant to making best use of RPC/RDMA.
2.10.7.1. RDMA Connection Resources 2.10.7.1. RDMA Connection Resources
RDMA requires its consumers to register memory and post buffers of a RDMA requires its consumers to register memory and post buffers of a
specific size and number for receive operations. specific size and number for receive operations.
Registration of memory can be a relatively high-overhead operation, Registration of memory can be a relatively high-overhead operation,
since it requires pinning of buffers, assignment of attributes (e.g., since it requires pinning of buffers, assignment of attributes (e.g.,
skipping to change at page 66, line 32 skipping to change at page 66, line 45
Previous versions of NFS do not provide flow control; instead, they Previous versions of NFS do not provide flow control; instead, they
rely on the windowing provided by transports like TCP to throttle rely on the windowing provided by transports like TCP to throttle
requests. This does not work with RDMA, which provides no operation requests. This does not work with RDMA, which provides no operation
flow control and will terminate a connection in error when limits are flow control and will terminate a connection in error when limits are
exceeded. Limits such as maximum number of requests outstanding are exceeded. Limits such as maximum number of requests outstanding are
therefore negotiated when a session is created (see the therefore negotiated when a session is created (see the
ca_maxrequests field in Section 18.36). These limits then provide ca_maxrequests field in Section 18.36). These limits then provide
the maxima within which each connection associated with the session's the maxima within which each connection associated with the session's
channel(s) must remain. RDMA connections are managed within these channel(s) must remain. RDMA connections are managed within these
limits as described in Section 3.3 of [31]; if there are multiple limits as described in Section 3.3 of [32]; if there are multiple
RDMA connections, then the maximum number of requests for a channel RDMA connections, then the maximum number of requests for a channel
will be divided among the RDMA connections. Put a different way, the will be divided among the RDMA connections. Put a different way, the
onus is on the replier to ensure that the total number of RDMA onus is on the replier to ensure that the total number of RDMA
credits across all connections associated with the replier's channel credits across all connections associated with the replier's channel
does not exceed the channel's maximum number of outstanding requests. does not exceed the channel's maximum number of outstanding requests.
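As an illustration only, the division of a channel's request limit among several RDMA connections might be sketched as follows. The function name split_credits and the even-split policy are assumptions made for this example; the text above only requires that the per-connection totals stay within the channel's negotiated maximum.

```python
def split_credits(max_requests, num_connections):
    """Divide a channel's request limit among its RDMA connections.

    Hypothetical helper: an even split is one possible policy, not one
    mandated by the protocol. The sum of the returned credits always
    equals max_requests, so the channel limit is respected.
    """
    if max_requests < 0:
        raise ValueError("max_requests must be non-negative")
    if num_connections <= 0:
        raise ValueError("at least one connection is required")
    base, extra = divmod(max_requests, num_connections)
    # The first `extra` connections each receive one additional credit.
    return [base + (1 if i < extra else 0) for i in range(num_connections)]


credits = split_credits(10, 3)
print(credits)
```

A replier using a different policy (for example, weighting busy connections more heavily) would still need the same invariant: the credits granted across all connections must sum to no more than the channel's maximum.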
The limits may also be modified dynamically at the replier's choosing The limits may also be modified dynamically at the replier's choosing
by manipulating certain parameters present in each NFSv4.1 reply. In by manipulating certain parameters present in each NFSv4.1 reply. In
addition, the CB_RECALL_SLOT callback operation (see Section 20.8) addition, the CB_RECALL_SLOT callback operation (see Section 20.8)
can be sent by a server to a client to return RDMA credits to the can be sent by a server to a client to return RDMA credits to the
server, thereby lowering the maximum number of requests a client can server, thereby lowering the maximum number of requests a client can
have outstanding to the server. have outstanding to the server.
2.10.7.3. Padding 2.10.7.3. Padding
Header padding is requested by each peer at session initiation (see Header padding is requested by each peer at session initiation (see
the ca_headerpadsize argument to CREATE_SESSION in Section 18.36), the ca_headerpadsize argument to CREATE_SESSION in Section 18.36),
and subsequently used by the RPC RDMA layer, as described in [31]. and subsequently used by the RPC RDMA layer, as described in [32].
Zero padding is permitted. Zero padding is permitted.
Padding leverages the useful property that RDMA preserves alignment of Padding leverages the useful property that RDMA preserves alignment of
data, even when they are placed into anonymous (untagged) buffers. data, even when they are placed into anonymous (untagged) buffers.
If requested, client inline writes will insert appropriate pad bytes If requested, client inline writes will insert appropriate pad bytes
within the request header to align the data payload on the specified within the request header to align the data payload on the specified
boundary. The client is encouraged to add sufficient padding (up to boundary. The client is encouraged to add sufficient padding (up to
the negotiated size) so that the "data" field of the WRITE operation the negotiated size) so that the "data" field of the WRITE operation
is aligned. Most servers can make good use of such padding, which is aligned. Most servers can make good use of such padding, which
allows them to chain receive buffers in such a way that any data allows them to chain receive buffers in such a way that any data
skipping to change at page 68, line 39 skipping to change at page 69, line 7
2.10.8.2. Backchannel RPC Security 2.10.8.2. Backchannel RPC Security
When the NFSv4.1 client establishes the backchannel, it informs the When the NFSv4.1 client establishes the backchannel, it informs the
server of the security flavors and principals to use when sending server of the security flavors and principals to use when sending
requests. If the security flavor is RPCSEC_GSS, the client expresses requests. If the security flavor is RPCSEC_GSS, the client expresses
the principal in the form of an established RPCSEC_GSS context. The the principal in the form of an established RPCSEC_GSS context. The
server is free to use any of the flavor/principal combinations the server is free to use any of the flavor/principal combinations the
client offers, but it MUST NOT use unoffered combinations. This way, client offers, but it MUST NOT use unoffered combinations. This way,
the client need not provide a target GSS principal for the the client need not provide a target GSS principal for the
backchannel as it did with NFSv4.0, nor does the server have to backchannel as it did with NFSv4.0, nor does the server have to
implement an RPCSEC_GSS initiator as it did with NFSv4.0 [33]. implement an RPCSEC_GSS initiator as it did with NFSv4.0 [36].
The CREATE_SESSION (Section 18.36) and BACKCHANNEL_CTL The CREATE_SESSION (Section 18.36) and BACKCHANNEL_CTL
(Section 18.33) operations allow the client to specify flavor/ (Section 18.33) operations allow the client to specify flavor/
principal combinations. principal combinations.
Also note that the SP4_SSV state protection mode (see Sections 18.35 Also note that the SP4_SSV state protection mode (see Sections 18.35
and 2.10.8.3) has the side benefit of providing SSV-derived and 2.10.8.3) has the side benefit of providing SSV-derived
RPCSEC_GSS contexts (Section 2.10.9). RPCSEC_GSS contexts (Section 2.10.9).
2.10.8.3. Protection from Unauthorized State Changes 2.10.8.3. Protection from Unauthorized State Changes
skipping to change at page 70, line 14 skipping to change at page 70, line 29
(perhaps because the user has logged off). When the client (perhaps because the user has logged off). When the client
establishes SP4_MACH_CRED or SP4_SSV protection, it can specify a establishes SP4_MACH_CRED or SP4_SSV protection, it can specify a
list of operations that the server MUST allow using the machine list of operations that the server MUST allow using the machine
credential (if SP4_MACH_CRED is used) or the SSV credential (if credential (if SP4_MACH_CRED is used) or the SSV credential (if
SP4_SSV is used). SP4_SSV is used).
The SP4_MACH_CRED state protection option uses a machine credential The SP4_MACH_CRED state protection option uses a machine credential
where the principal that creates the client ID MUST also be the where the principal that creates the client ID MUST also be the
principal that performs client ID and session maintenance operations. principal that performs client ID and session maintenance operations.
The security of the machine credential state protection approach The security of the machine credential state protection approach
depends entirely on safe guarding the per-machine credential. depends entirely on safeguarding the per-machine credential.
Assuming a proper safeguard, using the per-machine credential for Assuming a proper safeguard, using the per-machine credential for
operations like CREATE_SESSION, BIND_CONN_TO_SESSION, operations like CREATE_SESSION, BIND_CONN_TO_SESSION,
DESTROY_SESSION, and DESTROY_CLIENTID will prevent an attacker from DESTROY_SESSION, and DESTROY_CLIENTID will prevent an attacker from
associating a rogue connection with a session, or associating a rogue associating a rogue connection with a session, or associating a rogue
session with a client ID. session with a client ID.
There are at least three scenarios for the SP4_MACH_CRED option: There are at least three scenarios for the SP4_MACH_CRED option:
1. The system administrator configures a unique, permanent per- 1. The system administrator configures a unique, permanent per-
machine credential for one of the mandated GSS mechanisms (e.g., machine credential for one of the mandated GSS mechanisms (e.g.,
skipping to change at page 74, line 16 skipping to change at page 74, line 28
iso.org.dod.internet.private.enterprise.Michael Eisler.nfs.ssv_mech iso.org.dod.internet.private.enterprise.Michael Eisler.nfs.ssv_mech
(1.3.6.1.4.1.28882.1.1). While the SSV mechanism does not define any (1.3.6.1.4.1.28882.1.1). While the SSV mechanism does not define any
initial context tokens, the OID can be used to let servers indicate initial context tokens, the OID can be used to let servers indicate
that the SSV mechanism is acceptable whenever the client sends a that the SSV mechanism is acceptable whenever the client sends a
SECINFO or SECINFO_NO_NAME operation (see Section 2.6). SECINFO or SECINFO_NO_NAME operation (see Section 2.6).
The SSV mechanism defines four subkeys derived from the SSV value. The SSV mechanism defines four subkeys derived from the SSV value.
Each time SET_SSV is invoked, the subkeys are recalculated by the Each time SET_SSV is invoked, the subkeys are recalculated by the
client and server. The calculation of each of the four subkeys client and server. The calculation of each of the four subkeys
depends on each of the four respective ssv_subkey4 enumerated values. depends on each of the four respective ssv_subkey4 enumerated values.
The calculation uses the HMAC [59] algorithm, using the current SSV The calculation uses the HMAC [51] algorithm, using the current SSV
as the key, the one-way hash algorithm as negotiated by EXCHANGE_ID, as the key, the one-way hash algorithm as negotiated by EXCHANGE_ID,
and the input text as represented by the XDR encoded enumeration and the input text as represented by the XDR encoded enumeration
value for that subkey of data type ssv_subkey4. If the length of the value for that subkey of data type ssv_subkey4. If the length of the
output of the HMAC algorithm exceeds the key length of the output of the HMAC algorithm exceeds the key length of the
encryption algorithm (which is also negotiated by EXCHANGE_ID), then encryption algorithm (which is also negotiated by EXCHANGE_ID), then
the subkey MUST be truncated from the HMAC output, i.e., if the the subkey MUST be truncated from the HMAC output, i.e., if the
subkey is N bytes long, then the first N bytes of the HMAC output subkey is N bytes long, then the first N bytes of the HMAC output
MUST be used for the subkey. The specification of EXCHANGE_ID states MUST be used for the subkey. The specification of EXCHANGE_ID states
that the length of the output of the HMAC algorithm MUST NOT be less that the length of the output of the HMAC algorithm MUST NOT be less
than the length of subkey needed for the encryption algorithm (see than the length of subkey needed for the encryption algorithm (see
skipping to change at page 77, line 24 skipping to change at page 77, line 34
require inputs to be in fixed-sized blocks. The content of sspt_pad require inputs to be in fixed-sized blocks. The content of sspt_pad
is zero filled except for the length. Beware that the XDR encoding is zero filled except for the length. Beware that the XDR encoding
of ssv_seal_plain_tkn4 contains three variable-length arrays, and so of ssv_seal_plain_tkn4 contains three variable-length arrays, and so
each array consumes four bytes for an array length, and each array each array consumes four bytes for an array length, and each array
that follows the length is always padded to a multiple of four bytes that follows the length is always padded to a multiple of four bytes
per the XDR standard. per the XDR standard.
For example, suppose the encryption algorithm uses 16-byte blocks, For example, suppose the encryption algorithm uses 16-byte blocks,
and the sspt_confounder is three bytes long, and the sspt_orig_plain and the sspt_confounder is three bytes long, and the sspt_orig_plain
field is 15 bytes long. The XDR encoding of sspt_confounder uses field is 15 bytes long. The XDR encoding of sspt_confounder uses
eight bytes (4 + 3 + 1 byte pad), the XDR encoding of sspt_ssv_seq eight bytes (4 + 3 + 1-byte pad), the XDR encoding of sspt_ssv_seq
uses four bytes, the XDR encoding of sspt_orig_plain uses 20 bytes (4 uses four bytes, the XDR encoding of sspt_orig_plain uses 20 bytes (4
+ 15 + 1 byte pad), and the smallest XDR encoding of the sspt_pad + 15 + 1-byte pad), and the smallest XDR encoding of the sspt_pad
field is four bytes. This totals 36 bytes. The next multiple of 16 field is four bytes. This totals 36 bytes. The next multiple of 16
is 48; thus, the length field of sspt_pad needs to be set to 12 is 48; thus, the length field of sspt_pad needs to be set to 12
bytes, or a total encoding of 16 bytes. The total number of XDR bytes, or a total encoding of 16 bytes. The total number of XDR
encoded bytes is thus 8 + 4 + 20 + 16 = 48. encoded bytes is thus 8 + 4 + 20 + 16 = 48.
GSS_Wrap() emits a token that is an XDR encoding of a value of data GSS_Wrap() emits a token that is an XDR encoding of a value of data
type ssv_seal_cipher_tkn4. Note that regardless of whether or not type ssv_seal_cipher_tkn4. Note that regardless of whether or not
the caller of GSS_Wrap() requests confidentiality, the token always the caller of GSS_Wrap() requests confidentiality, the token always
has confidentiality. This is because the SSV mechanism is for has confidentiality. This is because the SSV mechanism is for
RPCSEC_GSS, and RPCSEC_GSS never produces GSS_wrap() tokens without RPCSEC_GSS, and RPCSEC_GSS never produces GSS_wrap() tokens without
skipping to change at page 83, line 39 skipping to change at page 84, line 9
1. If the client has other connections to other server network 1. If the client has other connections to other server network
addresses associated with the same session, attempt a COMPOUND addresses associated with the same session, attempt a COMPOUND
with a single operation, SEQUENCE, on each of the other with a single operation, SEQUENCE, on each of the other
connections. connections.
2. If the attempts succeed, the session is still alive, and this is 2. If the attempts succeed, the session is still alive, and this is
a strong indicator that the server's network address has moved. a strong indicator that the server's network address has moved.
The client might send an EXCHANGE_ID on the connection that The client might send an EXCHANGE_ID on the connection that
returned NFS4ERR_BADSESSION to see if there are opportunities for returned NFS4ERR_BADSESSION to see if there are opportunities for
client ID trunking (i.e., the same client ID and so_major are client ID trunking (i.e., the same client ID and so_major value
returned). The client might use DNS to see if the moved network are returned). The client might use DNS to see if the moved
address was replaced with another, so that the performance and network address was replaced with another, so that the
availability benefits of session trunking can continue. performance and availability benefits of session trunking can
continue.
3. If the SEQUENCE requests fail with NFS4ERR_BADSESSION, then the 3. If the SEQUENCE requests fail with NFS4ERR_BADSESSION, then the
session no longer exists on any of the server network addresses session no longer exists on any of the server network addresses
for which the client has connections associated with that session for which the client has connections associated with that session
ID. It is possible the session is still alive and available on ID. It is possible the session is still alive and available on
other network addresses. The client sends an EXCHANGE_ID on all other network addresses. The client sends an EXCHANGE_ID on all
the connections to see if the server owner is still listening on the connections to see if the server owner is still listening on
those network addresses. If the same server owner is returned those network addresses. If the same server owner is returned
but a new client ID is returned, this is a strong indicator of a but a new client ID is returned, this is a strong indicator of a
server restart. If both the same server owner and same client ID server restart. If both the same server owner and same client ID
skipping to change at page 92, line 14 skipping to change at page 92, line 32
3.3.9. netaddr4 3.3.9. netaddr4
struct netaddr4 { struct netaddr4 {
/* see struct rpcb in RFC 1833 */ /* see struct rpcb in RFC 1833 */
string na_r_netid<>; /* network id */ string na_r_netid<>; /* network id */
string na_r_addr<>; /* universal address */ string na_r_addr<>; /* universal address */
}; };
The netaddr4 data type is used to identify network transport The netaddr4 data type is used to identify network transport
endpoints. The r_netid and r_addr fields respectively contain a endpoints. The na_r_netid and na_r_addr fields respectively contain
netid and uaddr. The netid and uaddr concepts are defined in [12]. a netid and uaddr. The netid and uaddr concepts are defined in [12].
The netid and uaddr formats for TCP over IPv4 and TCP over IPv6 are The netid and uaddr formats for TCP over IPv4 and TCP over IPv6 are
defined in [12], specifically Tables 2 and 3 and Sections 5.2.3.3 and defined in [12], specifically Tables 2 and 3 and Sections 5.2.3.3 and
5.2.3.4. 5.2.3.4.
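As a hedged illustration, a netaddr4 value for TCP over IPv4 can be built as shown below. The sketch assumes the "tcp" netid and the "h1.h2.h3.h4.p1.p2" universal address form, in which p1 and p2 are the high and low octets of the port number; the helper name tcp4_netaddr is hypothetical, and [12] remains the authoritative definition of these formats.

```python
import ipaddress


def tcp4_netaddr(host, port):
    """Return (na_r_netid, na_r_addr) for a TCP over IPv4 endpoint.

    Hypothetical helper for illustration only.
    """
    # Validates the dotted-quad form; raises ValueError if malformed.
    addr = ipaddress.IPv4Address(host)
    if not 0 <= port <= 0xFFFF:
        raise ValueError("port out of range")
    # The port is appended as two decimal octets, high octet first.
    uaddr = f"{addr}.{port >> 8}.{port & 0xFF}"
    return "tcp", uaddr


print(tcp4_netaddr("192.0.2.1", 2049))  # ('tcp', '192.0.2.1.8.1')
```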
3.3.10. state_owner4 3.3.10. state_owner4
struct state_owner4 { struct state_owner4 {
clientid4 clientid; clientid4 clientid;
opaque owner<NFS4_OPAQUE_LIMIT>; opaque owner<NFS4_OPAQUE_LIMIT>;
}; };
skipping to change at page 93, line 48 skipping to change at page 94, line 20
The layouttype4 data type is 32 bits in length. The range The layouttype4 data type is 32 bits in length. The range
represented by the layout type is split into three parts. Type 0x0 represented by the layout type is split into three parts. Type 0x0
is reserved. Types within the range 0x00000001-0x7FFFFFFF are is reserved. Types within the range 0x00000001-0x7FFFFFFF are
globally unique and are assigned according to the description in globally unique and are assigned according to the description in
Section 22.5; they are maintained by IANA. Types within the range Section 22.5; they are maintained by IANA. Types within the range
0x80000000-0xFFFFFFFF are site specific and for private use only. 0x80000000-0xFFFFFFFF are site specific and for private use only.
The LAYOUT4_NFSV4_1_FILES enumeration specifies that the NFSv4.1 file The LAYOUT4_NFSV4_1_FILES enumeration specifies that the NFSv4.1 file
layout type, as defined in Section 13, is to be used. The layout type, as defined in Section 13, is to be used. The
LAYOUT4_OSD2_OBJECTS enumeration specifies that the object layout, as LAYOUT4_OSD2_OBJECTS enumeration specifies that the object layout, as
defined in [43], is to be used. Similarly, the LAYOUT4_BLOCK_VOLUME defined in [46], is to be used. Similarly, the LAYOUT4_BLOCK_VOLUME
enumeration specifies that the block/volume layout, as defined in enumeration specifies that the block/volume layout, as defined in
[44], is to be used. [47], is to be used.
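The three-way split of the layouttype4 range described above can be expressed as a small classifier. This is a sketch only; the function name and the returned labels are chosen for illustration and do not appear in the protocol.

```python
def classify_layouttype(value):
    """Classify a 32-bit layouttype4 value by the range it falls in."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("layouttype4 is a 32-bit unsigned value")
    if value == 0x0:
        return "reserved"
    if value <= 0x7FFFFFFF:
        return "iana-assigned"   # globally unique, maintained by IANA
    return "private-use"         # site specific, 0x80000000-0xFFFFFFFF


for v in (0x0, 0x00000001, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF):
    print(hex(v), classify_layouttype(v))
```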
3.3.14. deviceid4 3.3.14. deviceid4
const NFS4_DEVICEID4_SIZE = 16; const NFS4_DEVICEID4_SIZE = 16;
typedef opaque deviceid4[NFS4_DEVICEID4_SIZE]; typedef opaque deviceid4[NFS4_DEVICEID4_SIZE];
Layout information includes device IDs that specify a storage device Layout information includes device IDs that specify a storage device
through a compact handle. Addressing and type information is through a compact handle. Addressing and type information is
obtained with the GETDEVICEINFO operation. Device IDs are not obtained with the GETDEVICEINFO operation. Device IDs are not
skipping to change at page 98, line 18 skipping to change at page 99, line 10
for a file system object. The contents of the filehandle are opaque for a file system object. The contents of the filehandle are opaque
to the client. Therefore, the server is responsible for translating to the client. Therefore, the server is responsible for translating
the filehandle to an internal representation of the file system the filehandle to an internal representation of the file system
object. object.
4.1. Obtaining the First Filehandle 4.1. Obtaining the First Filehandle
The operations of the NFS protocol are defined in terms of one or The operations of the NFS protocol are defined in terms of one or
more filehandles. Therefore, the client needs a filehandle to more filehandles. Therefore, the client needs a filehandle to
initiate communication with the server. With the NFSv3 protocol (RFC initiate communication with the server. With the NFSv3 protocol (RFC
1813 [34]), there exists an ancillary protocol to obtain this first 1813 [37]), there exists an ancillary protocol to obtain this first
filehandle. The MOUNT protocol, RPC program number 100005, provides filehandle. The MOUNT protocol, RPC program number 100005, provides
the mechanism of translating a string-based file system pathname to a the mechanism of translating a string-based file system pathname to a
filehandle, which can then be used by the NFS protocols. filehandle, which can then be used by the NFS protocols.
The MOUNT protocol has deficiencies in the area of security and use The MOUNT protocol has deficiencies in the area of security and use
via firewalls. This is one reason that the use of the public via firewalls. This is one reason that the use of the public
filehandle was introduced in RFC 2054 [45] and RFC 2055 [46]. With filehandle was introduced in RFC 2054 [48] and RFC 2055 [49]. With
the use of the public filehandle in combination with the LOOKUP the use of the public filehandle in combination with the LOOKUP
operation in the NFSv3 protocol, it has been demonstrated that the operation in the NFSv3 protocol, it has been demonstrated that the
MOUNT protocol is unnecessary for viable interaction between NFS MOUNT protocol is unnecessary for viable interaction between NFS
client and server. client and server.
Therefore, the NFSv4.1 protocol will not use an ancillary protocol Therefore, the NFSv4.1 protocol will not use an ancillary protocol
for translation from string-based pathnames to a filehandle. Two for translation from string-based pathnames to a filehandle. Two
special filehandles will be used as starting points for the NFS special filehandles will be used as starting points for the NFS
client. client.
skipping to change at page 105, line 45 skipping to change at page 106, line 37
Named attributes and the named attribute directory may have their own Named attributes and the named attribute directory may have their own
(non-named) attributes. Each of these objects MUST have all of the (non-named) attributes. Each of these objects MUST have all of the
REQUIRED attributes and may have additional RECOMMENDED attributes. REQUIRED attributes and may have additional RECOMMENDED attributes.
However, the set of attributes for named attributes and the named However, the set of attributes for named attributes and the named
attribute directory need not be, and typically will not be, as large attribute directory need not be, and typically will not be, as large
as that for other objects in that file system. as that for other objects in that file system.
Named attributes and the named attribute directory might be the Named attributes and the named attribute directory might be the
target of delegations (in the case of the named attribute directory, target of delegations (in the case of the named attribute directory,
these will be directory delegations). However, since granting these will be directory delegations). However, since granting of
delegations is at the server's discretion, a server need not support delegations is at the server's discretion, a server need not support
delegations on named attributes or the named attribute directory. delegations on named attributes or the named attribute directory.
It is RECOMMENDED that servers support arbitrary named attributes. A It is RECOMMENDED that servers support arbitrary named attributes. A
client should not depend on the ability to store any named attributes client should not depend on the ability to store any named attributes
in the server's file system. If a server does support named in the server's file system. If a server does support named
attributes, a client that is also able to handle them should be able attributes, a client that is also able to handle them should be able
to copy a file's data and metadata with complete transparency from to copy a file's data and metadata with complete transparency from
one location to another; this would imply that names allowed for one location to another; this would imply that names allowed for
regular directory entries are valid for named attribute names as regular directory entries are valid for named attribute names as
skipping to change at page 108, line 8 skipping to change at page 108, line 48
5.6. REQUIRED Attributes - List and Definition References 5.6. REQUIRED Attributes - List and Definition References
The list of REQUIRED attributes appears in Table 2. The meanings of The list of REQUIRED attributes appears in Table 2. The meanings of
the columns of the table are: the columns of the table are:
o Name: The name of the attribute. o Name: The name of the attribute.
o Id: The number assigned to the attribute. In the event of o Id: The number assigned to the attribute. In the event of
conflicts between the assigned number and [10], the latter is conflicts between the assigned number and [10], the latter is
likely authoritative, but should be resolved with Errata to this likely authoritative, but should be resolved with Errata to this
document and/or [10]. See [47] for the Errata process. document and/or [10]. See [50] for the Errata process.
o Data Type: The XDR data type of the attribute. o Data Type: The XDR data type of the attribute.
o Acc: Access allowed to the attribute. R means read-only (GETATTR o Acc: Access allowed to the attribute. R means read-only (GETATTR
may retrieve, SETATTR may not set). W means write-only (SETATTR may retrieve, SETATTR may not set). W means write-only (SETATTR
may set, GETATTR may not retrieve). R W means read/write (GETATTR may set, GETATTR may not retrieve). R W means read/write (GETATTR
may retrieve, SETATTR may set). may retrieve, SETATTR may set).
o Defined in: The section of this specification that describes the o Defined in: The section of this specification that describes the
attribute. attribute.
skipping to change at page 114, line 17 skipping to change at page 115, line 13
Total file slots on the file system containing this object. Total file slots on the file system containing this object.
5.8.2.11. Attribute 76: fs_charset_cap 5.8.2.11. Attribute 76: fs_charset_cap
Character set capabilities for this file system. See Section 14.4. Character set capabilities for this file system. See Section 14.4.
5.8.2.12. Attribute 24: fs_locations 5.8.2.12. Attribute 24: fs_locations
Locations where this file system may be found. If the server returns Locations where this file system may be found. If the server returns
NFS4ERR_MOVED as an error, this attribute MUST be supported. See NFS4ERR_MOVED as an error, this attribute MUST be supported. See
Section 11.15 for more details. Section 11.16 for more details.
5.8.2.13. Attribute 67: fs_locations_info 5.8.2.13. Attribute 67: fs_locations_info
Full function file system location. See Section 11.16.2 for more Full function file system location. See Section 11.17.2 for more
details. details.
5.8.2.14. Attribute 61: fs_status 5.8.2.14. Attribute 61: fs_status
Generic file system type information. See Section 11.17 for more Generic file system type information. See Section 11.18 for more
details. details.
5.8.2.15. Attribute 25: hidden 5.8.2.15. Attribute 25: hidden
TRUE, if the file is considered hidden with respect to the Windows TRUE, if the file is considered hidden with respect to the Windows
API. API.
5.8.2.16. Attribute 26: homogeneous 5.8.2.16. Attribute 26: homogeneous
TRUE, if this object's file system is homogeneous; i.e., all objects TRUE, if this object's file system is homogeneous; i.e., all objects
skipping to change at page 119, line 27 skipping to change at page 120, line 27
5.8.2.44. Attribute 54: time_modify_set 5.8.2.44. Attribute 54: time_modify_set
Sets the time of last modification to the object. SETATTR use only. Sets the time of last modification to the object. SETATTR use only.
5.9. Interpreting owner and owner_group 5.9. Interpreting owner and owner_group
The RECOMMENDED attributes "owner" and "owner_group" (and also users The RECOMMENDED attributes "owner" and "owner_group" (and also users
and groups within the "acl" attribute) are represented in terms of a and groups within the "acl" attribute) are represented in terms of a
UTF-8 string. To avoid a representation that is tied to a particular UTF-8 string. To avoid a representation that is tied to a particular
underlying implementation at the client or server, the use of the underlying implementation at the client or server, the use of the
UTF-8 string has been chosen. Note that Section 6.1 of RFC 2624 [48] UTF-8 string has been chosen. Note that Section 6.1 of RFC 2624 [52]
provides additional rationale. It is expected that the client and provides additional rationale. It is expected that the client and
server will have their own local representation of owner and server will have their own local representation of owner and
owner_group that is used for local storage or presentation to the end owner_group that is used for local storage or presentation to the end
user. Therefore, it is expected that when these attributes are user. Therefore, it is expected that when these attributes are
transferred between the client and server, the local representation transferred between the client and server, the local representation
is translated to a syntax of the form "user@dns_domain". This will is translated to a syntax of the form "user@dns_domain". This will
allow for a client and server that do not use the same local allow for a client and server that do not use the same local
representation the ability to translate to a common syntax that can representation the ability to translate to a common syntax that can
be interpreted by both. be interpreted by both.
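The translation between a local representation and the "user@dns_domain" syntax can be sketched as follows. This is a minimal illustration only: the LOCAL_USERS table, the DNS_DOMAIN value, and both function names are hypothetical and are not defined by the protocol. A real implementation would consult its own identity service.

```python
# Hypothetical local account table and domain, for illustration only.
LOCAL_USERS = {1000: "alice", 1001: "bob"}
DNS_DOMAIN = "example.org"  # assumed to be agreed on by client and server


def owner_to_wire(uid: int) -> str:
    """Translate a local numeric ID into the "user@dns_domain" form."""
    name = LOCAL_USERS.get(uid)
    if name is None:
        raise KeyError(f"no local name for uid {uid}")
    return f"{name}@{DNS_DOMAIN}"


def owner_from_wire(owner: str) -> int:
    """Translate a "user@dns_domain" string back into a local numeric ID."""
    # Split on the last "@" so the domain is always the final component.
    name, sep, domain = owner.rpartition("@")
    if not sep or domain != DNS_DOMAIN:
        raise ValueError(f"unrecognized owner string: {owner!r}")
    for uid, local_name in LOCAL_USERS.items():
        if local_name == name:
            return uid
    raise KeyError(f"no local account for {owner!r}")
```

Each side only needs to map between its own local form and the common string, so neither side has to know how the other stores owners.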
skipping to change at page 157, line 20 skipping to change at page 158, line 20
clients should use strong security mechanisms to access the pseudo clients should use strong security mechanisms to access the pseudo
file system in order to prevent man-in-the-middle attacks. file system in order to prevent man-in-the-middle attacks.
8. State Management 8. State Management
Integrating locking into the NFS protocol necessarily causes it to be Integrating locking into the NFS protocol necessarily causes it to be
stateful. With the inclusion of such features as share reservations, stateful. With the inclusion of such features as share reservations,
file and directory delegations, recallable layouts, and support for file and directory delegations, recallable layouts, and support for
mandatory byte-range locking, the protocol becomes substantially more mandatory byte-range locking, the protocol becomes substantially more
dependent on proper management of state than the traditional dependent on proper management of state than the traditional
combination of NFS and NLM (Network Lock Manager) [49]. These combination of NFS and NLM (Network Lock Manager) [53]. These
features include expanded locking facilities, which provide some features include expanded locking facilities, which provide some
measure of inter-client exclusion, but the state also offers features measure of inter-client exclusion, but the state also offers features
not readily providable using a stateless model. There are three not readily providable using a stateless model. There are three
components to making this state manageable: components to making this state manageable:
o clear division between client and server o clear division between client and server
o ability to reliably detect inconsistency in state between client o ability to reliably detect inconsistency in state between client
and server and server
skipping to change at page 172, line 21 skipping to change at page 173, line 21
will use reclaim-type locking requests (e.g., LOCK operations with will use reclaim-type locking requests (e.g., LOCK operations with
reclaim set to TRUE and OPEN operations with a claim type of reclaim set to TRUE and OPEN operations with a claim type of
CLAIM_PREVIOUS; see Section 9.11) to re-establish its locking state. CLAIM_PREVIOUS; see Section 9.11) to re-establish its locking state.
Once this is done, or if there is no such locking state to reclaim, Once this is done, or if there is no such locking state to reclaim,
the client sends a global RECLAIM_COMPLETE operation, i.e., one with the client sends a global RECLAIM_COMPLETE operation, i.e., one with
the rca_one_fs argument set to FALSE, to indicate that it has the rca_one_fs argument set to FALSE, to indicate that it has
reclaimed all of the locking state that it will reclaim. Once a reclaimed all of the locking state that it will reclaim. Once a
client sends such a RECLAIM_COMPLETE operation, it may attempt non- client sends such a RECLAIM_COMPLETE operation, it may attempt non-
reclaim locking operations, although it might get an NFS4ERR_GRACE reclaim locking operations, although it might get an NFS4ERR_GRACE
status result from each such operation until the period of special status result from each such operation until the period of special
handling is over. See Section 11.10.9 for a discussion of the handling is over. See Section 11.11.9 for a discussion of the
analogous handling of lock reclamation in the case of file systems analogous handling of lock reclamation in the case of file systems
transitioning from server to server. transitioning from server to server.
During the grace period, the server must reject READ and WRITE During the grace period, the server must reject READ and WRITE
operations and non-reclaim locking requests (i.e., other LOCK and operations and non-reclaim locking requests (i.e., other LOCK and
OPEN operations) with an error of NFS4ERR_GRACE, unless it can OPEN operations) with an error of NFS4ERR_GRACE, unless it can
guarantee that these may be done safely, as described below. guarantee that these may be done safely, as described below.
The grace period may last until all clients that are known to The grace period may last until all clients that are known to
possibly have had locks have done a global RECLAIM_COMPLETE possibly have had locks have done a global RECLAIM_COMPLETE
skipping to change at page 173, line 5 skipping to change at page 174, line 5
opportunity to find out about the server restart, as a result of opportunity to find out about the server restart, as a result of
sending requests on associated sessions with a frequency governed by sending requests on associated sessions with a frequency governed by
the lease time. Note that when a client does not send such requests the lease time. Note that when a client does not send such requests
(or they are sent by the client but not received by the server), it (or they are sent by the client but not received by the server), it
is possible for the grace period to expire before the client finds is possible for the grace period to expire before the client finds
out that the server restart has occurred. out that the server restart has occurred.
Some additional time in order to allow a client to establish a new Some additional time in order to allow a client to establish a new
client ID and session and to effect lock reclaims may be added to the client ID and session and to effect lock reclaims may be added to the
lease time. Note that analogous rules apply to file system-specific lease time. Note that analogous rules apply to file system-specific
grace periods discussed in Section 11.10.9. grace periods discussed in Section 11.11.9.
If the server can reliably determine that granting a non-reclaim If the server can reliably determine that granting a non-reclaim
request will not conflict with reclamation of locks by other clients, request will not conflict with reclamation of locks by other clients,
the NFS4ERR_GRACE error does not have to be returned even within the the NFS4ERR_GRACE error does not have to be returned even within the
grace period, although NFS4ERR_GRACE must always be returned to grace period, although NFS4ERR_GRACE must always be returned to
clients attempting a non-reclaim lock request before doing their own clients attempting a non-reclaim lock request before doing their own
global RECLAIM_COMPLETE. For the server to be able to service READ global RECLAIM_COMPLETE. For the server to be able to service READ
and WRITE operations during the grace period, it must again be able and WRITE operations during the grace period, it must again be able
to guarantee that no possible conflict could arise between a to guarantee that no possible conflict could arise between a
potential reclaim locking request and the READ or WRITE operation. potential reclaim locking request and the READ or WRITE operation.
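The server-side rule for non-reclaim requests received during the grace period can be sketched as a small decision function. This is an illustrative sketch, not a definitive implementation: the function name and its boolean parameters are hypothetical, and "provably_safe" stands in for whatever mechanism a server uses to guarantee that no conflict with a later reclaim is possible. The numeric value of NFS4ERR_GRACE is taken from the NFSv4 error code list.

```python
NFS4_OK = 0
NFS4ERR_GRACE = 10013  # NFSv4 error code for requests refused during grace


def non_reclaim_status_during_grace(is_lock_request: bool,
                                    client_reclaim_complete: bool,
                                    provably_safe: bool) -> int:
    """Grace-period check for a non-reclaim request (LOCK, OPEN, READ, WRITE).

    NFS4_OK here only means the request is not refused because of the
    grace period; normal processing still follows.
    """
    # A client that has not yet sent its own global RECLAIM_COMPLETE
    # always gets NFS4ERR_GRACE for non-reclaim locking requests.
    if is_lock_request and not client_reclaim_complete:
        return NFS4ERR_GRACE
    # Otherwise the request may proceed only if the server can guarantee
    # that it cannot conflict with a lock reclaimed later.
    return NFS4_OK if provably_safe else NFS4ERR_GRACE
```

A server that cannot make the guarantee simply passes provably_safe=False for every request, which yields the default behavior of rejecting all such requests until the grace period ends.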
skipping to change at page 174, line 10 skipping to change at page 175, line 10
requests to be processed during the grace period, it MUST determine requests to be processed during the grace period, it MUST determine
that no lock subsequently reclaimed will be rejected and that no lock that no lock subsequently reclaimed will be rejected and that no lock
subsequently reclaimed would have prevented any I/O operation subsequently reclaimed would have prevented any I/O operation
processed during the grace period. processed during the grace period.
Clients should be prepared for the return of NFS4ERR_GRACE errors for Clients should be prepared for the return of NFS4ERR_GRACE errors for
non-reclaim lock and I/O requests. In this case, the client should non-reclaim lock and I/O requests. In this case, the client should
employ a retry mechanism for the request. A delay (on the order of employ a retry mechanism for the request. A delay (on the order of
several seconds) between retries should be used to avoid overwhelming several seconds) between retries should be used to avoid overwhelming
the server. Further discussion of the general issue is included in the server. Further discussion of the general issue is included in
[50]. The client must account for the server that can perform I/O [54]. The client must account for the server that can perform I/O
and non-reclaim locking requests within the grace period as well as and non-reclaim locking requests within the grace period as well as
those that cannot do so. those that cannot do so.
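The client-side retry behavior described above can be sketched as follows. This is a minimal sketch under stated assumptions: send_request is a hypothetical callable that sends one request and returns its NFSv4 status code, and the default delay and attempt limit are illustrative values, not values taken from the protocol.

```python
import time

NFS4_OK = 0
NFS4ERR_GRACE = 10013  # NFSv4 error code for requests refused during grace


def send_with_grace_retry(send_request, delay_seconds=5.0, max_attempts=10,
                          sleep=time.sleep):
    """Retry a non-reclaim request while the server answers NFS4ERR_GRACE.

    Returns the first status other than NFS4ERR_GRACE, or NFS4ERR_GRACE
    if every attempt was refused.
    """
    status = NFS4ERR_GRACE
    for attempt in range(max_attempts):
        status = send_request()
        if status != NFS4ERR_GRACE:
            return status
        # Wait several seconds between retries to avoid overwhelming
        # the server; skip the wait after the final attempt.
        if attempt + 1 < max_attempts:
            sleep(delay_seconds)
    return status
```

Because the loop reacts only to the returned status, the same code works against servers that can process such requests during the grace period and against those that cannot.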
A reclaim-type locking request outside the server's grace period can A reclaim-type locking request outside the server's grace period can
only succeed if the server can guarantee that no conflicting lock or only succeed if the server can guarantee that no conflicting lock or
I/O request has been granted since restart. I/O request has been granted since restart.
A server may, upon restart, establish a new value for the lease A server may, upon restart, establish a new value for the lease
period. Therefore, clients should, once a new client ID is period. Therefore, clients should, once a new client ID is
established, refetch the lease_time attribute and use it as the basis established, refetch the lease_time attribute and use it as the basis
skipping to change at page 175, line 13 skipping to change at page 176, line 13
Section 2.10.5) and not to control lock reclaim. Section 2.10.5) and not to control lock reclaim.
8.4.2.1.1. Security Considerations for State Reclaim 8.4.2.1.1. Security Considerations for State Reclaim
During the grace period, a client can reclaim state that it believes During the grace period, a client can reclaim state that it believes
or asserts it had before the server restarted. Unless the server or asserts it had before the server restarted. Unless the server
maintained a complete record of all the state the client had, the maintained a complete record of all the state the client had, the
server has little choice but to trust the client. (Of course, if the server has little choice but to trust the client. (Of course, if the
server maintained a complete record, then it would not have to force server maintained a complete record, then it would not have to force
the client to reclaim state after server restart.) While the server the client to reclaim state after server restart.) While the server
has to trust the client to tell the truth, such trust does not have has to trust the client to tell the truth, the negative consequences
any negative consequences for security. The fundamental rule for the for security are limited to enabling denial-of-service attacks in
server when processing reclaim requests is that it MUST NOT grant the situations in which AUTH_SYS is supported. The fundamental rule for
reclaim if an equivalent non-reclaim request would not be granted the server when processing reclaim requests is that it MUST NOT grant
the reclaim if an equivalent non-reclaim request would not be granted
during steady state due to access control or access conflict issues. during steady state due to access control or access conflict issues.
For example, an OPEN request during a reclaim will be refused with For example, an OPEN request during a reclaim will be refused with
NFS4ERR_ACCESS if the principal making the request does not have NFS4ERR_ACCESS if the principal making the request does not have
access to open the file according to the discretionary ACL access to open the file according to the discretionary ACL
(Section 6.2.2) on the file. (Section 6.2.2) on the file.
Nonetheless, it is possible that a client operating in error or Nonetheless, it is possible that a client operating in error or
maliciously could, during reclaim, prevent another client from maliciously could, during reclaim, prevent another client from
reclaiming access to state. For example, an attacker could send an reclaiming access to state. For example, an attacker could send an
OPEN reclaim operation with a deny mode that prevents another client OPEN reclaim operation with a deny mode that prevents another client
skipping to change at page 207, line 33 skipping to change at page 208, line 33
another client. another client.
For the purposes of OPEN delegation, READs and WRITEs done without an For the purposes of OPEN delegation, READs and WRITEs done without an
OPEN are treated as the functional equivalents of a corresponding OPEN are treated as the functional equivalents of a corresponding
type of OPEN. Although a client SHOULD NOT use special stateids when type of OPEN. Although a client SHOULD NOT use special stateids when
an open exists, delegation handling on the server can use the client an open exists, delegation handling on the server can use the client
ID associated with the current session to determine if the operation ID associated with the current session to determine if the operation
has been done by the holder of the delegation (in which case, no has been done by the holder of the delegation (in which case, no
recall is necessary) or by another client (in which case, the recall is necessary) or by another client (in which case, the
delegation must be recalled and I/O not proceed until the delegation delegation must be recalled and I/O not proceed until the delegation
is recalled or revoked). is returned or revoked).
With delegations, a client is able to avoid writing data to the With delegations, a client is able to avoid writing data to the
server when the CLOSE of a file is serviced. The file close system server when the CLOSE of a file is serviced. The file close system
call is the usual point at which the client is notified of a lack of call is the usual point at which the client is notified of a lack of
stable storage for the modified file data generated by the stable storage for the modified file data generated by the
application. At the close, file data is written to the server and, application. At the close, file data is written to the server and,
through normal accounting, the server is able to determine if the through normal accounting, the server is able to determine if the
available file system space for the data has been exceeded (i.e., the available file system space for the data has been exceeded (i.e., the
server returns NFS4ERR_NOSPC or NFS4ERR_DQUOT). This accounting server returns NFS4ERR_NOSPC or NFS4ERR_DQUOT). This accounting
includes quotas. The introduction of delegations requires that an includes quotas. The introduction of delegations requires that an
skipping to change at page 225, line 32 skipping to change at page 226, line 32
changing enough that the pure recall model may not be effective while changing enough that the pure recall model may not be effective while
trying to allow the client to get substantial benefit. In the trying to allow the client to get substantial benefit. In the
absence of notifications, once the delegation is recalled the client absence of notifications, once the delegation is recalled the client
has to refresh its directory cache; this might not be very efficient has to refresh its directory cache; this might not be very efficient
for very large directories. for very large directories.
The delegation is read-only and the client may not make changes to The delegation is read-only and the client may not make changes to
the directory other than by performing NFSv4.1 operations that modify the directory other than by performing NFSv4.1 operations that modify
the directory or the associated file attributes so that the server the directory or the associated file attributes so that the server
has knowledge of these changes. In order to keep the client's has knowledge of these changes. In order to keep the client's
namespace synchronized with the server, the server will notify the namespace synchronized with that of the server, the server will
delegation-holding client (assuming it has requested notifications) notify the delegation-holding client (assuming it has requested
of the changes made as a result of that client's directory-modifying notifications) of the changes made as a result of that client's
operations. This is to avoid any need for that client to send directory-modifying operations. This is to avoid any need for that
subsequent GETATTR or READDIR operations to the server. If a single client to send subsequent GETATTR or READDIR operations to the
client is holding the delegation and that client makes any changes to server. If a single client is holding the delegation and that client
the directory (i.e., the changes are made via operations sent on a makes any changes to the directory (i.e., the changes are made via
session associated with the client ID holding the delegation), the operations sent on a session associated with the client ID holding
delegation will not be recalled. Multiple clients may hold a the delegation), the delegation will not be recalled. Multiple
delegation on the same directory, but if any such client modifies the clients may hold a delegation on the same directory, but if any such
directory, the server MUST recall the delegation from the other client modifies the directory, the server MUST recall the delegation
clients, unless those clients have made provisions to be notified of from the other clients, unless those clients have made provisions to
that sort of modification. be notified of that sort of modification.
Delegations can be recalled by the server at any time. Normally, the Delegations can be recalled by the server at any time. Normally, the
server will recall the delegation when the directory changes in a way server will recall the delegation when the directory changes in a way
that is not covered by the notification, or when the directory that is not covered by the notification, or when the directory
changes and notifications have not been requested. If another client changes and notifications have not been requested. If another client
removes the directory for which a delegation has been granted, the removes the directory for which a delegation has been granted, the
server will recall the delegation. server will recall the delegation.
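The recall rule for a directory modified by one of several delegation holders can be sketched as below. The data layout is an assumption made for illustration: holders maps each client ID to the set of notification types that client requested, and change_type names the kind of modification. Neither name comes from the protocol.

```python
def delegations_to_recall(holders, modifier, change_type):
    """Return the client IDs whose directory delegation must be recalled.

    holders: dict mapping client ID -> set of requested notification types.
    modifier: client ID that performed the directory-modifying operation.
    change_type: the kind of modification that was made.
    """
    return [
        client
        for client, notifications in holders.items()
        # The modifying client's own delegation is not recalled, and
        # clients that asked to be notified of this kind of change are
        # sent a notification instead of a recall.
        if client != modifier and change_type not in notifications
    ]
```

For example, if clients A, B, and C hold delegations, only B requested notification of entry additions, and A adds an entry, the function returns only C.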
10.9.3. Attributes in Support of Directory Notifications 10.9.3. Attributes in Support of Directory Notifications
skipping to change at page 227, line 20 skipping to change at page 228, line 20
such multi-server namespaces is OPTIONAL, however, and for many such multi-server namespaces is OPTIONAL, however, and for many
purposes, single-server namespaces are perfectly acceptable. Use of purposes, single-server namespaces are perfectly acceptable. Use of
multi-server namespaces can provide many advantages, by separating a multi-server namespaces can provide many advantages, by separating a
file system's logical position in a namespace from the (possibly file system's logical position in a namespace from the (possibly
changing) logistical and administrative considerations that result in changing) logistical and administrative considerations that result in
particular file systems being located on particular servers via a particular file systems being located on particular servers via a
single network access path known in advance or determined using DNS. single network access path known in advance or determined using DNS.
11.1. Terminology 11.1. Terminology
In this section as a whole (i.e within Section 11), the phrase In this section as a whole (i.e. within all of Section 11), the
"client ID" always refers to the 64-bit shorthand identifier assigned phrase "client ID" always refers to the 64-bit shorthand identifier
by the server (a clientid4) and never to the structure which the assigned by the server (a clientid4) and never to the structure which
client uses to identify itself to the server (called an the client uses to identify itself to the server (called an
nfs_client_id4 or client_owner in NFSv4.0 and NFSv4.1 respectively). nfs_client_id4 or client_owner in NFSv4.0 and NFSv4.1 respectively).
The opaque identifier within those structures is referred to as a The opaque identifier within those structures is referred to as a
"client id string". "client id string".
11.1.1. Terminology Related to Trunking 11.1.1. Terminology Related to Trunking
It is particularly important to clarify the distinction between It is particularly important to clarify the distinction between
trunking detection and trunking discovery. The definitions we trunking detection and trunking discovery. The definitions we
present are applicable to all minor versions of NFSv4, but we will present are applicable to all minor versions of NFSv4, but we will
focus on how these terms apply to NFS version 4.1. focus on how these terms apply to NFS version 4.1.
o Trunking detection refers to ways of deciding whether two specific o Trunking detection refers to ways of deciding whether two specific
network addresses are connected to the same NFSv4 server. The network addresses are connected to the same NFSv4 server. The
means available to make this determination depends on the protocol means available to make this determination depends on the protocol
version, and, in some cases, on the client implementation. version, and, in some cases, on the client implementation.
In the case of NFS version 4.1 and later minor versions, the means In the case of NFS version 4.1 and later minor versions, the means
of trunking detection are as described in this document and are of trunking detection are as described in this document and are
available to every client. Two network addresses connected to the available to every client. Two network addresses connected to the
same server are always server-trunkable but cannot necessarily be same server can always be used together to access a particular
used together to access a single session. server but cannot necessarily be used together to access a single
session. See below for definitions of the terms "server-
trunkable" and "session-trunkable".
o Trunking discovery is a process by which a client using one o Trunking discovery is a process by which a client using one
network address can obtain other addresses that are connected to network address can obtain other addresses that are connected to
the same server. Typically, it builds on a trunking detection the same server. Typically, it builds on a trunking detection
facility by providing one or more methods by which candidate facility by providing one or more methods by which candidate
addresses are made available to the client who can then use addresses are made available to the client who can then use
trunking detection to appropriately filter them. trunking detection to appropriately filter them.
Despite the support for trunking detection there was no Despite the support for trunking detection there was no
description of trunking discovery provided in RFC5661 [62], making description of trunking discovery provided in RFC5661 [65], making
it necessary to provide those means in this document. it necessary to provide those means in this document.
The combination of a server network address and a particular The combination of a server network address and a particular
connection type to be used by a connection is referred to as a connection type to be used by a connection is referred to as a
"server endpoint". Although using different connection types may "server endpoint". Although using different connection types may
result in different ports being used, the use of different ports by result in different ports being used, the use of different ports by
multiple connections to the same network address is not the essence multiple connections to the same network address in such cases is not
of the distinction between the two endpoints used. the essence of the distinction between the two endpoints used. This
is in contrast to the case of port-specific endpoints, in which the
explicit specification of port numbers within network addresses is
used to allow a single server node to support multiple NFS servers.
Two network addresses connected to the same server are said to be Two network addresses connected to the same server are said to be
server-trunkable. Two such addresses support the use of client ID server-trunkable. Two such addresses support the use of client ID
trunking, as described in Section 2.10.5. trunking, as described in Section 2.10.5.
Two network addresses connected to the same server such that those Two network addresses connected to the same server such that those
addresses can be used to support a single common session are referred addresses can be used to support a single common session are referred
to as session-trunkable. Note that two addresses may be server- to as session-trunkable. Note that two addresses may be server-
trunkable without being session-trunkable and that when two trunkable without being session-trunkable and that when two
connections of different connection types are made to the same connections of different connection types are made to the same
skipping to change at page 228, line 52 skipping to change at page 230, line 11
only a single exported file system. Each such file system is part only a single exported file system. Each such file system is part
of the server's local namespace, and can be considered as a file of the server's local namespace, and can be considered as a file
system instance within a larger multi-server namespace. system instance within a larger multi-server namespace.
o The set of all exported file systems for a given server o The set of all exported file systems for a given server
constitutes that server's local namespace. constitutes that server's local namespace.
o In some cases, a server will have a namespace more extensive than o In some cases, a server will have a namespace more extensive than
its local namespace by using features associated with attributes its local namespace by using features associated with attributes
that provide file system location information. These features, that provide file system location information. These features,
which allow construction of a multi-server namespace are all which allow construction of a multi-server namespace, are all
described in individual sections below and include referrals described in individual sections below and include referrals
(described in Section 11.5.6), migration (described in (described in Section 11.5.6), migration (described in
Section 11.5.5), and replication (described in Section 11.5.4). Section 11.5.5), and replication (described in Section 11.5.4).
o A file system present in a server's pseudo-fs may have multiple o A file system present in a server's pseudo-fs may have multiple
file system instances on different servers associated with it. file system instances on different servers associated with it.
All such instances are considered replicas of one another. All such instances are considered replicas of one another.
Whether such replicas can be used simultaneously is discussed in
Section 11.11.1, while the level of co-ordination between them
(important when switching between them) is discussed in Sections
11.11.2 through 11.11.8 below.
o When a file system is present in a server's pseudo-fs, but there o When a file system is present in a server's pseudo-fs, but there
is no corresponding local file system, it is said to be "absent". is no corresponding local file system, it is said to be "absent".
In such cases, all associated instances will be accessed on other In such cases, all associated instances will be accessed on other
servers. servers.
Regarding terminology relating to attributes used in trunking Regarding terminology relating to attributes used in trunking
discovery and other multi-server namespace features: discovery and other multi-server namespace features:
o File system location attributes include the fs_locations and o File system location attributes include the fs_locations and
fs_locations_info attributes. fs_locations_info attributes.
o File system location entries provide the individual file system o File system location entries provide the individual file system
locations within the file system location attributes. Each such locations within the file system location attributes. Each such
entry specifies a server, in the form of a host name or IP an entry specifies a server, in the form of a host name or an
address, and an fs name, which designates the location of the file address, and an fs name, which designates the location of the file
system within the server's local namespace. A file system system within the server's local namespace. A file system
location entry designates a set of server endpoints to which the location entry designates a set of server endpoints to which the
client may establish connections. There may be multiple endpoints client may establish connections. There may be multiple endpoints
because a host name may map to multiple network addresses and because a host name may map to multiple network addresses and
because multiple connection types may be used to communicate with because multiple connection types may be used to communicate with
a single network address. However, all such endpoints MUST a single network address. However, except where explicit port
provide a way of connecting to a single server. The exact form of numbers are used to designate a set of servers within a single
the location entry varies with the particular file system location server node, all such endpoints MUST designate a way of connecting
attribute used, as described in Section 11.2. to a single server. The exact form of the location entry varies
with the particular file system location attribute used, as
described in Section 11.2.
The network addresses used in file system location entries
typically appear without port number indications and are used to
designate a server at one of the standard ports for NFS access,
e.g., 2049 for TCP, or 20049 for use with RPC-over-RDMA. Port
numbers may be used in file system location entries to designate
servers (typically user-level ones) accessed using other port
numbers. In the case where network addresses indicate trunking
relationships, use of an explicit port number is inappropriate
since trunking is a relationship between network addresses. See
Section 11.5.2 for details.
o File system location elements are derived from location entries o File system location elements are derived from location entries
and each describes a particular network access path, consisting of and each describes a particular network access path, consisting of
a network address and a location within the server's local a network address and a location within the server's local
namespace. Such location elements need not appear within a file namespace. Such location elements need not appear within a file
system location attribute, but the existence of each location system location attribute, but the existence of each location
element derives from a corresponding location entry. When a element derives from a corresponding location entry. When a
location entry specifies an IP address there is only a single location entry specifies an IP address there is only a single
corresponding location element. File system location entries that corresponding location element. File system location entries that
contain a host name are resolved using DNS, and may result in one contain a host name are resolved using DNS, and may result in one
or more location elements. All location elements consist of a or more location elements. All location elements consist of a
location address which is the IP address of an interface to a location address which includes the IP address of an interface to
server and an fs name which is the location of the file system a server and an fs name which is the location of the file system
within the server's local namespace. The fs name can be empty if within the server's local namespace. The fs name can be empty if
the server has no pseudo-fs and only a single exported file system the server has no pseudo-fs and only a single exported file system
at the root filehandle. at the root filehandle.
o Two file system location elements are said to be server-trunkable o Two file system location elements are said to be server-trunkable
if they specify the same fs name and the location addresses are if they specify the same fs name and the location addresses are
such that the location addresses are server-trunkable. When the such that the location addresses are server-trunkable. When the
corresponding network paths are used, the client will always be corresponding network paths are used, the client will always be
able to use client ID trunking, but will only be able to use able to use client ID trunking, but will only be able to use
session trunking if the paths are also session-trunkable. session trunking if the paths are also session-trunkable.
o Two file system location elements are said to be session-trunkable o Two file system location elements are said to be session-trunkable
if they specify the same fs name and the location addresses are if they specify the same fs name and the location addresses are
such that the location addresses are session-trunkable. When the such that the location addresses are session-trunkable. When the
corresponding network paths are used, the client will be able to corresponding network paths are used, the client will be able to
use either client ID trunking or session trunking. use either client ID trunking or session trunking.
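The definitions in the list above can be illustrated with a minimal sketch. The LocationElement type, the effective_port helper, and the addr_trunkable callback are hypothetical names introduced only for illustration; the protocol defines no such interface. The default ports (2049 for TCP, 20049 for RPC-over-RDMA) are the ones named in the text, and the address-level trunkability test is left to the caller because it depends on information returned by EXCHANGE_ID.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Standard NFS ports named in the text; used when an entry carries no port.
DEFAULT_PORTS = {"tcp": 2049, "rdma": 20049}


@dataclass(frozen=True)
class LocationElement:
    """Hypothetical model of a file system location element."""
    address: str                 # IP address of an interface to a server
    fs_name: str                 # location in the server's local namespace; may be empty
    port: Optional[int] = None   # explicit port, if the location entry gave one


def effective_port(elem: LocationElement, transport: str) -> int:
    """Return the explicit port if present, else the standard port for the transport."""
    return elem.port if elem.port is not None else DEFAULT_PORTS[transport]


def server_trunkable(a: LocationElement, b: LocationElement,
                     addr_trunkable: Callable[[str, str], bool]) -> bool:
    """Two elements are server-trunkable if they name the same fs and their
    location addresses are server-trunkable (decided by the supplied callback)."""
    return a.fs_name == b.fs_name and addr_trunkable(a.address, b.address)
```

Passing a session-level callback instead of a server-level one gives the corresponding session-trunkable test, since both definitions differ only in the relationship required between the addresses.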
Discussion of the term "replica" is complicated by the fact that the Discussion of the term "replica" is complicated by the fact that the
term was used in RFC5661 [62], with a meaning different from that in term was used in RFC5661 [65], with a meaning different from that in
this document. In short, in [62] each replica is identified by a this document. In short, in [65] each replica is identified by a
single network access path while, in the current document a set of single network access path while, in the current document a set of
network access paths which have server-trunkable network addresses network access paths which have server-trunkable network addresses
and the same root-relative file system pathname is considered to be a and the same root-relative file system pathname is considered to be a
single replica with multiple network access paths. single replica with multiple network access paths.
Each set of server-trunkable location elements defines a set of Each set of server-trunkable location elements defines a set of
available network access paths to a particular file system. When available network access paths to a particular file system. When
there are multiple such file systems, each of which contains the same there are multiple such file systems, each of which contains the same
data, these file systems are considered replicas of one another. data, these file systems are considered replicas of one another.
Logically, such replication is symmetric, since the fs currently in Logically, such replication is symmetric, since the fs currently in
skipping to change at page 231, line 25 skipping to change at page 232, line 50
Within the fs_locations_info attribute, each fs_locations_server4 Within the fs_locations_info attribute, each fs_locations_server4
entry corresponds to a file system location entry with the fls_server entry corresponds to a file system location entry with the fls_server
field designating the server, with the location pathname within the field designating the server, with the location pathname within the
server's pseudo-fs given by the fl_rootpath field of the encompassing server's pseudo-fs given by the fl_rootpath field of the encompassing
fs_locations_item4. fs_locations_item4.
The fs_locations attribute defined in NFSv4.0 is also a part of The fs_locations attribute defined in NFSv4.0 is also a part of
NFSv4.1. This attribute only allows specification of the file system NFSv4.1. This attribute only allows specification of the file system
locations where the data corresponding to a given file system may be locations where the data corresponding to a given file system may be
found. Servers should make this attribute available whenever found. Servers SHOULD make this attribute available whenever
fs_locations_info is supported, but client use of fs_locations_info fs_locations_info is supported, but client use of fs_locations_info
is preferable, as it provides more information. is preferable, as it provides more information.
Within the fs_locations attribute, each fs_location4 contains a file Within the fs_locations attribute, each fs_location4 contains a file
system location entry with the server field designating the server system location entry with the server field designating the server
and the rootpath field giving the location pathname within the and the rootpath field giving the location pathname within the
server's pseudo-fs. server's pseudo-fs.
11.3. File System Presence or Absence 11.3. File System Presence or Absence
skipping to change at page 235, line 29 skipping to change at page 237, line 5
When a file system is present and becomes absent, clients can be When a file system is present and becomes absent, clients can be
given the opportunity to have continued access to their data, using a given the opportunity to have continued access to their data, using a
different replica. In this case, a continued attempt to use the data different replica. In this case, a continued attempt to use the data
in the now-absent file system will result in an NFS4ERR_MOVED error in the now-absent file system will result in an NFS4ERR_MOVED error
and, at that point, the successor replica or set of possible replica and, at that point, the successor replica or set of possible replica
choices can be fetched and used to continue access. Transfer of choices can be fetched and used to continue access. Transfer of
access to the new replica location is referred to as "migration", and access to the new replica location is referred to as "migration", and
is discussed in Section 11.5.4 below. is discussed in Section 11.5.4 below.
Where a file system had been absent, specification of file system Where a file system is currently absent, specification of file system
location provides a means by which file systems located on one server location provides a means by which file systems located on one server
can be associated with a namespace defined by another server, thus can be associated with a namespace defined by another server, thus
allowing a general multi-server namespace facility. A designation of allowing a general multi-server namespace facility. A designation of
such a remote instance, in place of a file system never previously such a remote instance, in place of a file system not previously
present, is called a "pure referral" and is discussed in present, is called a "pure referral" and is discussed in
Section 11.5.6 below. Section 11.5.6 below.
Because client support for attributes related to file system location Because client support for attributes related to file system location
is OPTIONAL, a server may choose to take action to hide migration and is OPTIONAL, a server may choose to take action to hide migration and
referral events from such clients, by acting as a proxy, for example. referral events from such clients, by acting as a proxy, for example.
The server can determine the presence of client support from the The server can determine the presence of client support from the
arguments of the EXCHANGE_ID operation (see Section 18.35.3). arguments of the EXCHANGE_ID operation (see Section 18.35.3).
11.5.1. Combining Multiple Uses in a Single Attribute 11.5.1. Combining Multiple Uses in a Single Attribute
skipping to change at page 236, line 45 skipping to change at page 238, line 21
o When the name of the server is known to the client, it may use DNS o When the name of the server is known to the client, it may use DNS
to obtain a set of network addresses to use in accessing the to obtain a set of network addresses to use in accessing the
server. server.
o The client may fetch the file system location attribute for the o The client may fetch the file system location attribute for the
file system. This will provide either the name of the server file system. This will provide either the name of the server
(which can be turned into a set of network addresses using DNS), (which can be turned into a set of network addresses using DNS),
or a set of server-trunkable location entries. Using the latter or a set of server-trunkable location entries. Using the latter
alternative, the server can provide addresses it regards as alternative, the server can provide addresses it regards as
desirable to use to access the file system in question. desirable to use to access the file system in question. Although
these entries can contain port numbers, these port numbers are not
used in determining trunking relationships. Once the candidate
addresses have been determined and EXCHANGE_ID done to the proper
server, only the value of the so_major_id field returned by the
servers in question determines whether a trunking relationship
actually exists.
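As a rough sketch of the discovery step described above, a client could group candidate addresses by the so_major_id value each one returns. The function name and the exchange_id callback are hypothetical stand-ins for whatever mechanism an implementation uses to issue EXCHANGE_ID and read eir_server_owner; the use of OSError to signal an unreachable address is likewise an assumption of this sketch.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def group_by_so_major_id(candidates: Iterable[str],
                         exchange_id: Callable[[str], bytes]) -> Dict[bytes, List[str]]:
    """Group candidate addresses by the so_major_id obtained from each.

    Addresses in the same group are candidates for trunking with one
    another; addresses in different groups are not.
    """
    groups: Dict[bytes, List[str]] = defaultdict(list)
    for addr in candidates:
        try:
            major_id = exchange_id(addr)   # hypothetical: returns so_major_id
        except OSError:
            continue                       # unreachable candidate: bypass it
        groups[major_id].append(addr)
    return dict(groups)
```

A client that already has a connection to one address would then use only the group containing that address, bypassing the others as the text describes.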
It should be noted that the client, when it fetches a location It should be noted that the client, when it fetches a location
attribute for a file system, may encounter multiple entries for a attribute for a file system, may encounter multiple entries for a
number of reasons, so that, when determining trunking information, it number of reasons, so that, when determining trunking information, it
may have to bypass addresses not trunkable with one already known. may have to bypass addresses not trunkable with one already known.
The server can provide location entries that include either names or The server can provide location entries that include either names or
network addresses. It might use the latter form because of DNS- network addresses. It might use the latter form because of DNS-
related security concerns or because the set of addresses to be used related security concerns or because the set of addresses to be used
might require active management by the server. might require active management by the server.
Locations entries used to discover candidate addresses for use in Location entries used to discover candidate addresses for use in
trunking are subject to change, as discussed in Section 11.5.7 below. trunking are subject to change, as discussed in Section 11.5.7 below.
The client may respond to such changes by using additional addresses The client may respond to such changes by using additional addresses
once they are verified or by ceasing to use existing ones. The once they are verified or by ceasing to use existing ones. The
server can force the client to cease using an address by returning server can force the client to cease using an address by returning
NFS4ERR_MOVED when that address is used to access a file system. NFS4ERR_MOVED when that address is used to access a file system.
This allows a transfer of client access which is similar to This allows a transfer of client access which is similar to
migration, although the same file system instance is accessed migration, although the same file system instance is accessed
throughout. throughout.
11.5.3. File System Location Attributes and Connection Type Selection 11.5.3. File System Location Attributes and Connection Type Selection
skipping to change at page 237, line 51 skipping to change at page 239, line 36
include support for RDMA operation. This flag makes it convenient include support for RDMA operation. This flag makes it convenient
for a client wishing to use RDMA. When this flag is set, it can for a client wishing to use RDMA. When this flag is set, it can
establish a TCP connection and then convert that connection to use establish a TCP connection and then convert that connection to use
RDMA by using the step-up facility. RDMA by using the step-up facility.
Irrespective of the particular attribute used, when there is no Irrespective of the particular attribute used, when there is no
indication that a step-up operation can be performed, a client indication that a step-up operation can be performed, a client
supporting RDMA operation can establish a new RDMA connection and it supporting RDMA operation can establish a new RDMA connection and it
can be bound to the session already established by the TCP can be bound to the session already established by the TCP
connection, allowing the TCP connection to be dropped and the session connection, allowing the TCP connection to be dropped and the session
converted to further use in RDMA node. converted to further use in RDMA mode, if the server supports that.
11.5.4. File System Replication 11.5.4. File System Replication
The fs_locations and fs_locations_info attributes provide alternative The fs_locations and fs_locations_info attributes provide alternative
file system locations, to be used to access data in place of or in file system locations, to be used to access data in place of or in
addition to the current file system instance. On first access to a addition to the current file system instance. On first access to a
file system, the client should obtain the set of alternate locations file system, the client should obtain the set of alternate locations
by interrogating the fs_locations or fs_locations_info attribute, by interrogating the fs_locations or fs_locations_info attribute,
with the latter being preferred. with the latter being preferred.
In the event that the occurrence of server failures, communications In the event that the occurrence of server failures, communications
problems, or other difficulties make continued access to the current problems, or other difficulties make continued access to the current
file system impossible or otherwise impractical, the client can use file system impossible or otherwise impractical, the client can use
the alternate locations as a way to get continued access to its data. the alternate locations as a way to get continued access to its data.
The alternate locations may be physical replicas of the (typically The alternate locations may be physical replicas of the (typically
read-only) file system data, or they may provide for the use of read-only) file system data supplemented by possible asynchronous
various forms of server clustering in which multiple servers provide propagation of updates. Alternatively, they may provide for the use
alternate ways of accessing the same physical file system. How these of various forms of server clustering in which multiple servers
different modes of file system transition are represented within the provide alternate ways of accessing the same physical file system.
fs_locations and fs_locations_info attributes and how the client How the difference between replicas affects file system transitions
deals with file system transition issues will be discussed in detail can be represented within the fs_locations and fs_locations_info
below. attributes and how the client deals with file system transition
issues will be discussed in detail in later sections.
Although the location attributes provide some information about the
nature of the inter-replica transition, many aspects of the semantics
of possible asynchronous updates are not currently described by the
protocol, making it necessary that clients using replication to
switch among replicas undergoing change familiarize themselves with
the semantics of the update approach used. Because of this lack of
specificity, many applications may find use of migration more
appropriate, since, in that case, the server, when effecting the
transition, has established a point in time such that all updates
made before that can be propagated to the new replica as part of the
migration event.
11.5.4.1. File System Trunking Presented as Replication
In some situations, a file system location entry may indicate a file
system access path to be used as an alternate location, where
trunking, rather than replication, is to be used. The situations in
which this is appropriate are limited to those in which both of the
following are true.
o The two file system locations (i.e., the one on which the location
attribute is obtained and the one specified in the file system
location entry) designate the same locations within their
respective single-server namespaces.
o The two server network addresses (i.e., the one being used to
obtain the location attribute and the one specified in the file
system location entry) designate the same server (as indicated by
the same value of the so_major_id field of the eir_server_owner
field returned in response to EXCHANGE_ID).
When these conditions hold, operations using both access paths are
generally trunked, although, when the attribute fs_locations_info is
used, trunking may be disallowed:
o When the fs_locations_info attribute shows the two entries as not
having the same simultaneous-use class, trunking is inhibited and
the two access paths cannot be used together.
In this case the two paths can be used serially with no transition
activity required on the part of the client. In this case, any
transition between access paths is transparent, and the client, in
transferring access from one to the other, is acting as it would
in the event that communication is interrupted, with a new
connection and possibly a new session being established to
continue access to the same file system.
o Note that for two such location entries, any information within
the fs_locations_info attribute that indicates the need for
special transition activity, i.e., the appearance of the two file
system location entries with different handle, fileid, write-
verifier, change, and readdir classes, indicates a serious
problem. The client, if it allows transition to the file system
instance at all, must not treat any transition as a transparent
one. The server SHOULD NOT indicate that these two entries (for
the same file system on the same server) belong to different
handle, fileid, write-verifier, change, and readdir classes,
whether or not the two entries are shown belonging to the same
simultaneous-use class.
These situations were recognized by [65], even though that document
made no explicit mention of trunking.
o It treated the situation that we describe as trunking as one of
simultaneous use of two distinct file system instances, even
though, in the explanatory framework now used to describe the
situation, the case is one in which a single file system is
accessed by two different trunked addresses.
o It treated the situation in which two paths are to be used
serially as a special sort of "transparent transition". However,
in the descriptive framework now used to categorize transition
situations, this is considered a case of a "network endpoint
transition" (see Section 11.9).
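The conditions of Section 11.5.4.1 can be summarized in a small decision sketch. The dictionary keys used here (fs_name, so_major_id, simultaneous_use_class) are hypothetical field names chosen for illustration and do not correspond to any XDR definition; the class key is assumed to be absent when only fs_locations is available.

```python
def classify_alternate(current: dict, alternate: dict) -> str:
    """Classify an alternate location entry relative to the current access path.

    Returns "replica" when the entry names a different server or a different
    location, "serial" when the paths reach the same file system but may not
    be used together, and "trunked" otherwise.
    """
    same_location = current["fs_name"] == alternate["fs_name"]
    same_server = current["so_major_id"] == alternate["so_major_id"]
    if not (same_location and same_server):
        return "replica"
    # With fs_locations there is no class information, so both lookups
    # yield None and trunking is assumed to be allowed.
    if current.get("simultaneous_use_class") != alternate.get("simultaneous_use_class"):
        return "serial"
    return "trunked"
```

The "serial" result corresponds to what the text calls a network endpoint transition: the client switches paths as it would after a communication interruption, with no other transition activity.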
11.5.5. File System Migration 11.5.5. File System Migration
When a file system is present and becomes absent, the NFSv4.1 When a file system is present and becomes inaccessible using the
protocol provides a means by which clients can be given the current access path, the NFSv4.1 protocol provides a means by which
opportunity to have continued access to their data, using a different clients can be given the opportunity to have continued access to
replica. The location of this replica is specified by a file system their data. This may involve use of a different access path to the
location attribute. The ensuing migration of access to another existing replica or by providing a path to a different replica. The
replica includes the ability to retain locks across the transition, new access path or the location of the new replica is specified by a
either by using lock reclaim or by taking advantage of Transparent file system location attribute. The ensuing migration of access
State Migration. includes the ability to retain locks across the transition.
Depending on circumstances, this can involve:
o The continued use of the existing clientid when accessing the
current replica using a new access path.
o Use of lock reclaim, taking advantage of a per-fs grace period.
o Use of Transparent State Migration.
Typically, a client will be accessing the file system in question, Typically, a client will be accessing the file system in question,
get an NFS4ERR_MOVED error, and then use a file system location get an NFS4ERR_MOVED error, and then use a file system location
attribute to determine the new location of the data. When attribute to determine the new access path for the data. When
fs_locations_info is used, additional information will be available fs_locations_info is used, additional information will be available
that will define the nature of the client's handling of the that will define the nature of the client's handling of the
transition to a new server. transition to a new server.
Such migration can be helpful in providing load balancing or general In most instances, servers will choose to migrate all clients using a
resource reallocation. The protocol does not specify how the file particular file system to a successor replica at the same time to
system will be moved between servers. It is anticipated that a avoid cases in which different clients are updating different
number of different server-to-server transfer mechanisms might be replicas. However, migration of individual clients can be helpful in
used with the choice left to the server implementer. The NFSv4.1 providing load balancing, as long as the replicas in question are
protocol specifies the method used to communicate the migration event such that they represent the same data as described in
between client and server. Section 11.11.8.
o In the case in which there is no transition between replicas
(i.e., only a change in access path), there are no special
difficulties in using this mechanism to effect load balancing.
o In the case in which the two replicas are sufficiently co-
ordinated as to allow coherent simultaneous access to both by a
single client, there is, in general, no obstacle to use of
migration of particular clients to effect load balancing.
Generally, such simultaneous use involves co-operation between
servers to ensure that locks granted on two co-ordinated replicas
cannot conflict and can remain effective when transferred to a
common replica.
o In the case in which a large set of clients are accessing a file
system in a read-only fashion, it can be helpful to migrate all
clients with writable access simultaneously, while using load
balancing on the set of read-only copies, as long as the rules
appearing in Section 11.11.8, designed to prevent data reversion,
are adhered to.
In other cases, the client might not have sufficient guarantees of
data similarity/coherence to function properly (e.g. the data in the
two replicas is similar but not identical), and the possibility that
different clients are updating different replicas can exacerbate the
difficulties, making use of load balancing in such situations a
perilous enterprise.
The protocol does not specify how the file system will be moved
between servers or how updates to multiple replicas will be co-
ordinated. It is anticipated that a number of different server-to-
server co-ordination mechanisms might be used with the choice left to
the server implementer. The NFSv4.1 protocol specifies the method
used to communicate the migration event between client and server.
The new location may be, in the case of various forms of server The new location may be, in the case of various forms of server
clustering, another server providing access to the same physical file clustering, another server providing access to the same physical file
system. The client's responsibilities in dealing with this system. The client's responsibilities in dealing with this
transition will depend on whether migration has occurred and the transition will depend on whether a switch between replicas has
means the server has chosen to provide continuity of locking state. occurred and the means the server has chosen to provide continuity of
These issues will be discussed in detail below. locking state. These issues will be discussed in detail below.
Although a single successor location is typical, multiple locations Although a single successor location is typical, multiple locations
may be provided. When multiple locations are provided, the client may be provided. When multiple locations are provided, the client
will typically use the first one provided. If that is inaccessible will typically use the first one provided. If that is inaccessible
for some reason, later ones can be used. In such cases the client for some reason, later ones can be used. In such cases the client
might consider that the transition to the new replica as a migration might consider the transition to the new replica to be a migration
event, even though some of the servers involved might not be aware of event, even though some of the servers involved might not be aware of
the use of the server which was inaccessible. In such a case, a the use of the server which was inaccessible. In such a case, a
client might lose access to locking state as a result of the access client might lose access to locking state as a result of the access
transfer. transfer.
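The selection behavior described in the preceding paragraph might be sketched as follows. The function name and the try_connect callback are hypothetical; the second return value is only an illustrative way of flagging the case in which an earlier location was bypassed, which is the case where the text warns that locking state might be lost.

```python
from typing import Callable, Optional, Sequence, Tuple


def pick_successor(locations: Sequence[str],
                   try_connect: Callable[[str], bool]) -> Tuple[Optional[str], bool]:
    """Pick the first usable successor location, in the order provided.

    Returns (location, bypassed), where bypassed is True if one or more
    earlier locations had to be skipped as inaccessible. Returns
    (None, False) when no location is usable.
    """
    for index, location in enumerate(locations):
        if try_connect(location):
            return location, index > 0
    return None, False
```

A client receiving bypassed=True would need to be prepared to reclaim or re-establish locks, since the servers involved might not be aware of the skipped server.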
When an alternate location is designated as the target for migration, When an alternate location is designated as the target for migration,
it must designate the same data (with metadata being the same to the it must designate the same data (with metadata being the same to the
degree indicated by the fs_locations_info attribute). Where file degree indicated by the fs_locations_info attribute). Where file
systems are writable, a change made on the original file system must systems are writable, a change made on the original file system must
be visible on all migration targets. Where a file system is not be visible on all migration targets. Where a file system is not
skipping to change at page 239, line 47 skipping to change at page 244, line 5
located on one server with a file system located on another server. located on one server with a file system located on another server.
When this includes the use of pure referrals, servers are provided a When this includes the use of pure referrals, servers are provided a
way of placing a file system in a location within the namespace way of placing a file system in a location within the namespace
essentially without respect to its physical location on a particular essentially without respect to its physical location on a particular
server. This allows a single server or a set of servers to present a server. This allows a single server or a set of servers to present a
multi-server namespace that encompasses file systems located on a multi-server namespace that encompasses file systems located on a
wider range of servers. Some likely uses of this facility include wider range of servers. Some likely uses of this facility include
establishment of site-wide or organization-wide namespaces, with the establishment of site-wide or organization-wide namespaces, with the
eventual possibility of combining such together into a truly global eventual possibility of combining such together into a truly global
namespace, such as the one provided by AFS (the Andrew File System) namespace, such as the one provided by AFS (the Andrew File System)
[61]. [64].
Referrals occur when a client determines, upon first referencing a Referrals occur when a client determines, upon first referencing a
position in the current namespace, that it is part of a new file position in the current namespace, that it is part of a new file
system and that the file system is absent. When this occurs, system and that the file system is absent. When this occurs,
typically upon receiving the error NFS4ERR_MOVED, the actual location typically upon receiving the error NFS4ERR_MOVED, the actual location
or locations of the file system can be determined by fetching a or locations of the file system can be determined by fetching a
locations attribute. locations attribute.
The file system location attribute may designate a single file system The file system location attribute may designate a single file system
location or multiple file system locations, to be selected based on location or multiple file system locations, to be selected based on
skipping to change at page 241, line 5 skipping to change at page 245, line 13
separate servers) for each separately administered portion of the separate servers) for each separately administered portion of the
namespace. The top-level referral file system or any segment may use namespace. The top-level referral file system or any segment may use
replicated referral file systems for higher availability. replicated referral file systems for higher availability.
Generally, multi-server namespaces are for the most part uniform, in Generally, multi-server namespaces are for the most part uniform, in
that the same data made available to one client at a given location that the same data made available to one client at a given location
in the namespace is made available to all clients at that namespace in the namespace is made available to all clients at that namespace
location. However, there are facilities provided that allow location. However, there are facilities provided that allow
different clients to be directed to different sets of data, for different clients to be directed to different sets of data, for
reasons such as enabling adaptation to such client characteristics as reasons such as enabling adaptation to such client characteristics as
CPU architecture. These facilities are described in Section 11.16.3. CPU architecture. These facilities are described in Section 11.17.3.
Note that it is possible, when providing a uniform namespace, to Note that it is possible, when providing a uniform namespace, to
provide diffeent location entries to diffeent clients, in order to provide different location entries to different clients, in order to
provide each client with a copy of the data physically closest to it, provide each client with a copy of the data physically closest to it,
or otherwise optimize access (e.g. provide load balancing). or otherwise optimize access (e.g. provide load balancing).
11.5.7. Changes in a File System Location Attribute 11.5.7. Changes in a File System Location Attribute
Although clients will typically fetch a file system location Although clients will typically fetch a file system location
attribute when first accessing a file system and when NFS4ERR_MOVED attribute when first accessing a file system and when NFS4ERR_MOVED
is returned, a client can choose to fetch the attribute periodically, is returned, a client can choose to fetch the attribute periodically,
in which case the value fetched may change over time. in which case the value fetched may change over time.
For clients not prepared to access multiple replicas simultaneously For clients not prepared to access multiple replicas simultaneously
(see Section 11.10.1), the handling of the various cases of location (see Section 11.11.1), the handling of the various cases of location
change is as follows: change is as follows:
o Changes in the list of replicas or in the network addresses o Changes in the list of replicas or in the network addresses
associated with replicas do not require immediate action. The associated with replicas do not require immediate action. The
client will typically update its list of replicas to reflect the client will typically update its list of replicas to reflect the
new information. new information.
o Additions to the list of network addresses for the current file o Additions to the list of network addresses for the current file
system instance need not be acted on promptly. However, to system instance need not be acted on promptly. However, to
prepare for the case in which a migration event occurs prepare for the case in which a migration event occurs
subsequently, the client can choose to take note of the new subsequently, the client can choose to take note of the new
address and then use it whenever it needs to switch access to a address and then use it whenever it needs to switch access to a
new replica. new replica.
o Deletions from the list of network addresses for the current file o Deletions from the list of network addresses for the current file
system instance need not be acted on immediately, although the system instance do not need to be acted on immediately by ceasing
client might need to be prepared for a shift in access whenever use of existing access paths, although new connections are not to
the server indicates that a network access path is not usable to be established on addresses that have been deleted. However,
access the current file system, by returning NFS4ERR_MOVED. clients can choose to act on such deletions by making preparations
for an eventual shift in access which would become unavoidable as
soon as the server indicates that a particular network access path
is not usable to access the current file system, by returning
NFS4ERR_MOVED.
For clients that are prepared to access several replicas For clients that are prepared to access several replicas
simultaneously, the following additional cases need to be addressed. simultaneously, the following additional cases need to be addressed.
As in the cases discussed above, changes in the set of replicas need As in the cases discussed above, changes in the set of replicas need
not be acted upon promptly, although the client has the option of not be acted upon promptly, although the client has the option of
adjusting its access even in the absence of difficulties that would adjusting its access even in the absence of difficulties that would
lead to a new replica being selected. lead to a new replica being selected.
o When a new replica is added which may be accessed simultaneously o When a new replica is added which may be accessed simultaneously
with one currently in use, the client is free to use the new with one currently in use, the client is free to use the new
replica immediately. replica immediately.
o When a replica currently in use is deleted from the list, the o When a replica currently in use is deleted from the list, the
client need not cease using it immediately. However, since the client need not cease using it immediately. However, since the
server may subsequently force such use to cease (by returning server may subsequently force such use to cease (by returning
NFS4ERR_MOVED), clients might decide to limit the need for later NFS4ERR_MOVED), clients might decide to limit the need for later
state transfer. For example, new opens might be done on other state transfer. For example, new opens might be done on other
replicas, rather than on one not present in the list. replicas, rather than on one not present in the list.
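The handling of additions to and deletions from the address list for the current file system instance, as described in the newer text above, can be sketched as follows. This is only an illustration: the function name, the dictionary keys, and the representation of addresses as strings are assumptions, not part of either draft.

```python
def classify_address_changes(old_addrs, new_addrs, in_use):
    """Compare two fetches of the address list for the current instance.

    All names here are hypothetical; addresses are plain strings.
    """
    old, new, used = set(old_addrs), set(new_addrs), set(in_use)
    added = new - old
    deleted = old - new
    return {
        # Additions need not be acted on promptly; they can be noted
        # for use if access later has to be switched.
        "note_for_later": sorted(added),
        # Deleted addresses: no new connections are to be established.
        "no_new_connections": sorted(deleted),
        # Existing access paths on deleted addresses may continue until
        # the server returns NFS4ERR_MOVED on them.
        "still_usable_for_now": sorted(deleted & used),
    }
```

For example, with an old list of "a" and "b", a new list of "b" and "c", and "a" currently in use, the sketch reports "c" as an address to note, and "a" as an address that may still be used for now but on which no new connection should be opened.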
11.6. Users and Groups in a Multi-server Namespace 11.6. Trunking without File System Location Information
In situations in which a file system is accessed using two server-
trunkable addresses (as indicated by the same value of the
so_major_id field of the eir_server_owner field returned in response
to EXCHANGE_ID), trunked access is allowed even though there might
not be any location entries specifically indicating the use of
trunking for that file system.
This situation was recognized by [65], although that document made
no explicit mention of trunking and treated the situation as one of
simultaneous use of two distinct file system instances. In the
explanatory framework now used to describe the situation, the case is
instead one in which a single file system is accessed by two
different trunked addresses.
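As a rough illustration of the test described in this new section, a client could compare the so_major_id values returned by EXCHANGE_ID on the two addresses. The ServerOwner class and the function name below are hypothetical, and any further conditions the protocol places on trunking detection are deliberately omitted from this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerOwner:
    # Hypothetical stand-in for the eir_server_owner value
    # returned in response to EXCHANGE_ID.
    so_major_id: bytes
    so_minor_id: int


def server_trunkable(a: ServerOwner, b: ServerOwner) -> bool:
    """Return True when two addresses report the same so_major_id.

    Only the so_major_id comparison named in the text is modeled;
    other checks a real client would make are left out.
    """
    return a.so_major_id == b.so_major_id
```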
11.7. Users and Groups in a Multi-server Namespace
As in the case of a single-server environment (see Section 5.9), when As in the case of a single-server environment (see Section 5.9), when
an owner or group name of the form "id@domain" is assigned to a file, an owner or group name of the form "id@domain" is assigned to a file,
there is an implcit promise to return that same string when the there is an implicit promise to return that same string when the
corresponding attribute is interrogated subsequently. In the case of corresponding attribute is interrogated subsequently. In the case of
a multi-server namespace, that same promise applies even if server a multi-server namespace, that same promise applies even if server
boundaries have been crossed. Similarly, when the owner attribute of boundaries have been crossed. Similarly, when the owner attribute of
a file is derived from the securiy principal which created the file, a file is derived from the security principal which created the file,
that attribute should have the same value even if the interrogation that attribute should have the same value even if the interrogation
occurs on a different server from the file creation. occurs on a different server from the file creation.
Similarly, the set of security principals recognized by all the Similarly, the set of security principals recognized by all the
participating servers needs to be the same, with each such principal participating servers needs to be the same, with each such principal
having the same credentials, regardless of the particular server having the same credentials, regardless of the particular server
being accessed. being accessed.
In order to meet these requirements, those setting up multi-server In order to meet these requirements, those setting up multi-server
namespaces will need to limit the servers included so that: namespaces will need to limit the servers included so that:
o In all cases in which more than a single domain is supported, the o In all cases in which more than a single domain is supported, the
requirements stated in RFC8000 [30] are to be respected. requirements stated in RFC8000 [31] are to be respected.
o All servers support a common set of domains which includes all of o All servers support a common set of domains which includes all of
the domains clients use and expect to see returned as the domain the domains clients use and expect to see returned as the domain
portion of an owner or group in the form "id@domain". Note that portion of an owner or group in the form "id@domain". Note that
although this set most ofen consists of a single domain, it is although this set most often consists of a single domain, it is
possible for mutiple domains to be supported. possible for multiple domains to be supported.
o All servers, for each domain that they support, accept the same o All servers, for each domain that they support, accept the same
set of user and group ids as valid. set of user and group ids as valid.
o All servers recognize the same set of security principals, and o All servers recognize the same set of security principals. For
each principal, the same credential are required, independent of each principal, the same credential is required, independent of
the server being accessed. In addition, the group membership for the server being accessed. In addition, the group membership for
each such prinicipal is to be the same, independent of the server each such principal is to be the same, independent of the server
accessed. accessed.
Note that there is no requirment that the users corresponding to Note that there is no requirement in general that the users
particular security principals have the same local representation on corresponding to particular security principals have the same local
each server, even though it is most often the case that this is so. representation on each server, even though it is most often the case
that this is so.
When AUTH_SYS is used, with or without the use of stringified owners When AUTH_SYS is used, the following additional requirements must be
and groups, the following additional requirements must be met: met:
o Only a single NFSv4 domain can be supported. o Only a single NFSv4 domain can be supported through use of
AUTH_SYS.
o The "local" representation of all owners and groups must be the o The "local" representation of all owners and groups must be the
same on all servers. The word "local" is used here since that is same on all servers. The word "local" is used here since that is
the way that numeric user and group ids are described in the way that numeric user and group ids are described in
Section 5.9. However, when AUTH_SYS or stringified owners or Section 5.9. However, when AUTH_SYS or stringified numeric owners
group are used, these identifiers are not truly local, since they or groups are used, these identifiers are not truly local, since
are known tothe clients as well as the server. they are known to the clients as well as the server.
11.7. Additional Client-Side Considerations Similarly, when stringified numeric user and group ids are used, the
"local" representation of all owners and groups must be the same on
all servers, even when AUTH_SYS is not used.
11.8. Additional Client-Side Considerations
When clients make use of servers that implement referrals, When clients make use of servers that implement referrals,
replication, and migration, care should be taken that a user who replication, and migration, care should be taken that a user who
mounts a given file system that includes a referral or a relocated mounts a given file system that includes a referral or a relocated
file system continues to see a coherent picture of that user-side file system continues to see a coherent picture of that user-side
file system despite the fact that it contains a number of server-side file system despite the fact that it contains a number of server-side
file systems that may be on different servers. file systems that may be on different servers.
One important issue is upward navigation from the root of a server- One important issue is upward navigation from the root of a server-
side file system to its parent (specified as ".." in UNIX), in the side file system to its parent (specified as ".." in UNIX), in the
skipping to change at page 244, line 5 skipping to change at page 248, line 41
change. It is expected that clients will cache information related change. It is expected that clients will cache information related
to traversing referrals so that future client-side requests are to traversing referrals so that future client-side requests are
resolved locally without server communication. This is usually resolved locally without server communication. This is usually
rooted in client-side name look up caching. Clients should rooted in client-side name look up caching. Clients should
periodically purge this data for referral points in order to detect periodically purge this data for referral points in order to detect
changes in location information. When the change_policy attribute changes in location information. When the change_policy attribute
changes for directories that hold referral entries or for the changes for directories that hold referral entries or for the
referral entries themselves, clients should consider any associated referral entries themselves, clients should consider any associated
cached referral information to be out of date. cached referral information to be out of date.
11.8. Overview of File Access Transitions 11.9. Overview of File Access Transitions
File access transitions are of two types: File access transitions are of two types:
o Those that involve a transition from accessing the current replica o Those that involve a transition from accessing the current replica
to another one in connection with either replication or migration. to another one in connection with either replication or migration.
How these are dealt with is discussed in Section 11.10. How these are dealt with is discussed in Section 11.11.
o Those in which access to the current file system instance is o Those in which access to the current file system instance is
retained, while the network path used to access that instance is retained, while the network path used to access that instance is
changed. This case is discussed in Section 11.9. changed. This case is discussed in Section 11.10.
11.9. Effecting Network Endpoint Transitions 11.10. Effecting Network Endpoint Transitions
The endpoints used to access a particular file system instance may The endpoints used to access a particular file system instance may
change in a number of ways, as listed below. In each of these cases, change in a number of ways, as listed below. In each of these cases,
the same fsid, filehandles, stateids, client IDs and sessions are the same fsid, filehandles, stateids, and client IDs are used to
used to continue access, with a continuity of lock state. continue access, with a continuity of lock state. In many cases, the
same sessions can also be used.
o When use of a particular address is to cease and there is also one The appropriate action depends on the set of replacement addresses
currently in use which is server-trunkable with it, requests that (i.e. server endpoints which are server-trunkable with one previously
would have been issued on the address whose use is to be being used) which are available for use.
discontinued can be issued on the remaining address(es). When an
address is not a session-trunkable one, the request might need to o When use of a particular address is to cease and there is also
be modified to reflect the fact that a different session will be another one currently in use which is server-trunkable with it,
used. requests that would have been issued on the address whose use is
to be discontinued can be issued on the remaining address(es).
When an address is server-trunkable but not session-trunkable with
the address whose use is to be discontinued, the request might
need to be modified to reflect the fact that a different session
will be used.
o When use of a particular connection is to cease, as indicated by o When use of a particular connection is to cease, as indicated by
receiving NFS4ERR_MOVED when using that connection but that receiving NFS4ERR_MOVED when using that connection but that
address is still indicated as accessible according to the address is still indicated as accessible according to the
appropriate file system location entries, it is likely that appropriate file system location entries, it is likely that
requests can be issued on a new connection of a different requests can be issued on a new connection of a different
connection type, once that connection is established. Since any connection type, once that connection is established. Since any
two server endpoints that share a network address are inherently two non-port-specific server endpoints that share a network
session-trunkable, the client can use BIND_CONN_TO_SESSION to address are inherently session-trunkable, the client can use
access the existing session using the new connection and proceed BIND_CONN_TO_SESSION to access the existing session using the new
to access the file system using the new connection. connection and proceed to access the file system using the new
connection.
o When there are no potential replacement addresses in use but there o When there are no potential replacement addresses in use but there
are valid addresses session-trunkable with the one whose use is to are valid addresses session-trunkable with the one whose use is to
be discontinued, the client can use BIND_CONN_TO_SESSION to access be discontinued, the client can use BIND_CONN_TO_SESSION to access
the existing session using the new address. Although the target the existing session using the new address. Although the target
session will generally be accessible, there may be cases in which session will generally be accessible, there may be rare situations
that session is no longer accessible. In this case, the client in which that session is no longer accessible when an attempt is
made to bind the new connection to it. In this case, the client
can create a new session to enable continued access to the can create a new session to enable continued access to the
existing instance and provide for use of existing filehandles, existing instance using the new connection, providing for use of
stateids, and client ids while providing continuity of locking existing filehandles, stateids, and client ids while providing
state. continuity of locking state.
o When there is no potential replacement address in use and there o When there is no potential replacement address in use and there
are no valid addresses session-trunkable with the one whose use is are no valid addresses session-trunkable with the one whose use is
to be discontinued, other server-trunkable addresses may be used to be discontinued, other server-trunkable addresses may be used
to provide continued access. Although use of CREATE_SESSION is to provide continued access. Although use of CREATE_SESSION is
available to provide continued access to the existing instance, available to provide continued access to the existing instance,
servers have the option of providing continued access to the servers have the option of providing continued access to the
existing session through the new network access path in a fashion existing session through the new network access path in a fashion
similar to that provided by session migration (see Section 11.11). similar to that provided by session migration (see Section 11.12).
To take advantage of this possibility, clients can perform an To take advantage of this possibility, clients can perform an
initial BIND_CONN_TO_SESSION, as in the previous case, and use initial BIND_CONN_TO_SESSION, as in the previous case, and use
CREATE_SESSION only if that fails. CREATE_SESSION only if that fails.
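The four cases listed above can be read as a decision procedure. The sketch below is one simplified reading of it: the function name, the boolean parameters, and the step strings are all assumptions made for illustration, and the order in which the cases are tested is a choice of this sketch rather than something the text prescribes.

```python
def plan_endpoint_transition(replacement_in_use,
                             address_still_listed,
                             session_trunkable_available,
                             server_trunkable_available):
    """Return an ordered list of steps for a network endpoint transition.

    Hypothetical helper; each flag summarizes one condition from the
    list of cases, and the returned strings are descriptive only.
    """
    if replacement_in_use:
        # Another server-trunkable address is already in use.
        return ["reissue requests on remaining address"]
    if address_still_listed:
        # Only the connection is to cease; the address remains valid.
        return ["establish new connection of a different type",
                "BIND_CONN_TO_SESSION"]
    if session_trunkable_available or server_trunkable_available:
        # Try to keep the existing session; fall back to a new one.
        return ["BIND_CONN_TO_SESSION",
                "CREATE_SESSION if the bind fails"]
    # No replacement endpoint: outside the cases listed above.
    return []
```

In both of the last two listed cases the client first attempts BIND_CONN_TO_SESSION and uses CREATE_SESSION only on failure, which is why the sketch treats them identically.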
11.10. Effecting File System Transitions 11.11. Effecting File System Transitions
There is a range of situations in which there is a change to be There is a range of situations in which there is a change to be
effected in the set of replicas used to access a particular file effected in the set of replicas used to access a particular file
system. Some of these may involve an expansion or contraction of the system. Some of these may involve an expansion or contraction of the
set of replicas used as discussed in Section 11.10.1 below. set of replicas used as discussed in Section 11.11.1 below.
For reasons explained in that section, most transitions will involve For reasons explained in that section, most transitions will involve
a transition from a single replica to a corresponding replacement a transition from a single replica to a corresponding replacement
replica. When effecting replica transition, some types of sharing replica. When effecting replica transition, some types of sharing
between the replicas may affect handling of the transition as between the replicas may affect handling of the transition as
described in Sections 11.10.2 through 11.10.8 below. The attribute described in Sections 11.11.2 through 11.11.8 below. The attribute
fs_locations_info provides helpful information to allow the client to fs_locations_info provides helpful information to allow the client to
determine the degree of inter-replica sharing. determine the degree of inter-replica sharing.
With regard to some types of state, the degree of continuity across With regard to some types of state, the degree of continuity across
the transition depends on the occasion prompting the transition, with the transition depends on the occasion prompting the transition, with
transitions initiated by the servers (i.e. migration) offering much transitions initiated by the servers (i.e. migration) offering much
more scope for a non-disruptive transition than cases in which the more scope for a non-disruptive transition than cases in which the
client on its own shifts its access to another replica (i.e. client on its own shifts its access to another replica (i.e.
replication). This issue potentially applies to locking state and to replication). This issue potentially applies to locking state and to
session state, which are dealt with below as follows: session state, which are dealt with below as follows:
o An introduction to the possible means of providing continuity in o An introduction to the possible means of providing continuity in
these areas appears in Section 11.10.9 below. these areas appears in Section 11.11.9 below.
o Transparent State Migration is introduced in Section 11.11. The o Transparent State Migration is introduced in Section 11.12. The
possible transfer of session state is addressed there as well. possible transfer of session state is addressed there as well.
o The client handling of transitions, including determining how to o The client handling of transitions, including determining how to
deal with the various means that the server might take to supply deal with the various means that the server might take to supply
effective continuity of locking state is discussed in effective continuity of locking state is discussed in
Section 11.12. Section 11.13.
o The servers' (source and destination) responsibilities in o The servers' (source and destination) responsibilities in
effecting Transparent Migration of locking and session state are effecting Transparent Migration of locking and session state are
discussed in Section 11.13. discussed in Section 11.14.
11.10.1. File System Transitions and Simultaneous Access 11.11.1. File System Transitions and Simultaneous Access
The fs_locations_info attribute (described in Section 11.16) may The fs_locations_info attribute (described in Section 11.17) may
indicate that two replicas may be used simultaneously, (see indicate that two replicas may be used simultaneously, although some
Section 11.7.2.1 of RFC5661 [62] for details). Although situations situations in which such simultaneous access is permitted are more
in which multiple replicas may be accessed simultaneously are appropriately described as instances of trunking (see
somewhat similar to those in which a single replica is accessed by Section 11.5.4.1). Although situations in which multiple replicas
multiple network addresses, there are important differences, since may be accessed simultaneously are somewhat similar to those in which
locking state is not shared among multiple replicas. a single replica is accessed by multiple network addresses, there are
important differences, since locking state is not shared among
multiple replicas.
Because of this difference in state handling, many clients will not Because of this difference in state handling, many clients will not
have the ability to take advantage of the fact that such replicas have the ability to take advantage of the fact that such replicas
represent the same data. Such clients will not be prepared to use represent the same data. Such clients will not be prepared to use
multiple replicas simultaneously but will access each file system multiple replicas simultaneously but will access each file system
using only a single replica, although the replica selected might make using only a single replica, although the replica selected might make
multiple server-trunkable addresses available. multiple server-trunkable addresses available.
Clients who are prepared to use multiple replicas simultaneously will Clients who are prepared to use multiple replicas simultaneously will
divide opens among replicas however they choose. Once that choice is divide opens among replicas however they choose. Once that choice is
skipping to change at page 246, line 41 skipping to change at page 251, line 39
For example, if one of the replicas becomes unavailable, access will For example, if one of the replicas becomes unavailable, access will
be transferred to a different replica, also capable of simultaneous be transferred to a different replica, also capable of simultaneous
access with the one still in use. access with the one still in use.
When there is no such replica, the transition may be to the replica When there is no such replica, the transition may be to the replica
already in use. At this point, the client has a choice between already in use. At this point, the client has a choice between
merging the locking state for the two replicas under the aegis of the merging the locking state for the two replicas under the aegis of the
sole replica in use or treating these separately, until another sole replica in use or treating these separately, until another
replica capable of simultaneous access presents itself. replica capable of simultaneous access presents itself.
11.10.2. Filehandles and File System Transitions 11.11.2. Filehandles and File System Transitions
There are a number of ways in which filehandles can be handled across There are a number of ways in which filehandles can be handled across
a file system transition. These can be divided into two broad a file system transition. These can be divided into two broad
classes depending upon whether the two file systems across which the classes depending upon whether the two file systems across which the
transition happens share sufficient state to effect some sort of transition happens share sufficient state to effect some sort of
continuity of file system handling. continuity of file system handling.
When there is no such cooperation in filehandle assignment, the two When there is no such cooperation in filehandle assignment, the two
file systems are reported as being in different handle classes. In file systems are reported as being in different handle classes. In
this case, all filehandles are assumed to expire as part of the file this case, all filehandles are assumed to expire as part of the file
skipping to change at page 247, line 15 skipping to change at page 252, line 14
FH4_VOL_MIGRATION bit, which only affects behavior when FH4_VOL_MIGRATION bit, which only affects behavior when
fs_locations_info is not available. fs_locations_info is not available.
When there is cooperation in filehandle assignment, the two file When there is cooperation in filehandle assignment, the two file
systems are reported as being in the same handle class. In this systems are reported as being in the same handle class. In this
case, persistent filehandles remain valid after the file system case, persistent filehandles remain valid after the file system
transition, while volatile filehandles (excluding those that are only transition, while volatile filehandles (excluding those that are only
volatile due to the FH4_VOL_MIGRATION bit) are subject to expiration volatile due to the FH4_VOL_MIGRATION bit) are subject to expiration
on the target server. on the target server.
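The effect of handle classes on filehandles across a transition might be summarized as in the following sketch. The function name, its parameters, and the three result strings are assumptions introduced for illustration; the sketch covers only the case in which fs_locations_info is available.

```python
def filehandle_status(same_handle_class, persistent,
                      volatile_only_for_migration=False):
    """Classify a filehandle after a file system transition.

    Hypothetical helper. Returns one of "valid", "may expire",
    or "expired".
    """
    if not same_handle_class:
        # No cooperation in filehandle assignment: all filehandles
        # are assumed to expire as part of the transition.
        return "expired"
    if persistent or volatile_only_for_migration:
        # Persistent filehandles remain valid, as do those that are
        # volatile only because of the FH4_VOL_MIGRATION bit.
        return "valid"
    # Other volatile filehandles are subject to expiration on the
    # target server.
    return "may expire"
```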
11.10.3. Fileids and File System Transitions 11.11.3. Fileids and File System Transitions
In NFSv4.0, the issue of continuity of fileids in the event of a file In NFSv4.0, the issue of continuity of fileids in the event of a file
system transition was not addressed. The general expectation had system transition was not addressed. The general expectation had
been that in situations in which the two file system instances are been that in situations in which the two file system instances are
created by a single vendor using some sort of file system image copy, created by a single vendor using some sort of file system image copy,
fileids would be consistent across the transition, while in the fileids would be consistent across the transition, while in the
analogous multi-vendor transitions they would not. This poses analogous multi-vendor transitions they would not. This poses
difficulties, especially for the client without special knowledge of difficulties, especially for the client without special knowledge of
the transition mechanisms adopted by the server. Note that although the transition mechanisms adopted by the server. Note that although
fileid is not a REQUIRED attribute, many servers support fileids and fileid is not a REQUIRED attribute, many servers support fileids and
skipping to change at page 248, line 25 skipping to change at page 253, line 25
are no reliable filehandles across a transition event (either because are no reliable filehandles across a transition event (either because
there is no filehandle continuity or because the filehandles are there is no filehandle continuity or because the filehandles are
volatile), the client is in a position where it cannot verify that volatile), the client is in a position where it cannot verify that
files it was accessing before the transition are the same objects. files it was accessing before the transition are the same objects.
It is forced to assume that no object has been renamed, and, unless It is forced to assume that no object has been renamed, and, unless
there are guarantees that provide this (e.g., the file system is there are guarantees that provide this (e.g., the file system is
read-only), problems for applications may occur. Therefore, use of read-only), problems for applications may occur. Therefore, use of
such configurations should be limited to situations where the such configurations should be limited to situations where the
problems that this may cause can be tolerated. problems that this may cause can be tolerated.
11.10.4. Fsids and File System Transitions 11.11.4. Fsids and File System Transitions
Since fsids are generally only unique on a per-server basis, it is Since fsids are generally only unique on a per-server basis, it is
likely that they will change during a file system transition. likely that they will change during a file system transition.
Clients should not make the fsids received from the server visible to Clients should not make the fsids received from the server visible to
applications since they may not be globally unique, and because they applications since they may not be globally unique, and because they
may change during a file system transition event. Applications are may change during a file system transition event. Applications are
best served if they are isolated from such transitions to the extent best served if they are isolated from such transitions to the extent
possible. possible.
Although normally a single source file system will transition to a Although normally a single source file system will transition to a
single target file system, there is a provision for splitting a single target file system, there is a provision for splitting a
single source file system into multiple target file systems, by single source file system into multiple target file systems, by
specifying the FSLI4F_MULTI_FS flag. specifying the FSLI4F_MULTI_FS flag.
11.10.4.1. File System Splitting 11.11.4.1. File System Splitting
When a file system transition is made and the fs_locations_info When a file system transition is made and the fs_locations_info
indicates that the file system in question might be split into indicates that the file system in question might be split into
multiple file systems (via the FSLI4F_MULTI_FS flag), the client multiple file systems (via the FSLI4F_MULTI_FS flag), the client
SHOULD do GETATTRs to determine the fsid attribute on all known SHOULD do GETATTRs to determine the fsid attribute on all known
objects within the file system undergoing transition to determine the objects within the file system undergoing transition to determine the
new file system boundaries. new file system boundaries.
Clients might choose to maintain the fsids passed to existing Clients might choose to maintain the fsids passed to existing
applications by mapping all of the fsids for the descendant file applications by mapping all of the fsids for the descendant file
systems to the common fsid used for the original file system. systems to the common fsid used for the original file system.
Splitting a file system can be done on a transition between file Splitting a file system can be done on a transition between file
systems of the same fileid class, since the fact that fileids are systems of the same fileid class, since the fact that fileids are
unique within the source file system ensures they will be unique in unique within the source file system ensures they will be unique in
each of the target file systems. each of the target file systems.
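As an illustration only, the fsid-preserving approach described above might be sketched as follows in Python. The FsidMapper class and its method names are hypothetical and are not defined by the protocol. The sketch assumes the client has already learned the new fsids by issuing GETATTRs after the transition.

```python
from typing import Dict, List, Tuple

# An fsid is modeled here as a (major, minor) pair.
Fsid = Tuple[int, int]


class FsidMapper:
    """Hypothetical client-side table that hides post-split fsids."""

    def __init__(self) -> None:
        self._to_original: Dict[Fsid, Fsid] = {}

    def record_split(self, original: Fsid, descendants: List[Fsid]) -> None:
        # Every fsid found on the target file systems maps back to the
        # fsid that applications saw before the split.
        for fsid in descendants:
            self._to_original[fsid] = original

    def app_visible(self, server_fsid: Fsid) -> Fsid:
        # Fsids with no recorded mapping pass through unchanged.
        return self._to_original.get(server_fsid, server_fsid)
```

A client using such a table would call record_split once the new file system boundaries are known, and would consult app_visible whenever it reports an fsid to an application.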
11.10.5. The Change Attribute and File System Transitions 11.11.5. The Change Attribute and File System Transitions
Since the change attribute is defined as a server-specific one, Since the change attribute is defined as a server-specific one,
change attributes fetched from one server are normally presumed to be change attributes fetched from one server are normally presumed to be
invalid on another server. Such a presumption is troublesome since invalid on another server. Such a presumption is troublesome since
it would invalidate all cached change attributes, requiring it would invalidate all cached change attributes, requiring
refetching. Even more disruptive, the absence of any assured refetching. Even more disruptive, the absence of any assured
continuity for the change attribute means that even if the same value continuity for the change attribute means that even if the same value
is retrieved on refetch, no conclusions can be drawn as to whether is retrieved on refetch, no conclusions can be drawn as to whether
the object in question has changed. The identical change attribute the object in question has changed. The identical change attribute
could be merely an artifact of a modified file with a different could be merely an artifact of a modified file with a different
change attribute construction algorithm, with that new algorithm just change attribute construction algorithm, with that new algorithm just
happening to result in an identical change value. happening to result in an identical change value.
When the two file systems have consistent change attribute formats, When the two file systems have consistent change attribute formats,
and this fact is communicated to the client by reporting in the same and this fact is communicated to the client by reporting in the same
change class, the client may assume a continuity of change attribute change class, the client may assume a continuity of change attribute
construction and handle this situation just as it would be handled construction and handle this situation just as it would be handled
without any file system transition. without any file system transition.
11.10.6. Write Verifiers and File System Transitions 11.11.6. Write Verifiers and File System Transitions
In a file system transition, the two file systems might be clustered In a file system transition, the two file systems might be
in the handling of unstably written data. When this is the case, and cooperating in the handling of unstably written data. Clients can
the two file systems belong to the same write-verifier class, write determine if this is the case, by seeing if the two file systems
verifiers returned from one system may be compared to those returned belong to the same write-verifier class. When this is the case,
by the other and superfluous writes avoided. write verifiers returned from one system may be compared to those
returned by the other and superfluous writes avoided.
When two file systems belong to different write-verifier classes, any When two file systems belong to different write-verifier classes, any
verifier generated by one must not be compared to one provided by the verifier generated by one must not be compared to one provided by the
other. Instead, the two verifiers should be treated as not equal other. Instead, the two verifiers should be treated as not equal
even when the values are identical. even when the values are identical.
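The comparison rule above can be sketched in Python as shown below. The function name and parameters are hypothetical. The sketch assumes the client has already extracted the write-verifier class of each file system from fs_locations_info and represents it as an integer.

```python
def unstable_writes_still_valid(src_class: int, src_verifier: bytes,
                                dst_class: int, dst_verifier: bytes) -> bool:
    """Return True only if data written unstably before the transition
    can be treated as still held by the server (no rewrite needed)."""
    if src_class != dst_class:
        # Different write-verifier classes: the verifiers are treated
        # as not equal even when the byte values are identical.
        return False
    return src_verifier == dst_verifier
```

When the function returns False, a client would typically resend the uncommitted data, just as it would after a verifier change on a single server.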
11.10.7. Readdir Cookies and Verifiers and File System Transitions 11.11.7. Readdir Cookies and Verifiers and File System Transitions
In a file system transition, the two file systems might be consistent In a file system transition, the two file systems might be consistent
in their handling of READDIR cookies and verifiers. When this is the in their handling of READDIR cookies and verifiers. Clients can
case, and the two file systems belong to the same readdir class, determine if this is the case, by seeing if the two file systems
READDIR cookies and verifiers from one system may be recognized by belong to the same readdir class. When this is the case,
the other and READDIR operations started on one server may be validly READDIR cookies and verifiers from one system will be
continued on the other, simply by presenting the cookie and verifier recognized by the other and READDIR operations started on one server
returned by a READDIR operation done on the first file system to the can be validly continued on the other, simply by presenting the
second. cookie and verifier returned by a READDIR operation done on the first
file system to the second.
When two file systems belong to different readdir classes, any When two file systems belong to different readdir classes, any
READDIR cookie and verifier generated by one is not valid on the READDIR cookie and verifier generated by one is not valid on the
second, and must not be presented to that server by the client. The second, and must not be presented to that server by the client. The
client should act as if the verifier was rejected. client should act as if the verifier were rejected.
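A minimal Python sketch of how a client might choose the arguments for its next READDIR after a transition follows. The function and constant names are hypothetical. Restarting the enumeration from the beginning is one plausible way of acting as if the verifier were rejected, and is an assumption of this sketch rather than a requirement stated above.

```python
from typing import Tuple

INITIAL_COOKIE = 0
INITIAL_VERIFIER = bytes(8)  # eight zero bytes, used to start a READDIR


def readdir_continuation(src_class: int, dst_class: int,
                         cookie: int, verifier: bytes) -> Tuple[int, bytes]:
    """Pick the (cookie, verifier) pair to present to the second server."""
    if src_class == dst_class:
        # Same readdir class: continue where the first server left off.
        return cookie, verifier
    # Different readdir classes: the old pair must not be presented,
    # so the enumeration starts over.
    return INITIAL_COOKIE, INITIAL_VERIFIER
```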
11.10.8. File System Data and File System Transitions 11.11.8. File System Data and File System Transitions
When multiple replicas exist and are used simultaneously or in When multiple replicas exist and are used simultaneously or in
succession by a client, applications using them will normally expect succession by a client, applications using them will normally expect
that they contain either the same data or data that is consistent that they contain either the same data or data that is consistent
with the normal sorts of changes that are made by other clients with the normal sorts of changes that are made by other clients
updating the data of the file system (with metadata being the same to updating the data of the file system (with metadata being the same to
the degree indicated by the fs_locations_info attribute). However, the degree indicated by the fs_locations_info attribute). However,
when multiple file systems are presented as replicas of one another, when multiple file systems are presented as replicas of one another,
the precise relationship between the data of one and the data of the precise relationship between the data of one and the data of
another is not, as a general matter, specified by the NFSv4.1 another is not, as a general matter, specified by the NFSv4.1
protocol. It is quite possible to present as replicas file systems protocol. It is quite possible to present as replicas file systems
where the data of those file systems is sufficiently different that where the data of those file systems is sufficiently different that
some applications have problems dealing with the transition between some applications have problems dealing with the transition between
replicas. The namespace will typically be constructed so that replicas. The namespace will typically be constructed so that
applications can choose an appropriate level of support, so that in applications can choose an appropriate level of support, so that in
one position in the namespace a varied set of replicas will be one position in the namespace a varied set of replicas might be
listed, while in another only those that are up-to-date may be listed, while in another only those that are up-to-date would be
considered replicas. The protocol does define three special cases of considered replicas. The protocol does define three special cases of
the relationship among replicas to be specified by the server and the relationship among replicas to be specified by the server and
relied upon by clients: relied upon by clients:
o When multiple replicas exist and are used simultaneously by a o When multiple replicas exist and are used simultaneously by a
client (see the FSLIB4_CLSIMUL definition within client (see the FSLIB4_CLSIMUL definition within
fs_locations_info), they must designate the same data. Where file fs_locations_info), they must designate the same data. Where file
systems are writable, a change made on one instance must be systems are writable, a change made on one instance must be
visible on all instances, immediately upon the earlier of the visible on all instances at the same time, regardless of whether
return of the modifying requester or the visibility of that change the interrogated instance is the one on which the modification was
on any of the associated replicas. This allows a client to use done. This allows a client to use these replicas simultaneously
these replicas simultaneously without any special adaptation to without any special adaptation to the fact that there are multiple
the fact that there are multiple replicas, beyond adapting to the replicas, beyond adapting to the fact that locks obtained on one
fact that locks obtained on one replica are maintained separately replica are maintained separately (i.e. under a different client
(i.e. under a different client ID). In this case, locks (whether ID). In this case, locks (whether share reservations or byte-
share reservations or byte-range locks) and delegations obtained range locks) and delegations obtained on one replica are
on one replica are immediately reflected on all replicas, in the immediately reflected on all replicas, in the sense that access
sense that access from all other servers is prevented regardless from all other servers is prevented regardless of the replica
of the replica used. However, because the servers are not used. However, because the servers are not required to treat two
required to treat two associated client IDs as representing the associated client IDs as representing the same client, it is best
same client, it is best to access each file using only a single to access each file using only a single client ID.
client ID.
o When one replica is designated as the successor instance to o When one replica is designated as the successor instance to
another existing instance after returning NFS4ERR_MOVED (i.e., the another existing instance after returning NFS4ERR_MOVED (i.e., the
case of migration), the client may depend on the fact that all case of migration), the client may depend on the fact that all
changes written to stable storage on the original instance are changes written to stable storage on the original instance are
written to stable storage of the successor (uncommitted writes are written to stable storage of the successor (uncommitted writes are
dealt with in Section 11.10.6 above). dealt with in Section 11.11.6 above).
o Where a file system is not writable but represents a read-only o Where a file system is not writable but represents a read-only
copy (possibly periodically updated) of a writable file system, copy (possibly periodically updated) of a writable file system,
clients have similar requirements with regard to the propagation clients have similar requirements with regard to the propagation
of updates. They may need a guarantee that any change visible on of updates. They may need a guarantee that any change visible on
the original file system instance must be immediately visible on the original file system instance must be immediately visible on
any replica before the client transitions access to that replica, any replica before the client transitions access to that replica,
in order to avoid any possibility that a client, in effecting a in order to avoid any possibility that a client, in effecting a
transition to a replica, will see any reversion in file system transition to a replica, will see any reversion in file system
state. The specific means of this guarantee varies based on the state. The specific means of this guarantee varies based on the
value of the fss_type field that is reported as part of the value of the fss_type field that is reported as part of the
fs_status attribute (see Section 11.17). Since these file systems fs_status attribute (see Section 11.18). Since these file systems
are presumed to be unsuitable for simultaneous use, there is no are presumed to be unsuitable for simultaneous use, there is no
specification of how locking is handled; in general, locks specification of how locking is handled; in general, locks
obtained on one file system will be separate from those on others. obtained on one file system will be separate from those on others.
Since these are expected to be read-only file systems, this is not Since these are expected to be read-only file systems, this is not
likely to pose an issue for clients or applications. likely to pose an issue for clients or applications.
11.10.9. Lock State and File System Transitions When none of these special situations apply, there is no basis,
within the protocol, for the client to make assumptions about the
contents of a replica file system or its relationship to previous
file system instances. Thus, switching between nominally identical
read-write file systems would not be possible, because either the
client does not use the fs_locations_info attribute or the server
does not support it.
11.11.9. Lock State and File System Transitions
While accessing a file system, clients obtain locks enforced by the While accessing a file system, clients obtain locks enforced by the
server which may prevent actions by other clients that are server which may prevent actions by other clients that are
inconsistent with those locks. inconsistent with those locks.
When access is transferred between replicas, clients need to be When access is transferred between replicas, clients need to be
assured that the actions disallowed by holding these locks cannot assured that the actions disallowed by holding these locks cannot
have occurred during the transition. This can be ensured by the have occurred during the transition. This can be ensured by the
methods below. Unless at least one of these is implemented, clients methods below. Unless at least one of these is implemented, clients
will not be assured of continuity of lock possession across a will not be assured of continuity of lock possession across a
migration event. migration event.
o Providing the client an opportunity to re-obtain its locks via a o Providing the client an opportunity to re-obtain its locks via a
per-fs grace period on the destination server. Because the lock per-fs grace period on the destination server, denying all clients
reclaim mechanism was originally defined to support server reboot, using the destination file system the opportunity to obtain new
it implicitly assumes that file handles will on reclaim will be locks that conflict with those held by the transferred client as
the same as those at open. In the case of migration, this long as that client has not completed its per-fs grace period.
requires that source and destination servers use the same Because the lock reclaim mechanism was originally defined to
filehandles, as evidenced by using the same server scope (see support server reboot, it implicitly assumes that file handles
Section 2.10.4) or by showing this agreement using will, upon reclaim, be the same as those at open. In the
fs_locations_info (see Section 11.10.2 above). case of migration, this requires that source and destination
servers use the same filehandles, as evidenced by using the same
server scope (see Section 2.10.4) or by showing this agreement
using fs_locations_info (see Section 11.11.2 above).
Note that such a grace period can be implemented without
interfering with the ability of non-transferred clients to obtain
new locks while it is going on. As long as the destination server
is aware of the transferred locks, it can distinguish requests to
obtain new locks that contrast with existing locks from those that
do not, allowing it to treat such client requests without
reference to the ongoing grace period.
o Locking state can be transferred as part of the transition by o Locking state can be transferred as part of the transition by
providing Transparent State Migration as described in providing Transparent State Migration as described in
Section 11.11. Section 11.12.
Of these, Transparent State Migration provides the smoother Of these, Transparent State Migration provides the smoother
experience for clients in that there is no grace-period-based delay experience for clients in that there is no need to go through a
before new locks can be obtained. However, it requires a greater reclaim process before new locks can be obtained. However, it
degree of inter-server co-ordination. In general, the servers taking requires a greater degree of inter-server co-ordination. In general,
part in migration are free to provide either facility. However, when the servers taking part in migration are free to provide either
the filehandles can differ across the migration event, Transparent facility. However, when the filehandles can differ across the
State Migration is the only available means of providing the needed migration event, Transparent State Migration is the only available
functionality. means of providing the needed functionality.
It should be noted that these two methods are not mutually exclusive It should be noted that these two methods are not mutually exclusive
and that a server might well provide both. In particular, if there and that a server might well provide both. In particular, if there
is some circumstance preventing a specific lock from being is some circumstance preventing a specific lock from being
transferred transparently, the destination server can allow it to be transferred transparently, the destination server can allow it to be
reclaimed, by implementing a per-fs grace period for the migrated reclaimed, by implementing a per-fs grace period for the migrated
file system. file system.
11.10.9.1. Leases and File System Transitions 11.11.9.1. Security Consideration Related to Reclaiming Lock State
after File System Transitions
Although it is possible for a client reclaiming state to misrepresent
its state, in the same fashion as described in Section 8.4.2.1.1,
most implementations providing for such reclamation in the case of
file system transitions will have the ability to detect such
misrepresentations. This limits the ability of unauthenticated
clients to execute denial-of-service attacks in these circumstances.
Nevertheless, the rules stated in Section 8.4.2.1.1, regarding
principal verification for reclaim requests, apply in this situation
as well.
Typically, implementations that support file system transitions will
have extensive information about the locks to be transferred. This
is because:
o Since failure is not involved, there is no need to store locking
information in persistent storage.
o There is no need, as there is in the failure case, to update
multiple repositories containing locking state to keep them in
sync. Instead, there is a one-time communication of locking state
from the source to the destination server.
o Providing this information avoids potential interference with
existing clients using the destination file system, by denying
them the ability to obtain new locks during the grace period.
When such detailed locking information, not necessarily including the
associated stateids, is available:
o It is possible to detect reclaim requests that attempt to reclaim
locks that did not exist before the transfer, rejecting them with
NFS4ERR_RECLAIM_BAD (Section 15.1.9.4).
o It is possible, when dealing with non-reclaim requests, to
determine whether they conflict with existing locks, eliminating
the need to return NFS4ERR_GRACE (Section 15.1.9.2) on non-
reclaim requests.
It is possible for implementations of grace periods in connection
with file system transitions not to have detailed locking information
available at the destination server, in which case the security
situation is exactly as described in Section 8.4.2.1.1.
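The behavior available to a destination server that holds detailed locking information might be sketched as follows in Python. Everything here is hypothetical and greatly simplified: locks are modeled as whole-file shared or exclusive locks, error names are plain strings, and the use of NFS4ERR_DENIED for a conflicting non-reclaim request is an assumption of the sketch.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass(frozen=True)
class Lock:
    client: str      # identifies the lock holder
    fh: str          # filehandle, shown as a string for brevity
    exclusive: bool  # True for a write lock, False for a read lock


@dataclass
class MigratedFsGrace:
    """Per-fs grace period state on the destination server."""
    transferred: Set[Lock]  # locks reported by the source server
    reclaimed: Set[Lock] = field(default_factory=set)

    def handle(self, lock: Lock, reclaim: bool) -> str:
        if reclaim:
            # Reclaims of locks that did not exist before the
            # transfer are detected and rejected.
            if lock not in self.transferred:
                return "NFS4ERR_RECLAIM_BAD"
            self.reclaimed.add(lock)
            return "NFS4_OK"
        # Non-reclaim request: only a real conflict with a transferred
        # lock blocks it, so NFS4ERR_GRACE is never needed here.
        for held in self.transferred:
            if (held.fh == lock.fh and held.client != lock.client
                    and (held.exclusive or lock.exclusive)):
                return "NFS4ERR_DENIED"
        return "NFS4_OK"
```

A server without such information would have no transferred set to consult and would fall back to the ordinary grace-period behavior.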
11.11.9.2. Leases and File System Transitions
In the case of lease renewal, the client may not be submitting In the case of lease renewal, the client may not be submitting
requests for a file system that has been transferred to another requests for a file system that has been transferred to another
server. This can occur because of the lease renewal mechanism. The server. This can occur because of the lease renewal mechanism. The
client renews the lease associated with all file systems when client renews the lease associated with all file systems when
submitting a request on an associated session, regardless of the submitting a request on an associated session, regardless of the
specific file system being referenced. specific file system being referenced.
In order for the client to schedule renewal of its lease where there In order for the client to schedule renewal of its lease where there
is locking state that may have been relocated to the new server, the is locking state that may have been relocated to the new server, the
skipping to change at page 253, line 23 skipping to change at page 259, line 42
determined, the client can follow the normal process to obtain the determined, the client can follow the normal process to obtain the
new server information (through the fs_locations and new server information (through the fs_locations and
fs_locations_info attributes) and perform renewal of that lease on fs_locations_info attributes) and perform renewal of that lease on
the new server, unless information in the fs_locations_info attribute the new server, unless information in the fs_locations_info attribute
shows that no state could have been transferred. If the server has shows that no state could have been transferred. If the server has
not had state transferred to it transparently, the client will not had state transferred to it transparently, the client will
receive NFS4ERR_STALE_CLIENTID from the new server, as described receive NFS4ERR_STALE_CLIENTID from the new server, as described
above, and the client can then reclaim locks as is done in the event above, and the client can then reclaim locks as is done in the event
of server failure. of server failure.
11.10.9.2. Transitions and the Lease_time Attribute 11.11.9.3. Transitions and the Lease_time Attribute
In order that the client may appropriately manage its lease in the In order that the client may appropriately manage its lease in the
case of a file system transition, the destination server must case of a file system transition, the destination server must
establish proper values for the lease_time attribute. establish proper values for the lease_time attribute.
When state is transferred transparently, that state should include When state is transferred transparently, that state should include
the correct value of the lease_time attribute. The lease_time the correct value of the lease_time attribute. The lease_time
attribute on the destination server must never be less than that on attribute on the destination server must never be less than that on
the source, since this would result in premature expiration of a the source, since this would result in premature expiration of a
lease granted by the source server. Upon transitions in which state lease granted by the source server. Upon transitions in which state
skipping to change at page 253, line 48 skipping to change at page 260, line 18
If state has not been transferred transparently, either because the If state has not been transferred transparently, either because the
associated servers are shown as having different eir_server_scope associated servers are shown as having different eir_server_scope
strings or because the client ID is rejected when presented to the strings or because the client ID is rejected when presented to the
new server, the client should fetch the value of lease_time on the new server, the client should fetch the value of lease_time on the
new (i.e., destination) server, and use it for subsequent locking new (i.e., destination) server, and use it for subsequent locking
requests. However, the server must respect a grace period of at requests. However, the server must respect a grace period of at
least as long as the lease_time on the source server, in order to least as long as the lease_time on the source server, in order to
ensure that clients have ample time to reclaim their lock before ensure that clients have ample time to reclaim their lock before
potentially conflicting non-reclaimed locks are granted. potentially conflicting non-reclaimed locks are granted.
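The timing constraints in this section reduce to using the source server's lease_time as a lower bound. A minimal Python sketch follows. The function name and the idea of a locally configured value are assumptions made for illustration.

```python
def floor_at_source_lease(source_lease_time: int, local_value: int) -> int:
    """Return a value, in seconds, never shorter than the source lease_time.

    The same floor applies to the destination's lease_time when state is
    transferred transparently and to its grace period when it is not.
    """
    return max(source_lease_time, local_value)
```

For example, a destination configured with a 45-second grace period that receives a file system from a source with a 90-second lease_time would use a 90-second grace period.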
11.11. Transferring State upon Migration 11.12. Transferring State upon Migration
When the transition is a result of a server-initiated decision to When the transition is a result of a server-initiated decision to
transition access and the source and destination servers have transition access and the source and destination servers have
implemented appropriate co-operation, it is possible to: implemented appropriate co-operation, it is possible to:
o Transfer locking state from the source to the destination server, o Transfer locking state from the source to the destination server,
in a fashion similar to that provided by Transparent State in a fashion similar to that provided by Transparent State
Migration in NFSv4.0, as described in [65]. Server Migration in NFSv4.0, as described in [68]. Server
responsibilities are described in Section 11.13.2. responsibilities are described in Section 11.14.2.
o Transfer session state from the source to the destination server. o Transfer session state from the source to the destination server.
Server responsibilities in effecting such a transfer are described Server responsibilities in effecting such a transfer are described
in Section 11.13.3. in Section 11.14.3.
The means by which the client determines which of these transfer The means by which the client determines which of these transfer
events has occurred are described in Section 11.12. events has occurred are described in Section 11.13.
11.11.1. Transparent State Migration and pNFS 11.12.1. Transparent State Migration and pNFS
When pNFS is involved, the protocol is capable of supporting: When pNFS is involved, the protocol is capable of supporting:
o Migration of the Metadata Server (MDS), leaving the Data Servers o Migration of the Metadata Server (MDS), leaving the Data Servers
(DS's) in place. (DS's) in place.
o Migration of the file system as a whole, including the MDS and o Migration of the file system as a whole, including the MDS and
associated DS's. associated DS's.
o Replacement of one DS by another. o Replacement of one DS by another.
skipping to change at page 255, line 24 skipping to change at page 261, line 44
such but can be effected by an MDS recalling layouts for the DS to be such but can be effected by an MDS recalling layouts for the DS to be
replaced and issuing new ones to be served by the successor DS. replaced and issuing new ones to be served by the successor DS.
Migration may transfer a file system from a server which does not Migration may transfer a file system from a server which does not
support pNFS to one which does. In order to properly adapt to this support pNFS to one which does. In order to properly adapt to this
situation, clients which support pNFS, but function adequately in its situation, clients which support pNFS, but function adequately in its
absence should check for pNFS support when a file system is migrated absence should check for pNFS support when a file system is migrated
and be prepared to use pNFS when support is available on the and be prepared to use pNFS when support is available on the
destination. destination.
11.12. Client Responsibilities when Access is Transitioned 11.13. Client Responsibilities when Access is Transitioned
For a client to respond to an access transition, it must become aware For a client to respond to an access transition, it must become aware
of it. The ways in which this can happen are discussed in of it. The ways in which this can happen are discussed in
Section 11.12.1 which discusses indications that a specific file Section 11.13.1 which discusses indications that a specific file
system access path has transitioned as well as situations in which system access path has transitioned as well as situations in which
additional activity is necessary to determine the set of file systems additional activity is necessary to determine the set of file systems
that have been migrated. Section 11.12.2 goes on to complete the that have been migrated. Section 11.13.2 goes on to complete the
discussion of how the set of migrated file systems might be discussion of how the set of migrated file systems might be
determined. Sections 11.12.3 through 11.12.5 discuss how the client determined. Sections 11.13.3 through 11.13.5 discuss how the client
should deal with each transition it becomes aware of, either directly should deal with each transition it becomes aware of, either directly
or as a result of migration discovery. or as a result of migration discovery.
The following terms are used to describe client activities: The following terms are used to describe client activities:
o "Transition recovery" refers to the process of restoring access to o "Transition recovery" refers to the process of restoring access to
a file system on which NFS4ERR_MOVED was received. a file system on which NFS4ERR_MOVED was received.
o "Migration recovery" refers to that subset of transition recovery which o "Migration recovery" refers to that subset of transition recovery which
applies when the file system has migrated to a different replica. applies when the file system has migrated to a different replica.
o "Migration discovery" refers to the process of determining which o "Migration discovery" refers to the process of determining which
file system(s) have been migrated. It is necessary to avoid a file system(s) have been migrated. It is necessary to avoid a
situation in which leases could expire when a file system is not situation in which leases could expire when a file system is not
accessed for a long period of time, since a client unaware of the accessed for a long period of time, since a client unaware of the
migration might be referencing an unmigrated file system and not migration might be referencing an unmigrated file system and not
renewing the lease associated with the migrated file system. renewing the lease associated with the migrated file system.
11.12.1. Client Transition Notifications 11.13.1. Client Transition Notifications
When there is a change in the network access path which a client is When there is a change in the network access path which a client is
to use to access a file system, there are a number of related status to use to access a file system, there are a number of related status
indications with which clients need to deal: indications with which clients need to deal:
o If an attempt is made to use or return a filehandle within a file o If an attempt is made to use or return a filehandle within a file
system that is no longer accessible at the address previously used system that is no longer accessible at the address previously used
to access it, the error NFS4ERR_MOVED is returned. to access it, the error NFS4ERR_MOVED is returned.
Exceptions are made to allow such file handles to be used when Exceptions are made to allow such file handles to be used when
skipping to change at page 258, line 10 skipping to change at page 264, line 31
on, the client merely waits for that recovery to be completed while on, the client merely waits for that recovery to be completed while
the receipt of SEQ4_STATUS_LEASE_MOVED indication only needs to the receipt of SEQ4_STATUS_LEASE_MOVED indication only needs to
initiate migration discovery for a server if such discovery is not initiate migration discovery for a server if such discovery is not
already underway for that server. already underway for that server.
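The rule that a SEQ4_STATUS_LEASE_MOVED indication starts migration discovery only when none is already underway for that server can be sketched as follows. This is a minimal illustration, not part of either draft: the DiscoveryTracker class and its method names are hypothetical, and only the flag value is taken from the NFSv4.1 protocol definition.

```python
# Flag bit carried in the SEQUENCE result status flags (NFSv4.1 XDR).
SEQ4_STATUS_LEASE_MOVED = 0x00000080


class DiscoveryTracker:
    """Hypothetical per-client bookkeeping; all names are illustrative."""

    def __init__(self):
        self.underway = set()  # servers with migration discovery in progress
        self.started = []      # record of discoveries started, for inspection

    def on_sequence_flags(self, server, sr_status_flags):
        """Return True if this reply caused discovery to be started."""
        if not sr_status_flags & SEQ4_STATUS_LEASE_MOVED:
            return False       # no lease-migrated indication in this reply
        if server in self.underway:
            return False       # discovery already running; nothing to do
        self.underway.add(server)
        self.started.append(server)
        return True

    def discovery_done(self, server):
        """Called when discovery for the server has finished."""
        self.underway.discard(server)
```

Because the indication is a status flag rather than an error, the request that carried it has already succeeded, so the handler only records that discovery is needed and never retries the request.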
The fact that a lease-migrated condition does not result in an error The fact that a lease-migrated condition does not result in an error
in NFSv4.1 has a number of important consequences. In addition to in NFSv4.1 has a number of important consequences. In addition to
the fact, discussed above, that the two indications are not mutually the fact, discussed above, that the two indications are not mutually
exclusive, there are a number of issues that are important in exclusive, there are a number of issues that are important in
considering implementation of migration discovery, as discussed in considering implementation of migration discovery, as discussed in
Section 11.12.2. Section 11.13.2.
Because of the absence of NFSV4ERR_LEASE_MOVED, it is possible for Because SEQ4_STATUS_LEASE_MOVED is not an error condition, it is
file systems whose access path has not changed to be successfully possible for file systems whose access paths have not changed to be
accessed on a given server even though recovery is necessary for successfully accessed on a given server even though recovery is
other file systems on the same server. As a result, access can go on necessary for other file systems on the same server. As a result,
while, access can go on while,
o The migration discovery process is going on for that server. o The migration discovery process is going on for that server.
o The transition recovery process is going on for on other file o The transition recovery process is going on for other file systems
systems connected to that server. connected to that server.
11.12.2. Performing Migration Discovery 11.13.2. Performing Migration Discovery
Migration discovery can be performed in the same context as Migration discovery can be performed in the same context as
transition recovery, allowing recovery for each migrated file system transition recovery, allowing recovery for each migrated file system
to be invoked as it is discovered. Alternatively, it may be done in to be invoked as it is discovered. Alternatively, it may be done in
a separate migration discovery thread, allowing migration discovery a separate migration discovery thread, allowing migration discovery
to be done in parallel with one or more instances of transition to be done in parallel with one or more instances of transition
recovery. recovery.
In either case, because the lease-migrated indication does not result In either case, because the lease-migrated indication does not result
in an error, other access to file systems on the server can proceed in an error, other access to file systems on the server can proceed
normally, with the possibility that further such indications will be normally, with the possibility that further such indications will be
received, raising the issue of how such indications are to be dealt received, raising the issue of how such indications are to be dealt
with. In general, with. In general,
o No action needs to be taken for such indications received by the o No action needs to be taken for such indications received by any
those performing migration discovery, since continuation of that threads performing migration discovery, since continuation of that
work will address the issue. work will address the issue.
o In other cases in which migration discovery is currently being o In other cases in which migration discovery is currently being
performed, nothing further needs to be done to respond to such performed, nothing further needs to be done to respond to such
lease migration indications, as long as one can be certain that lease migration indications, as long as one can be certain that
the migration discovery process would deal with those indications. the migration discovery process would deal with those indications.
See below for details. See below for details.
o For such indications received in all other contexts, the o For such indications received in all other contexts, the
appropriate response is to initiate or otherwise provide for the appropriate response is to initiate or otherwise provide for the
skipping to change at page 259, line 43 skipping to change at page 266, line 15
Given that framework, migration discovery processing would proceed as Given that framework, migration discovery processing would proceed as
follows. follows.
o While in the normal-operation state, the thread performing o While in the normal-operation state, the thread performing
discovery would fetch, for successive file systems known to the discovery would fetch, for successive file systems known to the
client on the server being worked on, a file system location client on the server being worked on, a file system location
attribute plus the fs_status attribute. attribute plus the fs_status attribute.
o If the fs_status attribute indicates that the file system is a o If the fs_status attribute indicates that the file system is a
migrated one (i.e. fss_absent is true and fss_type != migrated one (i.e. fss_absent is true and fss_type !=
STATUS4_REFERRAL) and thus that it is likely that the fetch of the STATUS4_REFERRAL) then a migrated file system has been found. In
file system location attribute has cleared one of the file systems this situation, it is likely that the fetch of the file system
contributing to the lease-migrated indication. location attribute has cleared one of the file systems contributing
to the lease-migrated indication.
o In cases in which that happened, the thread cannot know whether o In cases in which that happened, the thread cannot know whether
the lease-migrated indication has been cleared and so it enters the lease-migrated indication has been cleared and so it enters
the completion/verification state and proceeds to issue a COMPOUND the completion/verification state and proceeds to issue a COMPOUND
to see if the LEASE_MOVED indication has been cleared. to see if the LEASE_MOVED indication has been cleared.
o When the discovery process is in the completion/verification o When the discovery process is in the completion/verification
state, if other requests get a lease-migrated indication they note state, if other requests get a lease-migrated indication they note
that it was received. Laater, the existence of such indications that it was received. Later, the existence of such indications is
is used when the request completes, as described below. used when the request completes, as described below.
When the request used in the completion/verification state completes: When the request used in the completion/verification state completes:
o If a lease-migrated indication is returned, the discovery o If a lease-migrated indication is returned, the discovery
continues normally. Note that this is so even if all file systems continues normally. Note that this is so even if all file systems
have been traversed, since new migrations could have occurred while the have been traversed, since new migrations could have occurred while the
process was going on. process was going on.
o Otherwise, if there is any record that other requests saw a lease- o Otherwise, if there is any record that other requests saw a lease-
migrated indication while the request was going on, that record is migrated indication while the request was going on, that record is
skipping to change at page 260, line 36 skipping to change at page 267, line 7
process. process.
It should be noted that the process described above is not guaranteed It should be noted that the process described above is not guaranteed
to terminate, as a long series of new migration events might to terminate, as a long series of new migration events might
continually delay the clearing of the LEASE_MOVED indication. To continually delay the clearing of the LEASE_MOVED indication. To
prevent unnecessary lease expiration, it is appropriate for clients prevent unnecessary lease expiration, it is appropriate for clients
to use the discovery of migrations to effect lease renewal to use the discovery of migrations to effect lease renewal
immediately, rather than waiting for clearing of the LEASE_MOVED immediately, rather than waiting for clearing of the LEASE_MOVED
indication when the complete set of migrations is available. indication when the complete set of migrations is available.
11.12.3. Overview of Client Response to NFS4ERR_MOVED Lease discovery needs to be provided as described above. This
ensures that the client discovers file system migrations soon enough
to renew its leases on each destination server before they expire.
Non-renewal of leases can lead to loss of locking state. While the
consequences of such loss can be ameliorated through implementations
of courtesy locks, servers are under no obligation to do so, and a
conflicting lock request may mean that a lock is revoked
unexpectedly. Clients should be aware of this possibility.
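The discovery loop described in this section can be sketched roughly as below. This is a simplified illustration under stated assumptions: FakeServer is a hypothetical in-memory stand-in for a server, fss_type is represented by a string label rather than the protocol enum, the bookkeeping for indications seen by other requests is omitted, and max_passes is an invented bound reflecting that the process is not guaranteed to terminate.

```python
class FakeServer:
    """Hypothetical stand-in for a server; not an NFS client library."""

    def __init__(self, fs_status):
        # fs name -> (fss_absent, fss_type label)
        self.fs_status = dict(fs_status)
        # Migrated file systems still contributing to LEASE_MOVED.
        self.pending = {fs for fs, (absent, kind) in self.fs_status.items()
                        if absent and kind != "referral"}

    def fetch_location_and_status(self, fs):
        # Fetching the location attribute clears this file system's
        # contribution to the lease-migrated indication.
        self.pending.discard(fs)
        return self.fs_status[fs]

    def lease_moved(self):
        # Models a COMPOUND issued to read the LEASE_MOVED indication.
        return bool(self.pending)


def discover_migrations(server, known_fs, max_passes=10):
    """Return the file systems found to have migrated."""
    found = []
    for _ in range(max_passes):
        for fs in known_fs:
            if fs in found:
                continue
            absent, kind = server.fetch_location_and_status(fs)
            if absent and kind != "referral":
                found.append(fs)  # transition recovery could start here
                # Completion/verification state: has the flag cleared?
                if not server.lease_moved():
                    return found
        if not server.lease_moved():
            return found
    return found  # bound reached; a real client would keep going
```

In this sketch each migrated file system is reported as soon as it is found, so lease renewal on the destination does not wait for the indication to clear.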
11.13.3. Overview of Client Response to NFS4ERR_MOVED
This section outlines a way in which a client that receives This section outlines a way in which a client that receives
NFS4ERR_MOVED can effect transition recovery by using a new server or NFS4ERR_MOVED can effect transition recovery by using a new server or
server endpoint if one is available. As part of that process, it server endpoint if one is available. As part of that process, it
will determine: will determine:
o Whether the NFS4ERR_MOVED indicates migration has occurred, or o Whether the NFS4ERR_MOVED indicates migration has occurred, or
whether it indicates another sort of file system access transition whether it indicates another sort of file system access transition
as discussed in Section 11.9 above. as discussed in Section 11.10 above.
o In the case of migration, whether Transparent State Migration has o In the case of migration, whether Transparent State Migration has
occurred. occurred.
o Whether any state has been lost during the process of Transparent o Whether any state has been lost during the process of Transparent
State Migration. State Migration.
o Whether sessions have been transferred as part of Transparent o Whether sessions have been transferred as part of Transparent
State Migration. State Migration.
skipping to change at page 261, line 37 skipping to change at page 268, line 18
server-trunkable with that used to access the file system when server-trunkable with that used to access the file system when
access was terminated by receiving NFS4ERR_MOVED. If it is, then access was terminated by receiving NFS4ERR_MOVED. If it is, then
migration has not occurred. In that case, the transition is migration has not occurred. In that case, the transition is
dealt with, at least initially, as one involving continued access dealt with, at least initially, as one involving continued access
to the same file system on the same server through a new network to the same file system on the same server through a new network
address. address.
3. Obtaining access to existing session state or creating new 3. Obtaining access to existing session state or creating new
sessions. How this is done depends on the initial determination sessions. How this is done depends on the initial determination
of whether migration has occurred and can be done as described in of whether migration has occurred and can be done as described in
Section 11.12.4 below in the case of migration or as described in Section 11.13.4 below in the case of migration or as described in
Section 11.12.5 below in the case of a network address transfer Section 11.13.5 below in the case of a network address transfer
without migration. without migration.
4. Verification of the trunking relationship assumed in step 2 as 4. Verification of the trunking relationship assumed in step 2 as
discussed in Section 2.10.5.1. Although this step will generally discussed in Section 2.10.5.1. Although this step will generally
confirm the initial determination, it is possible for confirm the initial determination, it is possible for
verification to fail with the result that an initial verification to fail with the result that an initial
determination that a network address shift (without migration) determination that a network address shift (without migration)
has occurred may be invalidated and migration determined to have has occurred may be invalidated and migration determined to have
occurred. There is no need to redo step 3 above, since it will occurred. There is no need to redo step 3 above, since it will
be possible to continue use of the session established already. be possible to continue use of the session established already.
5. Obtaining access to existing locking state and/or reobtaining it. 5. Obtaining access to existing locking state and/or reobtaining it.
How this is done depends on the final determination of whether How this is done depends on the final determination of whether
migration has occurred and can be done as described below in migration has occurred and can be done as described below in
Section 11.12.4 in the case of migration or as described in Section 11.13.4 in the case of migration or as described in
Section 11.12.5 in the case of a network address transfer without Section 11.13.5 in the case of a network address transfer without
migration. migration.
Once the initial address has been determined, clients are free to Once the initial address has been determined, clients are free to
apply an abbreviated process to find additional addresses trunkable apply an abbreviated process to find additional addresses trunkable
with it (clients may seek session-trunkable or server-trunkable with it (clients may seek session-trunkable or server-trunkable
addresses depending on whether they support clientid trunking). addresses depending on whether they support clientid trunking).
During this later phase of the process, further location entries are During this later phase of the process, further location entries are
examined using the abbreviated procedure specified below: examined using the abbreviated procedure specified below:
A: Before the EXCHANGE_ID, the fs name of the location entry is A: Before the EXCHANGE_ID, the fs name of the location entry is
skipping to change at page 262, line 29 skipping to change at page 269, line 11
B: In the case that the network address is session-trunkable with B: In the case that the network address is session-trunkable with
one used previously, a BIND_CONN_TO_SESSION is used to access that one used previously, a BIND_CONN_TO_SESSION is used to access that
session using the new network address. Otherwise, or if the bind session using the new network address. Otherwise, or if the bind
operation fails, a CREATE_SESSION is done. operation fails, a CREATE_SESSION is done.
C: The verification procedure referred to in step 4 above is used. C: The verification procedure referred to in step 4 above is used.
However, if it fails, the entry is ignored and the next available However, if it fails, the entry is ignored and the next available
entry is used. entry is used.
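Steps B and C of the abbreviated procedure might be sketched as follows. The function and its callable parameters (is_session_trunkable, bind, create, verify) are hypothetical placeholders for the client's own operations, and step A is deliberately left out.

```python
def add_trunked_paths(entries, is_session_trunkable, bind, create, verify):
    """Return (address, session) pairs for the usable location entries.

    bind(addr) models BIND_CONN_TO_SESSION and returns None on failure;
    create(addr) models CREATE_SESSION; verify(addr) models the trunking
    verification referred to in step 4.
    """
    usable = []
    for addr in entries:
        session = None
        if is_session_trunkable(addr):
            session = bind(addr)      # step B: try the existing session
        if session is None:
            session = create(addr)    # step B: otherwise, or if bind failed
        if verify(addr):              # step C: verification
            usable.append((addr, session))
        # On verification failure the entry is ignored and the loop
        # simply moves on to the next available entry.
    return usable
```

Unlike the initial determination, a verification failure here is not treated as evidence of migration; the entry is just skipped.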
11.12.4. Obtaining Access to Sessions and State after Migration 11.13.4. Obtaining Access to Sessions and State after Migration
In the event that migration has occurred, migration recovery will In the event that migration has occurred, migration recovery will
involve determining whether Transparent State Migration has occurred. involve determining whether Transparent State Migration has occurred.
This decision is made based on the client ID returned by the This decision is made based on the client ID returned by the
EXCHANGE_ID and the reported confirmation status. EXCHANGE_ID and the reported confirmation status.
o If the client ID is an unconfirmed client ID not previously known o If the client ID is an unconfirmed client ID not previously known
to the client, then Transparent State Migration has not occurred. to the client, then Transparent State Migration has not occurred.
o If the client ID is a confirmed client ID previously known to the o If the client ID is a confirmed client ID previously known to the
skipping to change at page 263, line 33 skipping to change at page 270, line 15
when the slot sequence values used are not appropriate on the new when the slot sequence values used are not appropriate on the new
session. When this occurs, the client can create a new session and session. When this occurs, the client can create a new session and
cease using the existing one. cease using the existing one.
Once the client has determined the initial migration status, and Once the client has determined the initial migration status, and
determined that there was a shift to a new server, it needs to re- determined that there was a shift to a new server, it needs to re-
establish its locking state, if possible. To enable this to happen establish its locking state, if possible. To enable this to happen
without loss of the guarantees normally provided by locking, the without loss of the guarantees normally provided by locking, the
destination server needs to implement a per-fs grace period in all destination server needs to implement a per-fs grace period in all
cases in which lock state was lost, including those in which cases in which lock state was lost, including those in which
Transparent State Migration was not implemented. Transparent State Migration was not implemented. Each client for
which there was a transfer of locking state to the new server will
have the duration of the grace period to reclaim its locks, from the
time its locks were transferred.
Clients need to deal with the following cases: Clients need to deal with the following cases:
o In the state merger case, it is possible that the server has not o In the state merger case, it is possible that the server has not
attempted Transparent State Migration, in which case state may attempted Transparent State Migration, in which case state may
have been lost without it being reflected in the SEQ4_STATUS bits. have been lost without it being reflected in the SEQ4_STATUS bits.
To determine whether this has happened, the client can use To determine whether this has happened, the client can use
TEST_STATEID to check whether the stateids created on the source TEST_STATEID to check whether the stateids created on the source
server are still accessible on the destination server. Once a server are still accessible on the destination server. Once a
single stateid is found to have been successfully transferred, the single stateid is found to have been successfully transferred, the
skipping to change at page 264, line 19 skipping to change at page 271, line 5
o In a case in which Transparent State Migration has occurred, and o In a case in which Transparent State Migration has occurred, and
some lock state was lost (as shown by SEQ4_STATUS flags), existing some lock state was lost (as shown by SEQ4_STATUS flags), existing
stateids need to be checked for validity using TEST_STATEID, and stateids need to be checked for validity using TEST_STATEID, and
reclaim used to re-establish any that were not transferred. reclaim used to re-establish any that were not transferred.
For all of the cases above, RECLAIM_COMPLETE with an rca_one_fs value For all of the cases above, RECLAIM_COMPLETE with an rca_one_fs value
of TRUE needs to be done before normal use of the file system of TRUE needs to be done before normal use of the file system
including obtaining new locks for the file system. This applies even including obtaining new locks for the file system. This applies even
if no locks were lost and there was no need for any to be reclaimed. if no locks were lost and there was no need for any to be reclaimed.
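A rough sketch of the per-file-system lock recovery sequence, covering the case in which some stateids did not survive the transfer. The callables test_stateid, reclaim, and reclaim_complete are hypothetical wrappers around TEST_STATEID, the reclaim variants of the locking operations, and RECLAIM_COMPLETE; they are assumptions for illustration only.

```python
def recover_fs_locks(stateids, test_stateid, reclaim, reclaim_complete):
    """Reclaim stateids that were not transferred, then signal completion.

    test_stateid(s) returns True if stateid s is valid on the destination.
    Returns the list of stateids that had to be reclaimed.
    """
    lost = [s for s in stateids if not test_stateid(s)]
    for s in lost:
        reclaim(s)
    # Needed before normal use of the file system, even when nothing was
    # lost and no reclaim was necessary.
    reclaim_complete(rca_one_fs=True)
    return lost
```

The RECLAIM_COMPLETE call is unconditional in the sketch, matching the statement above that it applies in all of the cases listed.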
11.12.5. Obtaining Access to Sessions and State after Network Address 11.13.5. Obtaining Access to Sessions and State after Network Address
Transfer Transfer
The case in which there is a transfer to a new network address The case in which there is a transfer to a new network address
without migration is similar to that described in Section 11.12.4 without migration is similar to that described in Section 11.13.4
above in that there is a need to obtain access to needed sessions and above in that there is a need to obtain access to needed sessions and
locking state. However, the details are simpler and will vary locking state. However, the details are simpler and will vary
depending on the type of trunking between the address receiving depending on the type of trunking between the address receiving
NFS4ERR_MOVED and that to which the transfer is to be made. NFS4ERR_MOVED and that to which the transfer is to be made.
To make a session available for use, a BIND_CONN_TO_SESSION should be To make a session available for use, a BIND_CONN_TO_SESSION should be
used to obtain access to the session previously in use. Only if this used to obtain access to the session previously in use. Only if this
fails, should a CREATE_SESSION be done. While this procedure mirrors fails, should a CREATE_SESSION be done. While this procedure mirrors
that in Section 11.12.4 above, there is an important difference in that in Section 11.13.4 above, there is an important difference in
that preservation of the session is not purely optional but depends that preservation of the session is not purely optional but depends
on the type of trunking. on the type of trunking.
Access to appropriate locking state will generally need no actions Access to appropriate locking state will generally need no actions
beyond access to the session. However, the SEQ4_STATUS bits need to beyond access to the session. However, the SEQ4_STATUS bits need to
be checked for lost locking state, including the need to reclaim be checked for lost locking state, including the need to reclaim
locks after a server reboot, since there is always a possibility of locks after a server reboot, since there is always a possibility of
locking state being lost. locking state being lost.
11.13. Server Responsibilities Upon Migration 11.14. Server Responsibilities Upon Migration
In the event of file system migration, when the client connects to In the event of file system migration, when the client connects to
the destination server, that server needs to be able to provide the the destination server, that server needs to be able to provide the
client continued access to the files it had open on the source client continued access to the files it had open on the source
server. There are two ways to provide this: server. There are two ways to provide this:
o By provision of an fs-specific grace period, allowing the client o By provision of an fs-specific grace period, allowing the client
the ability to reclaim its locks, in a fashion similar to what the ability to reclaim its locks, in a fashion similar to what
would have been done in the case of recovery from a server would have been done in the case of recovery from a server
restart. See Section 11.13.1 for a more complete discussion. restart. See Section 11.14.1 for a more complete discussion.
o By implementing Transparent State Migration possibly in connection o By implementing Transparent State Migration possibly in connection
with session migration, the server can provide the client with session migration, the server can provide the client
immediate access to the state built up on the source server, on immediate access to the state built up on the source server, on
the destination. the destination.
These features are discussed separately in Sections 11.13.2 and These features are discussed separately in Sections 11.14.2 and
11.13.3, which discuss Transparent State Migration and session 11.14.3, which discuss Transparent State Migration and session
migration respectively. migration respectively.
All the features described above can involve transfer of lock-related All the features described above can involve transfer of lock-related
information between source and destination servers. In some cases, information between source and destination servers. In some cases,
this transfer is a necessary part of the implementation while in this transfer is a necessary part of the implementation while in
other cases it is a helpful implementation aid which servers might or other cases it is a helpful implementation aid which servers might or
might not use. The sub-sections below discuss the information which might not use. The sub-sections below discuss the information which
would be transferred but do not define the specifics of the transfer would be transferred but do not define the specifics of the transfer
protocol. This is left as an implementation choice although protocol. This is left as an implementation choice although
standards in this area could be developed at a later time. standards in this area could be developed at a later time.
11.13.1. Server Responsibilities in Effecting State Reclaim after 11.14.1. Server Responsibilities in Effecting State Reclaim after
Migration Migration
In this case, destination server need have no knowledge of the locks In this case, the destination server needs no knowledge of the locks
held on the source server, but relies on the clients to accurately held on the source server. It relies on the clients to accurately
report (via reclaim operations) the locks previously held, not report (via reclaim operations) the locks previously held, and does
allowing new locks to be granted on migrated file system until the not allow new locks to be granted on migrated file systems until the
grace period expires. grace period expires. Disallowing of new locks applies to all
clients accessing these file systems, while grace period expiration
occurs for each migrated client independently.
During this grace period clients have the opportunity to use reclaim During this grace period clients have the opportunity to use reclaim
operations to obtain locks for file system objects within the operations to obtain locks for file system objects within the
migrated file system, in the same way that they do when recovering migrated file system, in the same way that they do when recovering
from server restart, and the servers typically rely on clients to from server restart, and the servers typically rely on clients to
accurately report their locks, although they have the option of accurately report their locks, although they have the option of
subjecting these requests to verification. If the clients only subjecting these requests to verification. If the clients only
reclaim locks held on the source server, no conflict can arise. Once reclaim locks held on the source server, no conflict can arise. Once
the client has reclaimed its locks, it indicates the completion of the client has reclaimed its locks, it indicates the completion of
lock reclamation by performing a RECLAIM_COMPLETE specifying lock reclamation by performing a RECLAIM_COMPLETE specifying
skipping to change at page 266, line 10 skipping to change at page 273, line 5
for the transferred file system is made available, the destination for the transferred file system is made available, the destination
server will be able to terminate the grace period once all such server will be able to terminate the grace period once all such
clients have reclaimed their locks, allowing normal locking clients have reclaimed their locks, allowing normal locking
activity to resume earlier than it would have otherwise. activity to resume earlier than it would have otherwise.
o Locking summary information for individual clients (at various o Locking summary information for individual clients (at various
possible levels of detail) can detect some instances in which possible levels of detail) can detect some instances in which
clients do not accurately represent the locks held on the source clients do not accurately represent the locks held on the source
server. server.
11.13.2. Server Responsibilities in Effecting Transparent State 11.14.2. Server Responsibilities in Effecting Transparent State
Migration Migration
The basic responsibility of the source server in effecting The basic responsibility of the source server in effecting
Transparent State Migration is to make available to the destination Transparent State Migration is to make available to the destination
server a description of each piece of locking state associated with server a description of each piece of locking state associated with
the file system being migrated. In addition to client id string and the file system being migrated. In addition to client id string and
verifier, the source server needs to provide, for each stateid: verifier, the source server needs to provide, for each stateid:
o The stateid including the current sequence value. o The stateid including the current sequence value.
skipping to change at page 266, line 42 skipping to change at page 273, line 37
needs to be included. needs to be included.
o For each lock type, there will be type-specific information, such o For each lock type, there will be type-specific information, such
as share and deny modes for opens and type and byte ranges for as share and deny modes for opens and type and byte ranges for
byte-range locks and layouts. byte-range locks and layouts.
Such information will most probably be organized by client id string Such information will most probably be organized by client id string
on the destination server so that it can be used to provide on the destination server so that it can be used to provide
appropriate context to each client when it makes itself known to the appropriate context to each client when it makes itself known to the
client. Issues connected with a client impersonating another by client. Issues connected with a client impersonating another by
presenting another client's id string are discussed in Section 21. presenting another client's client id string can be addressed using
NFSv4.1 state protection features, as described in Section 21.
A further server responsibility concerns locks that are revoked or A further server responsibility concerns locks that are revoked or
otherwise lost during the process of file system migration. Because otherwise lost during the process of file system migration. Because
locks that appear to be lost during the process of migration will be locks that appear to be lost during the process of migration will be
reclaimed by the client, the servers have to take steps to ensure reclaimed by the client, the servers have to take steps to ensure
that locks revoked soon before or soon after migration are not that locks revoked soon before or soon after migration are not
inadvertently allowed to be reclaimed in situations in which the inadvertently allowed to be reclaimed in situations in which the
continuity of lock possession cannot be assured. continuity of lock possession cannot be assured.
o For locks lost on the source but whose loss has not yet been o For locks lost on the source but whose loss has not yet been
skipping to change at page 267, line 32 skipping to change at page 274, line 27
granted until the client does a RECLAIM_COMPLETE, after reclaiming granted until the client does a RECLAIM_COMPLETE, after reclaiming
the locks it had, with the exception of reclaims denied because the locks it had, with the exception of reclaims denied because
they were attempts to reclaim locks that had been lost. they were attempts to reclaim locks that had been lost.
o Implement Transparent State Migration, except for the lock with o Implement Transparent State Migration, except for the lock with
the conflicting stateid. In this case, the client will be aware the conflicting stateid. In this case, the client will be aware
of a lost lock (through the SEQ4_STATUS flags) and be allowed to of a lost lock (through the SEQ4_STATUS flags) and be allowed to
reclaim it. reclaim it.
When transferring state between the source and destination, the When transferring state between the source and destination, the
issues discussed in Section 7.2 of [65] must still be attended to. issues discussed in Section 7.2 of [68] must still be attended to.
In this case, the use of NFS4ERR_DELAY may still be necessary in In this case, the use of NFS4ERR_DELAY may still be necessary in
NFSv4.1, as it was in NFSv4.0, to prevent locking state changing NFSv4.1, as it was in NFSv4.0, to prevent locking state changing
while it is being transferred. See Section 15.1.1.3 for information while it is being transferred. See Section 15.1.1.3 for information
about appropriate client retry approaches in the event that about appropriate client retry approaches in the event that
NFS4ERR_DELAY is returned. NFS4ERR_DELAY is returned.
There are a number of important differences in the NFSv4.1 context: There are a number of important differences in the NFSv4.1 context:
o The absence of RELEASE_LOCKOWNER means that the one case in which o The absence of RELEASE_LOCKOWNER means that the one case in which
an operation could not be deferred by use of NFS4ERR_DELAY no an operation could not be deferred by use of NFS4ERR_DELAY no
longer exists. longer exists.
o Sequencing of operations is no longer done using owner-based o Sequencing of operations is no longer done using owner-based
operation sequence numbers. Instead, sequencing is session- operation sequence numbers. Instead, sequencing is session-
based. based.
As a result, when sessions are not transferred, the techniques As a result, when sessions are not transferred, the techniques
discussed in Section 7.2 of [65] are adequate and will not be further discussed in Section 7.2 of [68] are adequate and will not be further
discussed. discussed.
11.13.3. Server Responsibilities in Effecting Session Transfer 11.14.3. Server Responsibilities in Effecting Session Transfer
The basic responsibility of the source server in effecting session The basic responsibility of the source server in effecting session
transfer is to make available to the destination server a description transfer is to make available to the destination server a description
of the current state of each slot within the session, including: of the current state of each slot within the session, including:
o The last sequence value received for that slot. o The last sequence value received for that slot.
o Whether there is cached reply data for the last request executed o Whether there is cached reply data for the last request executed
and, if so, the cached reply. and, if so, the cached reply.
skipping to change at page 269, line 45 skipping to change at page 276, line 39
An important issue is that the specification needs to take note of An important issue is that the specification needs to take note of
all potential COMPOUNDs, even if they might be unlikely in practice. all potential COMPOUNDs, even if they might be unlikely in practice.
For example, a COMPOUND is allowed to access multiple file systems For example, a COMPOUND is allowed to access multiple file systems
and might perform non-idempotent operations in some of them before and might perform non-idempotent operations in some of them before
accessing a file system being migrated. Also, a COMPOUND may return accessing a file system being migrated. Also, a COMPOUND may return
considerable data in the response, before being rejected with considerable data in the response, before being rejected with
NFS4ERR_DELAY or NFS4ERR_MOVED, and may in addition be marked as NFS4ERR_DELAY or NFS4ERR_MOVED, and may in addition be marked as
sa_cachethis. However, note that if the client and server adhere to sa_cachethis. However, note that if the client and server adhere to
rules in Section 15.1.1.3, there is no possibility of non-idempotent rules in Section 15.1.1.3, there is no possibility of non-idempotent
operations being spuriouly reissued after receiving NFS4ERR_DELAY operations being spuriously reissued after receiving NFS4ERR_DELAY
response. response.
To address these issues, a destination server MAY do any of the To address these issues, a destination server MAY do any of the
following when implementing session transfer. following when implementing session transfer.
o Avoid enforcing any sequencing semantics for a particular slot o Avoid enforcing any sequencing semantics for a particular slot
until the client has established the starting sequence for that until the client has established the starting sequence for that
slot on the destination server. slot on the destination server.
o For each slot, avoid returning a cached reply returning o For each slot, avoid returning a cached reply returning
skipping to change at page 270, line 29 skipping to change at page 277, line 21
Because of the considerations mentioned above including the rules for Because of the considerations mentioned above including the rules for
the handling of NFS4ERR_DELAY included in Section 15.1.1.3, the the handling of NFS4ERR_DELAY included in Section 15.1.1.3, the
destination server can respond appropriately to SEQUENCE operations destination server can respond appropriately to SEQUENCE operations
received from the client by adopting the three policies listed below: received from the client by adopting the three policies listed below:
o Not responding with NFS4ERR_SEQ_MISORDERED for the initial request o Not responding with NFS4ERR_SEQ_MISORDERED for the initial request
on a slot within a transferred session, since the destination on a slot within a transferred session, since the destination
server cannot be aware of requests made by the client after the server cannot be aware of requests made by the client after the
server handoff but before the client became aware of the shift. server handoff but before the client became aware of the shift.
In cases in which NFS4ERR_SEQ_MISORDERED would normally have been
reported, the request is to be processed normally, as a new
request.
o Replying as it would for a retry whenever the sequence matches o Replying as it would for a retry whenever the sequence matches
that transferred by the source server, even though this would not that transferred by the source server, even though this would not
provide retry handling for requests issued after the server provide retry handling for requests issued after the server
handoff, under the assumption that when such requests are issued handoff, under the assumption that when such requests are issued
they will never be responded to in a state-changing fashion, they will never be responded to in a state-changing fashion,
making retry support for them unnecessary. making retry support for them unnecessary.
o Once a non-retry SEQUENCE is received for a given slot, using that o Once a non-retry SEQUENCE is received for a given slot, using that
as the basis for further sequence checking, with no further as the basis for further sequence checking, with no further
reference to the sequence value transferred by the source reference to the sequence value transferred by the source
server. server.
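
The three policies above can be sketched as a small per-slot state machine. This is only an illustration under stated assumptions: the TransferredSlot layout, the handle_sequence name, and the string action codes are hypothetical and are not part of the protocol, and handling of a retry for which no reply was cached is left out.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class TransferredSlot:
    """Per-slot state handed over by the source server (hypothetical layout)."""
    seqid: int                     # last sequence value the source received
    cached_reply: Optional[bytes]  # cached reply for that request, if any
    established: bool = False      # set once a non-retry SEQUENCE is seen here

def handle_sequence(slot: TransferredSlot, seqid: int) -> Tuple[str, Optional[bytes]]:
    """Decide how the destination treats a SEQUENCE on a transferred slot."""
    if seqid == slot.seqid:
        # Second policy: a match with the recorded value is treated as a retry.
        return ("replay", slot.cached_reply)
    if not slot.established:
        # First and third policies: do not report a misordered sequence for
        # the initial request; process it as new and use its sequence value
        # as the basis for all further checking on this slot.
        slot.seqid, slot.cached_reply, slot.established = seqid, None, True
        return ("execute", None)
    if seqid == (slot.seqid + 1) & 0xFFFFFFFF:
        # Ordinary session sequencing once the basis has been established.
        slot.seqid, slot.cached_reply = seqid, None
        return ("execute", None)
    return ("misordered", None)
```

In this sketch a real server would store the new reply in cached_reply after executing the request; that step is omitted to keep the sequencing decision visible.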
11.14. Effecting File System Referrals 11.15. Effecting File System Referrals
Referrals are effected when an absent file system is encountered and Referrals are effected when an absent file system is encountered and
one or more alternate locations are made available by the one or more alternate locations are made available by the
fs_locations or fs_locations_info attributes. The client will fs_locations or fs_locations_info attributes. The client will
typically get an NFS4ERR_MOVED error, fetch the appropriate location typically get an NFS4ERR_MOVED error, fetch the appropriate location
information, and proceed to access the file system on a different information, and proceed to access the file system on a different
server, even though it retains its logical position within the server, even though it retains its logical position within the
original namespace. Referrals differ from migration events in that original namespace. Referrals differ from migration events in that
they happen only when the client has not previously referenced the they happen only when the client has not previously referenced the
file system in question (so there is nothing to transition). file system in question (so there is nothing to transition).
skipping to change at page 271, line 16 skipping to change at page 278, line 10
encountered at its root. encountered at its root.
The examples given in the sections below are somewhat artificial in The examples given in the sections below are somewhat artificial in
that an actual client will not typically do a multi-component look that an actual client will not typically do a multi-component look
up, but will have cached information regarding the upper levels of up, but will have cached information regarding the upper levels of
the name hierarchy. However, these examples are chosen to make the the name hierarchy. However, these examples are chosen to make the
required behavior clear and easy to put within the scope of a small required behavior clear and easy to put within the scope of a small
number of requests, without getting into a discussion of the details of number of requests, without getting into a discussion of the details of
how specific clients might choose to cache things. how specific clients might choose to cache things.
11.14.1. Referral Example (LOOKUP) 11.15.1. Referral Example (LOOKUP)
Let us suppose that the following COMPOUND is sent in an environment Let us suppose that the following COMPOUND is sent in an environment
in which /this/is/the/path is absent from the target server. This in which /this/is/the/path is absent from the target server. This
may be for a number of reasons. It may be that the file system has may be for a number of reasons. It may be that the file system has
moved, or it may be that the target server is functioning mainly, or moved, or it may be that the target server is functioning mainly, or
solely, to refer clients to the servers on which various file systems solely, to refer clients to the servers on which various file systems
are located. are located.
o PUTROOTFH o PUTROOTFH
skipping to change at page 274, line 46 skipping to change at page 281, line 42
occurred (between "the" and "path"). The fs_locations_info attribute occurred (between "the" and "path"). The fs_locations_info attribute
also gives the client the actual location of the absent file system, also gives the client the actual location of the absent file system,
so that the referral can proceed. The server gives the client the so that the referral can proceed. The server gives the client the
bare minimum of information about the absent file system so that bare minimum of information about the absent file system so that
there will be very little scope for problems of conflict between there will be very little scope for problems of conflict between
information sent by the referring server and information of the file information sent by the referring server and information of the file
system's home. No filehandles and very few attributes are present on system's home. No filehandles and very few attributes are present on
the referring server, and the client can treat those it receives as the referring server, and the client can treat those it receives as
transient information with the function of enabling the referral. transient information with the function of enabling the referral.
11.14.2. Referral Example (READDIR) 11.15.2. Referral Example (READDIR)
Another context in which a client may encounter referrals is when it Another context in which a client may encounter referrals is when it
does a READDIR on a directory in which some of the sub-directories does a READDIR on a directory in which some of the sub-directories
are the roots of absent file systems. are the roots of absent file systems.
Suppose such a directory is read as follows: Suppose such a directory is read as follows:
o PUTROOTFH o PUTROOTFH
o LOOKUP "this" o LOOKUP "this"
skipping to change at page 276, line 24 skipping to change at page 283, line 18
o LOOKUP "the" --> NFS_OK. The current fh is for /this/is/the and o LOOKUP "the" --> NFS_OK. The current fh is for /this/is/the and
is within the pseudo-fs. is within the pseudo-fs.
o READDIR (rdattr_error, fsid, size, time_modify, mounted_on_fileid) o READDIR (rdattr_error, fsid, size, time_modify, mounted_on_fileid)
--> NFS_OK. The attributes for directory entry with the component --> NFS_OK. The attributes for directory entry with the component
named "path" will only contain rdattr_error with the value named "path" will only contain rdattr_error with the value
NFS4ERR_MOVED, together with an fsid value and a value for NFS4ERR_MOVED, together with an fsid value and a value for
mounted_on_fileid. mounted_on_fileid.
Suppose we do another READDIR to get fs_locations_info (although we Suppose we do another READDIR to get fs_locations_info (although we
could have used a GETATTR directly, as in Section 11.14.1). could have used a GETATTR directly, as in Section 11.15.1).
o PUTROOTFH o PUTROOTFH
o LOOKUP "this" o LOOKUP "this"
o LOOKUP "is" o LOOKUP "is"
o LOOKUP "the" o LOOKUP "the"
o READDIR (rdattr_error, fs_locations_info, mounted_on_fileid, fsid, o READDIR (rdattr_error, fs_locations_info, mounted_on_fileid, fsid,
skipping to change at page 277, line 25 skipping to change at page 284, line 15
o mounted_on_fileid (value: unique fileid within referring file o mounted_on_fileid (value: unique fileid within referring file
system) system)
o fsid (value: unique value within referring server) o fsid (value: unique value within referring server)
The attributes for entry "path" will not contain size or time_modify The attributes for entry "path" will not contain size or time_modify
because these attributes are not available within an absent file because these attributes are not available within an absent file
system. system.
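
As a rough illustration of how a client might separate referral entries from ordinary ones in a READDIR reply, consider the sketch below. The representation of an entry as a (name, attribute dictionary) pair, the use of the error name as a string, and the function name are assumptions made for this example only.

```python
from typing import Dict, List, Tuple

Entry = Tuple[str, Dict[str, object]]

def split_referrals(entries: List[Entry]) -> Tuple[List[str], List[str]]:
    """Return (ordinary names, referral names) for decoded READDIR entries."""
    ordinary, referrals = [], []
    for name, attrs in entries:
        # An entry that is the root of an absent file system reports
        # NFS4ERR_MOVED in rdattr_error and omits attributes such as size
        # and time_modify that are unavailable within an absent file system.
        if attrs.get("rdattr_error") == "NFS4ERR_MOVED":
            referrals.append(name)
        else:
            ordinary.append(name)
    return ordinary, referrals
```

A client would then fetch fs_locations_info for each name in the second list, using either READDIR or GETATTR as shown in the examples.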
11.15. The Attribute fs_locations 11.16. The Attribute fs_locations
The fs_locations attribute is structured in the following way: The fs_locations attribute is structured in the following way:
struct fs_location4 { struct fs_location4 {
utf8str_cis server<>; utf8str_cis server<>;
pathname4 rootpath; pathname4 rootpath;
}; };
struct fs_locations4 { struct fs_locations4 {
pathname4 fs_root; pathname4 fs_root;
skipping to change at page 280, line 28 skipping to change at page 287, line 17
The specific choices reflect typical implementation patterns for The specific choices reflect typical implementation patterns for
failover and controlled migration, respectively. Since other choices failover and controlled migration, respectively. Since other choices
are possible and useful, this information is better obtained by using are possible and useful, this information is better obtained by using
fs_locations_info. When a server implementation needs to communicate fs_locations_info. When a server implementation needs to communicate
other choices, it MUST support the fs_locations_info attribute. other choices, it MUST support the fs_locations_info attribute.
See Section 21 for a discussion on the recommendations for the See Section 21 for a discussion on the recommendations for the
security flavor to be used by any GETATTR operation that requests the security flavor to be used by any GETATTR operation that requests the
"fs_locations" attribute. "fs_locations" attribute.
11.16. The Attribute fs_locations_info 11.17. The Attribute fs_locations_info
The fs_locations_info attribute is intended as a more functional The fs_locations_info attribute is intended as a more functional
replacement for the fs_locations attribute which will continue to replacement for the fs_locations attribute which will continue to
exist and be supported. Clients can use it to get a more complete exist and be supported. Clients can use it to get a more complete
set of data about alternative file system locations, including set of data about alternative file system locations, including
additional network paths to access replicas in use and additional additional network paths to access replicas in use and additional
replicas. When the server does not support fs_locations_info, replicas. When the server does not support fs_locations_info,
fs_locations can be used to get a subset of the data. A server that fs_locations can be used to get a subset of the data. A server that
supports fs_locations_info MUST support fs_locations as well. supports fs_locations_info MUST support fs_locations as well.
skipping to change at page 284, line 11 skipping to change at page 290, line 46
just referenced) and its successor location. Servers are strongly just referenced) and its successor location. Servers are strongly
urged to support this attribute on all file systems if they support urged to support this attribute on all file systems if they support
it on any file system. it on any file system.
The data presented in the fs_locations_info attribute may be obtained The data presented in the fs_locations_info attribute may be obtained
by the server in any number of ways, including specification by the by the server in any number of ways, including specification by the
administrator or by current protocols for transferring data among administrator or by current protocols for transferring data among
replicas and protocols not yet developed. NFSv4.1 only defines how replicas and protocols not yet developed. NFSv4.1 only defines how
this information is presented by the server to the client. this information is presented by the server to the client.
11.16.1. The fs_locations_server4 Structure 11.17.1. The fs_locations_server4 Structure
The fs_locations_server4 structure consists of the following items in The fs_locations_server4 structure consists of the following items in
addition to the fls_server field which specifies a network address or addition to the fls_server field which specifies a network address or
set of addresses to be used to access the specified file system. set of addresses to be used to access the specified file system.
Note that both of these items (i.e., fls_currency and fls_info) specify Note that both of these items (i.e., fls_currency and fls_info) specify
attributes of the file system replica and should not be different attributes of the file system replica and should not be different
when there are multiple fs_locations_server4 structures for the same when there are multiple fs_locations_server4 structures for the same
replica, each specifying a network path to the chosen replica. replica, each specifying a network path to the chosen replica.
When these values are different in two fs_locations_server4 When these values are different in two fs_locations_server4
skipping to change at page 285, line 6 skipping to change at page 291, line 42
information about the particular file system instance. This data information about the particular file system instance. This data
includes general flags, transport capability flags, file system includes general flags, transport capability flags, file system
equivalence class information, and selection priority information. equivalence class information, and selection priority information.
The encoding will be discussed below. The encoding will be discussed below.
o The server string (fls_server). For the case of the replica o The server string (fls_server). For the case of the replica
currently being accessed (via GETATTR), a zero-length string MAY currently being accessed (via GETATTR), a zero-length string MAY
be used to indicate the current address being used for the RPC be used to indicate the current address being used for the RPC
call. The fls_server field can also be an IPv4 or IPv6 address, call. The fls_server field can also be an IPv4 or IPv6 address,
formatted the same way as an IPv4 or IPv6 address in the "server" formatted the same way as an IPv4 or IPv6 address in the "server"
field of the fs_location4 data type (see Section 11.15). field of the fs_location4 data type (see Section 11.16).
With the exception of the transport-flag field (at offset With the exception of the transport-flag field (at offset
FSLI4BX_TFLAGS within the fls_info array), all of this data applies to FSLI4BX_TFLAGS within the fls_info array), all of this data defined in
the replica specified by the entry, rather than the specific network this specification applies to the replica specified by the entry,
path used to access it. rather than the specific network path used to access it. The
classification of data in extensions to this data is discussed below.
Data within the fls_info array is in the form of 8-bit data items Data within the fls_info array is in the form of 8-bit data items
with constants giving the offsets within the array of various values with constants giving the offsets within the array of various values
describing this particular file system instance. This style of describing this particular file system instance. This style of
definition was chosen, in preference to explicit XDR structure definition was chosen, in preference to explicit XDR structure
definitions for these values, for a number of reasons. definitions for these values, for a number of reasons.
o The kinds of data in the fls_info array, representing flags, file o The kinds of data in the fls_info array, representing flags, file
system classes, and priorities among sets of file systems system classes, and priorities among sets of file systems
representing the same data, are such that 8 bits provide a quite representing the same data, are such that 8 bits provide a quite
acceptable range of values. Even where there might be more than acceptable range of values. Even where there might be more than
256 such file system instances, having more than 256 distinct 256 such file system instances, having more than 256 distinct
classes or priorities is unlikely. classes or priorities is unlikely.
o Explicit definition of the various specific data items within XDR o Explicit definition of the various specific data items within XDR
would limit expandability in that any extension within would would limit expandability in that any extension within would
require yet another attribute, leading to specification and require yet another attribute, leading to specification and
implementation clumsiness. In the context of the NFSv4 extension implementation clumsiness. In the context of the NFSv4 extension
model in effect at the time fs_locations_info was designed (i.e. model in effect at the time fs_locations_info was designed (i.e.
that described in RFC5661 [62]), this would necessitate a new that described in RFC5661 [65]), this would necessitate a new
minor version to effect any Standards Track extension to the data minor version to effect any Standards Track extension to the data
in fls_info. in fls_info.
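
The offset-based layout can be illustrated with a short sketch. The numeric values given here for the index and flag constants are assumptions recalled from the protocol's XDR description and should be checked against it; the helper names are hypothetical. Reading an index beyond the end of the array yields a default, which is one way a client could tolerate a server that returns a shorter array.

```python
# Assumed constant values; verify against the XDR description before use.
FSLI4BX_GFLAGS = 0       # byte index of the general flags
FSLI4BX_TFLAGS = 1       # byte index of the transport flags
FSLI4BX_CLSIMUL = 2      # byte index of the simultaneous-use class
FSLI4GF_WRITABLE = 0x01  # bit within the general flags
FSLI4GF_CUR_REQ = 0x02   # bit within the general flags

def fls_info_byte(fls_info: bytes, index: int, default: int = 0) -> int:
    """Return the 8-bit item at index, or default if the array is shorter."""
    return fls_info[index] if index < len(fls_info) else default

def is_writable(fls_info: bytes) -> bool:
    """Test the FSLI4GF_WRITABLE bit in the general flags byte."""
    return bool(fls_info_byte(fls_info, FSLI4BX_GFLAGS) & FSLI4GF_WRITABLE)
```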
The set of fls_info data is subject to expansion in a future minor The set of fls_info data is subject to expansion in a future minor
version, or in a Standards Track RFC, within the context of a single version, or in a Standards Track RFC, within the context of a single
minor version. The server SHOULD NOT send and the client MUST NOT minor version. The server SHOULD NOT send and the client MUST NOT
use indices within the fls_info array or flag bits that are not use indices within the fls_info array or flag bits that are not
defined in Standards Track RFCs. defined in Standards Track RFCs.
In light of the new extension model defined in RFC8178 [63] and the In light of the new extension model defined in RFC8178 [66] and the
fact that the individual items within fls_info are not explicitly fact that the individual items within fls_info are not explicitly
referenced in the XDR, the following practices should be followed referenced in the XDR, the following practices should be followed
when extending or otherwise changing the structure of the data when extending or otherwise changing the structure of the data
returned in fls_info within the scope of a single minor version. returned in fls_info within the scope of a single minor version.
o All extensions need to be described by Standards Track documents. o All extensions need to be described by Standards Track documents.
There is no need for such documents to be marked as updating There is no need for such documents to be marked as updating
RFC5661 [62] or this document. RFC5661 [65] or this document.
o It needs to be made clear whether the information in any added o It needs to be made clear whether the information in any added
data items applies to the replica specified by the entry or to the data items applies to the replica specified by the entry or to the
specific network paths specified in the entry. specific network paths specified in the entry.
o There needs to be a reliable way defined to determine whether the o There needs to be a reliable way defined to determine whether the
server is aware of the extension. This may be based on the length server is aware of the extension. This may be based on the length
field of the fls_info array, but it is more flexible to provide field of the fls_info array, but it is more flexible to provide
fs-scope or server-scope attributes to indicate what extensions fs-scope or server-scope attributes to indicate what extensions
are provided. are provided.
skipping to change at page 286, line 45 skipping to change at page 293, line 32
The general file system characteristics flag (at byte index The general file system characteristics flag (at byte index
FSLI4BX_GFLAGS) has the following bits defined within it: FSLI4BX_GFLAGS) has the following bits defined within it:
o FSLI4GF_WRITABLE indicates that this file system target is o FSLI4GF_WRITABLE indicates that this file system target is
writable, allowing it to be selected by clients that may need to writable, allowing it to be selected by clients that may need to
write on this file system. When the current file system instance write on this file system. When the current file system instance
is writable and is defined as of the same simultaneous use class is writable and is defined as of the same simultaneous use class
(as specified by the value at index FSLI4BX_CLSIMUL) to which the (as specified by the value at index FSLI4BX_CLSIMUL) to which the
client was previously writing, then it must incorporate within its client was previously writing, then it must incorporate within its
data any committed write made on the source file system instance. data any committed write made on the source file system instance.
See Section 11.10.6, which discusses the write-verifier class. See Section 11.11.6, which discusses the write-verifier class.
While there is no harm in not setting this flag for a file system While there is no harm in not setting this flag for a file system
that turns out to be writable, turning the flag on for a read-only that turns out to be writable, turning the flag on for a read-only
file system can cause problems for clients that select a migration file system can cause problems for clients that select a migration
or replication target based on the flag and then find themselves or replication target based on the flag and then find themselves
unable to write. unable to write.
o FSLI4GF_CUR_REQ indicates that this replica is the one on which o FSLI4GF_CUR_REQ indicates that this replica is the one on which
the request is being made. Only a single server entry may have the request is being made. Only a single server entry may have
this flag set and, in the case of a referral, no entry will have this flag set and, in the case of a referral, no entry will have
it set. Note that this flag might be set even if the request was it set. Note that this flag might be set even if the request was
skipping to change at page 289, line 19 skipping to change at page 296, line 4
system has an 8-bit class number. Two file systems belong to the system has an 8-bit class number. Two file systems belong to the
same class if both have identical non-zero class numbers. Zero is same class if both have identical non-zero class numbers. Zero is
treated as non-matching. Most often, the relevant question for the treated as non-matching. Most often, the relevant question for the
client will be whether a given replica is identical to / continuous client will be whether a given replica is identical to / continuous
with the current one in a given respect, but the information should with the current one in a given respect, but the information should
be available also as to whether two other replicas match in that be available also as to whether two other replicas match in that
respect as well. respect as well.
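
The matching rule stated above is small enough to write out directly. The function name is hypothetical; the rule itself (identical, non-zero class numbers) is the one given in the text.

```python
def same_class(a: int, b: int) -> bool:
    """Two 8-bit class numbers match only if they are equal and non-zero."""
    return a != 0 and a == b
```

Note that same_class(0, 0) is False: two replicas that both report zero are not thereby in the same class.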
The following fields specify the file system's class numbers for the The following fields specify the file system's class numbers for the
equivalence relations used in determining the nature of file system equivalence relations used in determining the nature of file system
transitions. See Sections 11.8 through 11.13 and their various transitions. See Sections 11.9 through 11.14 and their various
subsections for details about how this information is to be used. subsections for details about how this information is to be used.
Servers may assign these values as they wish, so long as file system Servers may assign these values as they wish, so long as file system
instances that share the same value have the specified relationship instances that share the same value have the specified relationship
to one another; conversely, file systems that have the specified to one another; conversely, file systems that have the specified
relationship to one another share a common class value. As each relationship to one another share a common class value. As each
instance entry is added, the relationships of this instance to instance entry is added, the relationships of this instance to
previously entered instances can be consulted, and if one is found previously entered instances can be consulted, and if one is found
that bears the specified relationship, that entry's class value can that bears the specified relationship, that entry's class value can
be copied to the new entry. When no such previous entry exists, a be copied to the new entry. When no such previous entry exists, a
new value for that byte index (not previously used) can be selected, new value for that byte index (not previously used) can be selected,
skipping to change at page 290, line 47 skipping to change at page 297, line 33
o The field at byte index FSLI4BX_WRITEORDER gives the order value o The field at byte index FSLI4BX_WRITEORDER gives the order value
to be used for writable access. to be used for writable access.
Depending on the potential need for write access by a given client, Depending on the potential need for write access by a given client,
one of the pairs of rank and order values is used. The read rank and one of the pairs of rank and order values is used. The read rank and
order should only be used if the client knows that only reading will order should only be used if the client knows that only reading will
ever be done or if it is prepared to switch to a different replica in ever be done or if it is prepared to switch to a different replica in
the event that any write access capability is required in the future. the event that any write access capability is required in the future.
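
A client's choice between the two pairs might be sketched as follows. The Replica type and function names are hypothetical. The sketch also assumes that lower values are more preferred and that rank is compared before order; neither point is stated in the text shown here, so both should be confirmed against the full description of rank and order.

```python
from typing import List, NamedTuple, Tuple

class Replica(NamedTuple):
    name: str
    read_rank: int
    read_order: int
    write_rank: int
    write_order: int

def preference_key(r: Replica, may_write: bool) -> Tuple[int, int]:
    """Pick the (rank, order) pair that applies to this client's needs."""
    if may_write:
        return (r.write_rank, r.write_order)
    return (r.read_rank, r.read_order)

def order_candidates(replicas: List[Replica], may_write: bool) -> List[Replica]:
    """Sort candidates, most preferred first (assumes lower value wins)."""
    return sorted(replicas, key=lambda r: preference_key(r, may_write))
```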
11.16.2. The fs_locations_info4 Structure 11.17.2. The fs_locations_info4 Structure
The fs_locations_info4 structure, encoding the fs_locations_info The fs_locations_info4 structure, encoding the fs_locations_info
attribute, contains the following: attribute, contains the following:
o The fli_flags field, which contains general flags that affect the o The fli_flags field, which contains general flags that affect the
interpretation of this fs_locations_info4 structure and all interpretation of this fs_locations_info4 structure and all
fs_locations_item4 structures within it. The only flag currently fs_locations_item4 structures within it. The only flag currently
defined is FSLI4IF_VAR_SUB. All bits in the fli_flags field that defined is FSLI4IF_VAR_SUB. All bits in the fli_flags field that
are not defined should always be returned as zero. are not defined should always be returned as zero.
skipping to change at page 292, line 6 skipping to change at page 298, line 38
information from all component entries to be refetched, the server information from all component entries to be refetched, the server
will typically provide a low value for this field if any of the will typically provide a low value for this field if any of the
replicas are likely to go out of service in a short time frame. replicas are likely to go out of service in a short time frame.
Note that, because of the ability of the server to return Note that, because of the ability of the server to return
NFS4ERR_MOVED to trigger the use of different paths, when NFS4ERR_MOVED to trigger the use of different paths, when
alternate trunked paths are available, there is generally no need alternate trunked paths are available, there is generally no need
to use low values of fli_valid_for in connection with the to use low values of fli_valid_for in connection with the
management of alternate paths to the same replica. management of alternate paths to the same replica.
The FSLI4IF_VAR_SUB flag within fli_flags controls whether variable The FSLI4IF_VAR_SUB flag within fli_flags controls whether variable
substitution is to be enabled. See Section 11.16.3 for an substitution is to be enabled. See Section 11.17.3 for an
explanation of variable substitution. explanation of variable substitution.
11.16.3. The fs_locations_item4 Structure 11.17.3. The fs_locations_item4 Structure
The fs_locations_item4 structure contains a pathname (in the field The fs_locations_item4 structure contains a pathname (in the field
fli_rootpath) that encodes the path of the target file system fli_rootpath) that encodes the path of the target file system
replicas on the set of servers designated by the included replicas on the set of servers designated by the included
fs_locations_server4 entries. The precise manner in which this fs_locations_server4 entries. The precise manner in which this
target location is specified depends on the value of the target location is specified depends on the value of the
FSLI4IF_VAR_SUB flag within the associated fs_locations_info4 FSLI4IF_VAR_SUB flag within the associated fs_locations_info4
structure. structure.
If this flag is not set, then fli_rootpath simply designates the If this flag is not set, then fli_rootpath simply designates the
skipping to change at page 294, line 5 skipping to change at page 300, line 38
substituted variables, the result is always a valid successor file substituted variables, the result is always a valid successor file
system instance to that from which a transition is occurring, i.e., system instance to that from which a transition is occurring, i.e.,
that the data is identical or represents a later image of a writable that the data is identical or represents a later image of a writable
file system. file system.
Note that when fli_rootpath is a null pathname (that is, one with Note that when fli_rootpath is a null pathname (that is, one with
zero components), the file system designated is at the root of the zero components), the file system designated is at the root of the
specified server, whether or not the FSLI4IF_VAR_SUB flag within the specified server, whether or not the FSLI4IF_VAR_SUB flag within the
associated fs_locations_info4 structure is set. associated fs_locations_info4 structure is set.
11.17. The Attribute fs_status 11.18. The Attribute fs_status
In an environment in which multiple copies of the same basic set of In an environment in which multiple copies of the same basic set of
data are available, information regarding the particular source of data are available, information regarding the particular source of
such data and the relationships among different copies can be very such data and the relationships among different copies can be very
helpful in providing consistent data to applications. helpful in providing consistent data to applications.
enum fs4_status_type { enum fs4_status_type {
STATUS4_FIXED = 1, STATUS4_FIXED = 1,
STATUS4_UPDATED = 2, STATUS4_UPDATED = 2,
STATUS4_VERSIONED = 3, STATUS4_VERSIONED = 3,
skipping to change at page 296, line 32 skipping to change at page 303, line 25
The opaque string fss_current should provide whatever information is The opaque string fss_current should provide whatever information is
available about the source of the current copy. Such information available about the source of the current copy. Such information
includes the tool creating it, any relevant parameters to that tool, includes the tool creating it, any relevant parameters to that tool,
the time at which the copy was done, the user making the change, the the time at which the copy was done, the user making the change, the
server on which the change was made, etc. All information should be server on which the change was made, etc. All information should be
in a human-readable string. in a human-readable string.
The field fss_age provides an indication of how out-of-date the file The field fss_age provides an indication of how out-of-date the file
system currently is with respect to its ultimate data source (in case system currently is with respect to its ultimate data source (in case
of cascading data updates). This complements the fls_currency field of cascading data updates). This complements the fls_currency field
of fs_locations_server4 (see Section 11.16) in the following way: the of fs_locations_server4 (see Section 11.17) in the following way: the
information in fls_currency gives a bound for how out of date the information in fls_currency gives a bound for how out of date the
data in a file system might typically get, while the value in fss_age data in a file system might typically get, while the value in fss_age
gives a bound on how out-of-date that data actually is. Negative gives a bound on how out-of-date that data actually is. Negative
values imply that no information is available. A zero means that values imply that no information is available. A zero means that
this data is known to be current. A positive value means that this this data is known to be current. A positive value means that this
data is known to be no older than that number of seconds with respect data is known to be no older than that number of seconds with respect
to the ultimate data source. Using this value, the client may be to the ultimate data source. Using this value, the client may be
able to decide that a data copy is too old, so that it may search for able to decide that a data copy is too old, so that it may search for
a newer version to use. a newer version to use.
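The three cases for fss_age can be sketched as a small Python helper. The function name copy_is_too_old and the max_age_seconds threshold are hypothetical; the threshold is a client policy choice, not something the protocol defines.

```python
from typing import Optional


def copy_is_too_old(fss_age: int, max_age_seconds: int) -> Optional[bool]:
    """Interpret fss_age against a client-chosen staleness limit.

    Returns None when the server supplied no information (negative value),
    otherwise True if the copy may be older than the limit.
    """
    if fss_age < 0:
        # Negative values imply that no information is available.
        return None
    # Zero means the data is known to be current; a positive value bounds
    # the age, in seconds, relative to the ultimate data source.
    return fss_age > max_age_seconds
```

A None result leaves the decision to the client; it might, for example, fall back to other information about the replica.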
skipping to change at page 300, line 23 skipping to change at page 307, line 15
12.2.5. Storage Protocol 12.2.5. Storage Protocol
As noted in Figure 1, the storage protocol is the method used by the As noted in Figure 1, the storage protocol is the method used by the
client to store and retrieve data directly from the storage devices. client to store and retrieve data directly from the storage devices.
The NFSv4.1 pNFS feature has been structured to allow for a variety The NFSv4.1 pNFS feature has been structured to allow for a variety
of storage protocols to be defined and used. One example storage of storage protocols to be defined and used. One example storage
protocol is NFSv4.1 itself (as documented in Section 13). Other protocol is NFSv4.1 itself (as documented in Section 13). Other
options for the storage protocol are described elsewhere and include: options for the storage protocol are described elsewhere and include:
o Block/volume protocols such as Internet SCSI (iSCSI) [51] and FCP o Block/volume protocols such as Internet SCSI (iSCSI) [55] and FCP
[52]. The block/volume protocol support can be independent of the [56]. The block/volume protocol support can be independent of the
addressing structure of the block/volume protocol used, allowing addressing structure of the block/volume protocol used, allowing
more than one protocol to access the same file data and enabling more than one protocol to access the same file data and enabling
extensibility to other block/volume protocols. See [44] for a extensibility to other block/volume protocols. See [47] for a
layout specification that allows pNFS to use block/volume storage layout specification that allows pNFS to use block/volume storage
protocols. protocols.
o Object protocols such as OSD over iSCSI or Fibre Channel [53]. o Object protocols such as OSD over iSCSI or Fibre Channel [57].
See [43] for a layout specification that allows pNFS to use object See [46] for a layout specification that allows pNFS to use object
storage protocols. storage protocols.
It is possible that various storage protocols are available to both It is possible that various storage protocols are available to both
client and server and it may be possible that a client and server do client and server and it may be possible that a client and server do
not have a matching storage protocol available to them. Because of not have a matching storage protocol available to them. Because of
this, the pNFS server MUST support normal NFSv4.1 access to any file this, the pNFS server MUST support normal NFSv4.1 access to any file
accessible by the pNFS feature; this will allow for continued accessible by the pNFS feature; this will allow for continued
interoperability between an NFSv4.1 client and server. interoperability between an NFSv4.1 client and server.
12.2.6. Control Protocol 12.2.6. Control Protocol
skipping to change at page 301, line 11 skipping to change at page 307, line 51
state required by the storage devices to perform client access state required by the storage devices to perform client access
control, and, depending on the storage protocol, the enforcement of control, and, depending on the storage protocol, the enforcement of
authentication and authorization so that restrictions that would be authentication and authorization so that restrictions that would be
enforced by the metadata server are also enforced by the storage enforced by the metadata server are also enforced by the storage
device. device.
A particular control protocol is not REQUIRED by NFSv4.1 but A particular control protocol is not REQUIRED by NFSv4.1 but
requirements are placed on the control protocol for maintaining requirements are placed on the control protocol for maintaining
attributes like modify time, the change attribute, and the end-of- attributes like modify time, the change attribute, and the end-of-
file (EOF) position. Note that if pNFS is layered over a clustered, file (EOF) position. Note that if pNFS is layered over a clustered,
parallel file system (e.g., PVFS [54]), the mechanisms that enable parallel file system (e.g., PVFS [58]), the mechanisms that enable
clustering and parallelism in that file system can be considered the clustering and parallelism in that file system can be considered the
control protocol. control protocol.
12.2.7. Layout Types 12.2.7. Layout Types
A layout describes the mapping of a file's data to the storage A layout describes the mapping of a file's data to the storage
devices that hold the data. A layout is said to belong to a specific devices that hold the data. A layout is said to belong to a specific
layout type (data type layouttype4, see Section 3.3.13). The layout layout type (data type layouttype4, see Section 3.3.13). The layout
type allows for variants to handle different storage protocols, such type allows for variants to handle different storage protocols, such
as those associated with block/volume [44], object [43], and file as those associated with block/volume [47], object [46], and file
(Section 13) layout types. A metadata server, along with its control (Section 13) layout types. A metadata server, along with its control
protocol, MUST support at least one layout type. A private sub-range protocol, MUST support at least one layout type. A private sub-range
of the layout type namespace is also defined. Values from the of the layout type namespace is also defined. Values from the
private layout type range MAY be used for internal testing or private layout type range MAY be used for internal testing or
experimentation (see Section 3.3.13). experimentation (see Section 3.3.13).
As an example, the organization of the file layout type could be an As an example, the organization of the file layout type could be an
array of tuples (e.g., device ID, filehandle), along with a array of tuples (e.g., device ID, filehandle), along with a
definition of how the data is stored across the devices (e.g., definition of how the data is stored across the devices (e.g.,
striping). A block/volume layout might be an array of tuples that striping). A block/volume layout might be an array of tuples that
skipping to change at page 306, line 10 skipping to change at page 312, line 50
file for which a layout is held does not necessarily conflict with file for which a layout is held does not necessarily conflict with
the holding of the layout that describes the file being modified. the holding of the layout that describes the file being modified.
Therefore, it is the requirement of the storage protocol or layout Therefore, it is the requirement of the storage protocol or layout
type that determines the necessary behavior. For example, block/ type that determines the necessary behavior. For example, block/
volume layout types require that the layout's iomode agree with the volume layout types require that the layout's iomode agree with the
type of I/O being performed. type of I/O being performed.
Depending upon the layout type and storage protocol in use, storage Depending upon the layout type and storage protocol in use, storage
device access permissions may be granted by LAYOUTGET and may be device access permissions may be granted by LAYOUTGET and may be
encoded within the type-specific layout. For an example of storage encoded within the type-specific layout. For an example of storage
device access permissions, see an object-based protocol such as [53]. device access permissions, see an object-based protocol such as [57].
If access permissions are encoded within the layout, the metadata If access permissions are encoded within the layout, the metadata
server SHOULD recall the layout when those permissions become invalid server SHOULD recall the layout when those permissions become invalid
for any reason -- for example, when a file becomes unwritable or for any reason -- for example, when a file becomes unwritable or
inaccessible to a client. Note, clients are still required to inaccessible to a client. Note, clients are still required to
perform the appropriate OPEN, LOCK, and ACCESS operations as perform the appropriate OPEN, LOCK, and ACCESS operations as
described above. The degree to which it is possible for the client described above. The degree to which it is possible for the client
to circumvent these operations and the consequences of doing so must to circumvent these operations and the consequences of doing so must
be clearly specified by the individual layout type specifications. be clearly specified by the individual layout type specifications.
In addition, these specifications must be clear about the In addition, these specifications must be clear about the
requirements and non-requirements for the checking performed by the requirements and non-requirements for the checking performed by the
skipping to change at page 323, line 32 skipping to change at page 330, line 32
If sr_status_flags from the metadata server has If sr_status_flags from the metadata server has
SEQ4_STATUS_RESTART_RECLAIM_NEEDED set (or SEQUENCE returns SEQ4_STATUS_RESTART_RECLAIM_NEEDED set (or SEQUENCE returns
NFS4ERR_BAD_SESSION and CREATE_SESSION returns NFS4ERR_BAD_SESSION and CREATE_SESSION returns
NFS4ERR_STALE_CLIENTID), then the metadata server has restarted, and NFS4ERR_STALE_CLIENTID), then the metadata server has restarted, and
the client SHOULD recover using the methods described in the client SHOULD recover using the methods described in
Section 12.7.4. Section 12.7.4.
If sr_status_flags from the metadata server has If sr_status_flags from the metadata server has
SEQ4_STATUS_LEASE_MOVED set, then the client recovers by following SEQ4_STATUS_LEASE_MOVED set, then the client recovers by following
the procedure described in Section 11.10.9.1. After that, the client the procedure described in Section 11.11.9.2. After that, the client
may get an indication that the layout state was not moved with the may get an indication that the layout state was not moved with the
file system. The client recovers as in the other applicable file system. The client recovers as in the other applicable
situations discussed in the first two paragraphs of this section. situations discussed in the first two paragraphs of this section.
If sr_status_flags reports no loss of state, then the lease for the If sr_status_flags reports no loss of state, then the lease for the
layouts that the client has is valid and renewed, and the client can layouts that the client has is valid and renewed, and the client can
once again send I/O requests to the storage devices. once again send I/O requests to the storage devices.
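The flag checks in the preceding paragraphs can be sketched in Python as follows. This is only an outline: the numeric flag values are assumptions to be verified against the XDR for SEQUENCE, the returned strings are hypothetical labels, and the order of the checks is a choice made for this example. Flags not discussed in these paragraphs are lumped into a single catch-all.

```python
# ASSUMPTION: bit values are illustrative; verify against the SEQUENCE XDR.
SEQ4_STATUS_LEASE_MOVED = 0x00000080
SEQ4_STATUS_RESTART_RECLAIM_NEEDED = 0x00000100


def layout_recovery_action(sr_status_flags: int) -> str:
    """Map sr_status_flags from the metadata server to a recovery step."""
    if sr_status_flags & SEQ4_STATUS_RESTART_RECLAIM_NEEDED:
        # The metadata server has restarted.
        return "recover-after-metadata-server-restart"
    if sr_status_flags & SEQ4_STATUS_LEASE_MOVED:
        # Follow the lease-moved procedure, then re-check layout state.
        return "follow-lease-moved-procedure"
    if sr_status_flags != 0:
        # Other flags are outside the scope of this sketch.
        return "handle-other-status-flags"
    # No loss of state reported: layouts are valid and I/O can resume.
    return "resume-io-to-storage-devices"
```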
While clients SHOULD NOT send I/Os to storage devices that may extend While clients SHOULD NOT send I/Os to storage devices that may extend
past the lease expiration time period, this is not always possible, past the lease expiration time period, this is not always possible,
skipping to change at page 329, line 4 skipping to change at page 336, line 4
consideration in determining when it is appropriate to use such a consideration in determining when it is appropriate to use such a
pNFS configuration. Such layout types SHOULD NOT be used when pNFS configuration. Such layout types SHOULD NOT be used when
client-only access checks do not provide sufficient assurance that client-only access checks do not provide sufficient assurance that
NFSv4.1 access control is being applied correctly. (This is not a NFSv4.1 access control is being applied correctly. (This is not a
problem for the file layout type described in Section 13 because the problem for the file layout type described in Section 13 because the
storage access protocol for LAYOUT4_NFSV4_1_FILES is NFSv4.1, and storage access protocol for LAYOUT4_NFSV4_1_FILES is NFSv4.1, and
thus the security model for storage device access via thus the security model for storage device access via
LAYOUT4_NFSV4_1_FILES is the same as that of the metadata server.) LAYOUT4_NFSV4_1_FILES is the same as that of the metadata server.)
For handling of access control specific to a layout, the reader For handling of access control specific to a layout, the reader
should examine the layout specification, such as the NFSv4.1/file- should examine the layout specification, such as the NFSv4.1/file-
based layout (Section 13) of this document, the blocks layout [44], based layout (Section 13) of this document, the blocks layout [47],
and objects layout [43]. and objects layout [46].
13. NFSv4.1 as a Storage Protocol in pNFS: the File Layout Type 13. NFSv4.1 as a Storage Protocol in pNFS: the File Layout Type
This section describes the semantics and format of NFSv4.1 file-based This section describes the semantics and format of NFSv4.1 file-based
layouts for pNFS. NFSv4.1 file-based layouts use the layouts for pNFS. NFSv4.1 file-based layouts use the
LAYOUT4_NFSV4_1_FILES layout type. The LAYOUT4_NFSV4_1_FILES type LAYOUT4_NFSV4_1_FILES layout type. The LAYOUT4_NFSV4_1_FILES type
defines striping data across multiple NFSv4.1 data servers. defines striping data across multiple NFSv4.1 data servers.
13.1. Client ID and Session Considerations 13.1. Client ID and Session Considerations
skipping to change at page 354, line 42 skipping to change at page 361, line 42
The primary issue in which NFSv4.1 needs to deal with The primary issue in which NFSv4.1 needs to deal with
internationalization, or I18N, is with respect to file names and internationalization, or I18N, is with respect to file names and
other strings as used within the protocol. The choice of string other strings as used within the protocol. The choice of string
representation must allow reasonable name/string access to clients representation must allow reasonable name/string access to clients
that use various languages. The UTF-8 encoding of the UCS (Universal that use various languages. The UTF-8 encoding of the UCS (Universal
Multiple-Octet Coded Character Set) as defined by ISO10646 [18] Multiple-Octet Coded Character Set) as defined by ISO10646 [18]
allows for this type of access and follows the policy described in allows for this type of access and follows the policy described in
"IETF Policy on Character Sets and Languages", RFC 2277 [19]. "IETF Policy on Character Sets and Languages", RFC 2277 [19].
RFC 3454 [16], otherwise know as "stringprep", documents a framework RFC 3454 [16], otherwise known as "stringprep", documents a framework
for using Unicode/UTF-8 in networking protocols so as "to increase for using Unicode/UTF-8 in networking protocols so as "to increase
the likelihood that string input and string comparison work in ways the likelihood that string input and string comparison work in ways
that make sense for typical users throughout the world". A protocol that make sense for typical users throughout the world". A protocol
must define a profile of stringprep "in order to fully specify the must define a profile of stringprep "in order to fully specify the
processing options". The remainder of this section defines the processing options". The remainder of this section defines the
NFSv4.1 stringprep profiles. Much of the terminology used for the NFSv4.1 stringprep profiles. Much of the terminology used for the
remainder of this section comes from stringprep. remainder of this section comes from stringprep.
There are three UTF-8 string types defined for NFSv4.1: utf8str_cs, There are three UTF-8 string types defined for NFSv4.1: utf8str_cs,
utf8str_cis, and utf8str_mixed. Separate profiles are defined for utf8str_cis, and utf8str_mixed. Separate profiles are defined for
skipping to change at page 360, line 19 skipping to change at page 367, line 19
typedef uint32_t fs_charset_cap4; typedef uint32_t fs_charset_cap4;
Because some operating environments and file systems do not enforce Because some operating environments and file systems do not enforce
character set encodings, NFSv4.1 supports the fs_charset_cap character set encodings, NFSv4.1 supports the fs_charset_cap
attribute (Section 5.8.2.11) that indicates to the client a file attribute (Section 5.8.2.11) that indicates to the client a file
system's UTF-8 capabilities. The attribute is an integer containing system's UTF-8 capabilities. The attribute is an integer containing
a pair of flags. The first flag is FSCHARSET_CAP4_CONTAINS_NON_UTF8, a pair of flags. The first flag is FSCHARSET_CAP4_CONTAINS_NON_UTF8,
which, if set to one, tells the client that the file system contains which, if set to one, tells the client that the file system contains
non-UTF-8 characters, and the server will not convert non-UTF-8 non-UTF-8 characters, and the server will not convert non-UTF-8
characters to UTF-8 if the client reads a symlink or directory, characters to UTF-8 if the client reads a symbolic link or directory,
neither will operations with component names or pathnames in the neither will operations with component names or pathnames in the
arguments convert the strings to UTF-8. The second flag is arguments convert the strings to UTF-8. The second flag is
FSCHARSET_CAP4_ALLOWS_ONLY_UTF8, which, if set to one, indicates that FSCHARSET_CAP4_ALLOWS_ONLY_UTF8, which, if set to one, indicates that
the server will accept (and generate) only UTF-8 characters on the the server will accept (and generate) only UTF-8 characters on the
file system. If FSCHARSET_CAP4_ALLOWS_ONLY_UTF8 is set to one, file system. If FSCHARSET_CAP4_ALLOWS_ONLY_UTF8 is set to one,
FSCHARSET_CAP4_CONTAINS_NON_UTF8 MUST be set to zero. FSCHARSET_CAP4_CONTAINS_NON_UTF8 MUST be set to zero.
FSCHARSET_CAP4_ALLOWS_ONLY_UTF8 SHOULD always be set to one. FSCHARSET_CAP4_ALLOWS_ONLY_UTF8 SHOULD always be set to one.
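A client-side check of the two flags might look like the following Python sketch. The bit values and the helper name describe_charset_cap are assumptions for illustration; the constants should be taken from the protocol's XDR definitions.

```python
# ASSUMPTION: bit values are illustrative; verify against the XDR.
FSCHARSET_CAP4_CONTAINS_NON_UTF8 = 0x1
FSCHARSET_CAP4_ALLOWS_ONLY_UTF8 = 0x2


def describe_charset_cap(cap: int) -> dict:
    """Decode an fs_charset_cap4 value and reject the forbidden combination."""
    contains_non_utf8 = bool(cap & FSCHARSET_CAP4_CONTAINS_NON_UTF8)
    allows_only_utf8 = bool(cap & FSCHARSET_CAP4_ALLOWS_ONLY_UTF8)
    if allows_only_utf8 and contains_non_utf8:
        # When ALLOWS_ONLY_UTF8 is one, CONTAINS_NON_UTF8 MUST be zero.
        raise ValueError("inconsistent fs_charset_cap flags")
    return {
        "contains_non_utf8": contains_non_utf8,
        "allows_only_utf8": allows_only_utf8,
    }
```

How a client reacts to an inconsistent value is a policy decision; raising an error here is just one option.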
14.5. UTF-8 Related Errors 14.5. UTF-8 Related Errors
skipping to change at page 364, line 19 skipping to change at page 371, line 19
For any of a number of reasons, the replier could not process this For any of a number of reasons, the replier could not process this
operation in what was deemed a reasonable time. The client should operation in what was deemed a reasonable time. The client should
wait and then try the request with a new slot and sequence value. wait and then try the request with a new slot and sequence value.
Some examples of scenarios that might lead to this situation: Some examples of scenarios that might lead to this situation:
o A server that supports hierarchical storage receives a request to o A server that supports hierarchical storage receives a request to
process a file that had been migrated. process a file that had been migrated.
o An operation requires a delegation recall to proceed, so that the o An operation requires a delegation recall to proceed, so that the
need to wait for for this delegation to be recalled makes need to wait for this delegation to be recalled and returned makes
processing this request in a timely fashion impossible. processing this request in a timely fashion impossible.
o A request is being performed on a session being migrated from o A request is being performed on a session being migrated from
another server as described in Section 11.13.3, and the lack of another server as described in Section 11.14.3, and the lack of
full information about the state of the session on the source full information about the state of the session on the source
makes it impossible to process the request immediately. makes it impossible to process the request immediately.
In such cases, returning the error NFS4ERR_DELAY allows necessary In such cases, returning the error NFS4ERR_DELAY allows necessary
preparatory operations to proceed without holding up requester preparatory operations to proceed without holding up requester
resources such as a session slot. After delaying for a period of time, resources such as a session slot. After delaying for a period of time,
the client can then re-send the operation in question, often as part the client can then re-send the operation in question, often as part
of a nearly identical request. Because of the need to avoid spurious of a nearly identical request. Because of the need to avoid spurious
reissues of non-idempotent operations and to avoid acting in response reissues of non-idempotent operations and to avoid acting in response
to NFS4ERR_DELAY errors returned on responses returned from the to NFS4ERR_DELAY errors returned on responses returned from the
replier's replay cache, integration with the session-provided replay replier's replay cache, integration with the session-provided replay
cache is necessary. There are a number of cases to deal with, each cache is necessary. There are a number of cases to deal with, each
of which requires different sorts of handling by the requester and of which requires different sorts of handling by the requester and
replier: replier:
o If NFS4ERR_DELAY is returned on a SEQUENCE operation, the request o If NFS4ERR_DELAY is returned on a SEQUENCE operation, the request
is retried in full with the SEQUENCE operation containing the same is retried in full with the SEQUENCE operation containing the same
slot and sequece values. In this case, the replier MUST avoid slot and sequence values. In this case, the replier MUST avoid
returning a response containing NFS4ERR_DELAY as the response to returning a response containing NFS4ERR_DELAY as the response to
SEQUENCE solely on the basis of its presence in the replay cache. SEQUENCE solely on the basis of its presence in the replay cache.
If the replier did this, the retries would not be effective as If the replier did this, the retries would not be effective as
there would be no opportunity for the replier to see whether the there would be no opportunity for the replier to see whether the
condition that generated the NFS4ERR_DELAY had been rectified condition that generated the NFS4ERR_DELAY had been rectified
during the interim between the original request and the retry. during the interim between the original request and the retry.
o If NFS4ERR_DELAY is returned on an operation other than SEQUENCE o If NFS4ERR_DELAY is returned on an operation other than SEQUENCE
which validly appears as the first operation of a request, which validly appears as the first operation of a request,
handling is similar. The request can be retired in full without handling is similar. The request can be retried in full without
modification. In this case as well, the replier MUST avoid modification. In this case as well, the replier MUST avoid
returning a response containing NFS4ERR_DELAY as the response to returning a response containing NFS4ERR_DELAY as the response to
an intial operation of a request solely on the basis of its an initial operation of a request solely on the basis of its
presence in the replay cache. If the replier did this, the presence in the replay cache. If the replier did this, the
retries would not be effective as there would be no opportunity retries would not be effective as there would be no opportunity
for the replier to see whether the condition that generated the for the replier to see whether the condition that generated the
NFS4ERR_DELAY had been rectified during the interim between the NFS4ERR_DELAY had been rectified during the interim between the
original request and the retry. original request and the retry.
o If NFS4ERR_DELAY is returned on an operation other than the first o If NFS4ERR_DELAY is returned on an operation other than the first
in the request, the request when retried MUST contain a SEQUENCE in the request, the request when retried MUST contain a SEQUENCE
operation which is different than the original one, with either operation which is different than the original one, with either
the slot id or the sequence value different from that in the the slot id or the sequence value different from that in the
original request. Because requesters do this, there is no need original request. Because requesters do this, there is no need
for the replier to take special care to avoid returning an for the replier to take special care to avoid returning an
NFS4ERR_DELAY error obtained from the replay cache. When no non- NFS4ERR_DELAY error obtained from the replay cache. When no non-
idempotent operations have been processed before the NFS4ERR_DELAY idempotent operations have been processed before the NFS4ERR_DELAY
was returned, the requester should retry the request in full, with was returned, the requester should retry the request in full, with
the only difference from the original request being the the only difference from the original request being the
modfication to the slot id or sequence value in the reissued modification to the slot id or sequence value in the reissued
SEQUENCE operation. SEQUENCE operation.
o When NFS4ERR_DELAY is returned on an operation other than the o When NFS4ERR_DELAY is returned on an operation other than the
first within a request and there has been a non-idempotent first within a request and there has been a non-idempotent
operation processed before the NFS4ERR_DELAY was returned, the operation processed before the NFS4ERR_DELAY was returned,
reissued request should avoid the non-idempotent operation. The reissuing the request as is normally done would incorrectly cause
request still must use a SEQUENCE operation with either a the re-execution of the non-idempotent operation.
different slot id or sequence value from the SEQUENCE in the
original request. Because this is done, there is no way the To avoid this situation, the client should reissue the request
replier could avoid spuriously re-executing the non-idempotent without the non-idempotent operation. The request still must use
operation since the different SEQUENCE parameters prevent the a SEQUENCE operation with either a different slot id or sequence
requester from recognizing that the non-idempotent operation is value from the SEQUENCE in the original request. Because this is
being retried. done, there is no way the replier could avoid spuriously re-
executing the non-idempotent operation since the different
SEQUENCE parameters prevent the replier from recognizing that
the non-idempotent operation is being retried.
Note that without the ability to return NFS4ERR_DELAY and the Note that without the ability to return NFS4ERR_DELAY and the
requester's willingness to re-send when receiving it, deadlock might requester's willingness to re-send when receiving it, deadlock might
result. For example, if a recall is done, and if the delegation result. For example, if a recall is done, and if the delegation
return or operations preparatory to delegation return are held up by return or operations preparatory to delegation return are held up by
other operations that need the delegation to be returned, session other operations that need the delegation to be returned, session
slots might not be available. The result could be deadlock. slots might not be available. The result could be deadlock.
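
The retry rules above can be sketched as follows. This is a minimal illustration, not text from either draft: the Compound and Sequence classes, the build_delay_retry helper, and the NON_IDEMPOTENT set are hypothetical names, and the sketch assumes the requester reuses the same slot with the next sequence value (choosing a different slot id would satisfy the rule equally well).

```python
from dataclasses import dataclass
from typing import List

# Hypothetical classification; a real client decides this per operation.
NON_IDEMPOTENT = frozenset({"REMOVE", "RENAME", "LINK", "CREATE"})


@dataclass(frozen=True)
class Sequence:
    slot_id: int
    sequence_id: int


@dataclass(frozen=True)
class Compound:
    sequence: Sequence
    ops: List[str]  # operations that follow SEQUENCE in the request


def build_delay_retry(original: Compound, failed_index: int) -> Compound:
    """Build the request to send after ops[failed_index] got NFS4ERR_DELAY."""
    # Operations before the failing one already ran; drop the
    # non-idempotent ones so they are not executed a second time.
    kept_prefix = [op for op in original.ops[:failed_index]
                   if op not in NON_IDEMPOTENT]
    # The failing operation and everything after it never ran; keep them all.
    not_yet_run = list(original.ops[failed_index:])
    # The new SEQUENCE must differ from the original one (assumption: same
    # slot, next sequence value, wrapping as a 32-bit counter).
    new_seq = Sequence(original.sequence.slot_id,
                       (original.sequence.sequence_id + 1) & 0xFFFFFFFF)
    return Compound(new_seq, kept_prefix + not_yet_run)
```

A real client would also need to check that dropping an operation does not change the filehandle context on which the remaining operations depend; the sketch ignores that.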
15.1.1.4. NFS4ERR_INVAL (Error Code 22) 15.1.1.4. NFS4ERR_INVAL (Error Code 22)
skipping to change at page 367, line 14 skipping to change at page 374, line 19
15.1.2.3. NFS4ERR_ISDIR (Error Code 21) 15.1.2.3. NFS4ERR_ISDIR (Error Code 21)
The current or saved filehandle designates a directory when the The current or saved filehandle designates a directory when the
current operation does not allow a directory to be accepted as the current operation does not allow a directory to be accepted as the
target of this operation. target of this operation.
15.1.2.4. NFS4ERR_MOVED (Error Code 10019) 15.1.2.4. NFS4ERR_MOVED (Error Code 10019)
The file system that contains the current filehandle object is not The file system that contains the current filehandle object is not
present at the server. It may have been relocated or migrated to present at the server, or is not accessible using the network address
another server, or it may have never been present. The client may used. It may have been made accessible on a different set of network
obtain the new file system location by obtaining the "fs_locations" addresses, relocated or migrated to another server, or it may have
or "fs_locations_info" attribute for the current filehandle. For never been present. The client may obtain the new file system
further discussion, refer to Section 11.3. location by obtaining the "fs_locations" or "fs_locations_info"
attribute for the current filehandle. For further discussion, refer
to Section 11.3.
As with the case of NFS4ERR_DELAY, it is possible that one or more
non-idempotent operations may have been successfully executed within
a COMPOUND before NFS4ERR_MOVED is returned. Because of this, once
the new location is determined, the original request which received
the NFS4ERR_MOVED should not be re-executed in full. Instead, the
client should send a new COMPOUND, with any successfully executed
non-idempotent operations removed. When the client uses the same
session for the new COMPOUND, its SEQUENCE operation should use a
different slot id or sequence.
15.1.2.5. NFS4ERR_NOFILEHANDLE (Error Code 10020) 15.1.2.5. NFS4ERR_NOFILEHANDLE (Error Code 10020)
The logical current or saved filehandle value is required by the The logical current or saved filehandle value is required by the
current operation and is not set. This may be a result of a current operation and is not set. This may be a result of a
malformed COMPOUND operation (i.e., no PUTFH or PUTROOTFH before an malformed COMPOUND operation (i.e., no PUTFH or PUTROOTFH before an
operation that requires the current filehandle be set). operation that requires the current filehandle be set).
15.1.2.6. NFS4ERR_NOTDIR (Error Code 20) 15.1.2.6. NFS4ERR_NOTDIR (Error Code 20)
skipping to change at page 375, line 17 skipping to change at page 382, line 39
An attempt to OPEN a file with a share reservation has failed because An attempt to OPEN a file with a share reservation has failed because
of a share conflict. of a share conflict.
15.1.9. Reclaim Errors 15.1.9. Reclaim Errors
These errors relate to the process of reclaiming locks after a server These errors relate to the process of reclaiming locks after a server
restart. restart.
15.1.9.1. NFS4ERR_COMPLETE_ALREADY (Error Code 10054) 15.1.9.1. NFS4ERR_COMPLETE_ALREADY (Error Code 10054)
The client previously sent a successful RECLAIM_COMPLETE operation. The client previously sent a successful RECLAIM_COMPLETE operation
An additional RECLAIM_COMPLETE operation is not necessary and results specifying the same scope, whether that scope is global or for the
in this error. same file system in the case of a per-fs RECLAIM_COMPLETE. An
additional RECLAIM_COMPLETE operation is not necessary and results in
this error.
15.1.9.2. NFS4ERR_GRACE (Error Code 10013) 15.1.9.2. NFS4ERR_GRACE (Error Code 10013)
The server was in its recovery or grace period. The locking request This error is returned when the server is in its grace period with
was not a reclaim request and so could not be granted during that regard to the file system object for which the lock was requested.
period. In this situation, a non-reclaim locking request cannot be granted.
This can occur because either:
o The server does not have sufficient information about locks that
might be potentially reclaimed to determine whether the lock could
be granted.
o The request is made by a client responsible for reclaiming its
locks that has not yet done the appropriate RECLAIM_COMPLETE
operation, allowing it to proceed to obtain new locks.
In the case of a per-fs grace period, there may be clients (i.e.,
those currently using the destination file system) who might be
unaware of the circumstances resulting in the initiation of the grace
period. Such clients need to periodically retry the request until
the grace period is over, just as other clients do.
15.1.9.3. NFS4ERR_NO_GRACE (Error Code 10033) 15.1.9.3. NFS4ERR_NO_GRACE (Error Code 10033)
A reclaim of client state was attempted in circumstances in which the A reclaim of client state was attempted in circumstances in which the
server cannot guarantee that conflicting state has not been provided server cannot guarantee that conflicting state has not been provided
to another client. This can occur because the reclaim has been done to another client. This occurs in any of the following situations.
outside of the grace period of the server, after the client has done
a RECLAIM_COMPLETE operation, or because previous operations have o There is no active grace period applying to the file system object
created a situation in which the server is not able to determine that for which the request was made.
a reclaim-interfering edge condition does not exist.
o The client making the request has no current role in reclaiming
locks.
o Previous operations have created a situation in which the server
is not able to determine that a reclaim-interfering edge condition
does not exist.
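
The conditions listed for NFS4ERR_GRACE and NFS4ERR_NO_GRACE in the newer text can be read as a small decision procedure. The sketch below is one possible reading, not a normative algorithm: the function name and its boolean parameters are hypothetical, and a real server derives these facts from its own state. The error codes are those given in the section headings.

```python
from enum import Enum


class Status(Enum):
    NFS4_OK = 0
    NFS4ERR_GRACE = 10013
    NFS4ERR_NO_GRACE = 10033


def classify_lock_request(is_reclaim: bool,
                          grace_active: bool,
                          client_must_reclaim: bool,
                          conflict_ruled_out: bool = False,
                          edge_condition_ruled_out: bool = True) -> Status:
    """Hypothetical server-side check for a locking request.

    grace_active: a grace period applies to the file system object.
    client_must_reclaim: the client is responsible for reclaiming locks
        and has not yet sent the appropriate RECLAIM_COMPLETE.
    conflict_ruled_out: the server knows enough about reclaimable locks
        to grant a new lock safely.
    edge_condition_ruled_out: no reclaim-interfering edge condition.
    """
    if is_reclaim:
        if not grace_active:
            return Status.NFS4ERR_NO_GRACE
        if not client_must_reclaim:
            return Status.NFS4ERR_NO_GRACE
        if not edge_condition_ruled_out:
            return Status.NFS4ERR_NO_GRACE
        return Status.NFS4_OK
    # Non-reclaim request.
    if grace_active and (client_must_reclaim or not conflict_ruled_out):
        return Status.NFS4ERR_GRACE
    return Status.NFS4_OK
```

A client receiving NFS4ERR_GRACE from such a check would retry periodically until the grace period ends, as the text describes.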
15.1.9.4. NFS4ERR_RECLAIM_BAD (Error Code 10034) 15.1.9.4. NFS4ERR_RECLAIM_BAD (Error Code 10034)
The server has determined that a reclaim attempted by the client is The server has determined that a reclaim attempted by the client is
not valid, i.e. the lock specified as being reclaimed could not not valid, i.e. the lock specified as being reclaimed could not
possibly have existed before the server restart. A server is not possibly have existed before the server restart or file system
obliged to make this determination and will typically rely on the migration event. A server is not obliged to make this determination
client to only reclaim locks that the client was granted prior to and will typically rely on the client to only reclaim locks that the
restart. However, when a server does have reliable information to client was granted prior to restart. However, when a server does
enable it make this determination, this error indicates that the have reliable information to enable it to make this determination,
reclaim has been rejected as invalid. This is as opposed to the this error indicates that the reclaim has been rejected as invalid.
error NFS4ERR_RECLAIM_CONFLICT (see Section 15.1.9.5) where the This is as opposed to the error NFS4ERR_RECLAIM_CONFLICT (see
server can only determine that there has been an invalid reclaim, but Section 15.1.9.5) where the server can only determine that there has
cannot determine which request is invalid. been an invalid reclaim, but cannot determine which request is
invalid.
15.1.9.5. NFS4ERR_RECLAIM_CONFLICT (Error Code 10035) 15.1.9.5. NFS4ERR_RECLAIM_CONFLICT (Error Code 10035)
The reclaim attempted by the client has encountered a conflict and The reclaim attempted by the client has encountered a conflict and
cannot be satisfied. Potentially indicates a misbehaving client, cannot be satisfied. This potentially indicates a misbehaving
although not necessarily the one receiving the error. The client, although not necessarily the one receiving the error. The
misbehavior might be on the part of the client that established the misbehavior might be on the part of the client that established the
lock with which this client conflicted. See also Section 15.1.9.4 lock with which this client conflicted. See also Section 15.1.9.4
for the related error, NFS4ERR_RECLAIM_BAD. for the related error, NFS4ERR_RECLAIM_BAD.
15.1.10. pNFS Errors 15.1.10. pNFS Errors
This section deals with pNFS-related errors including those that are This section deals with pNFS-related errors including those that are
associated with using NFSv4.1 to communicate with a data server. associated with using NFSv4.1 to communicate with a data server.
15.1.10.1. NFS4ERR_BADIOMODE (Error Code 10049) 15.1.10.1. NFS4ERR_BADIOMODE (Error Code 10049)
skipping to change at page 432, line 50 skipping to change at page 440, line 50
o When a client executes a regular file, it has to read the file o When a client executes a regular file, it has to read the file
from the server. Strictly speaking, the server should not allow from the server. Strictly speaking, the server should not allow
the client to read a file being executed unless the user has read the client to read a file being executed unless the user has read
permissions on the file. Requiring explicit read permissions on permissions on the file. Requiring explicit read permissions on
executable files in order to access them over NFS is not going to executable files in order to access them over NFS is not going to
be acceptable to some users and storage administrators. be acceptable to some users and storage administrators.
Historically, NFS servers have allowed a user to READ a file if Historically, NFS servers have allowed a user to READ a file if
the user has execute access to the file. the user has execute access to the file.
As a practical example, the UNIX specification [55] states that an As a practical example, the UNIX specification [59] states that an
implementation claiming conformance to UNIX may indicate in the implementation claiming conformance to UNIX may indicate in the
access() programming interface's result that a privileged user has access() programming interface's result that a privileged user has
execute rights, even if no execute permission bits are set on the execute rights, even if no execute permission bits are set on the
regular file's attributes. It is possible to claim conformance to regular file's attributes. It is possible to claim conformance to
the UNIX specification and instead not indicate execute rights in the UNIX specification and instead not indicate execute rights in
that situation, which is true for some operating environments. that situation, which is true for some operating environments.
Suppose the operating environments of the client and server are Suppose the operating environments of the client and server are
implementing the access() semantics for privileged users differently, implementing the access() semantics for privileged users differently,
and the ACCESS operation implementations of the client and server and the ACCESS operation implementations of the client and server
follow their respective access() semantics. This can cause undesired follow their respective access() semantics. This can cause undesired
skipping to change at page 487, line 15 skipping to change at page 495, line 15
18.20.3. DESCRIPTION 18.20.3. DESCRIPTION
This operation replaces the current filehandle with the filehandle This operation replaces the current filehandle with the filehandle
that represents the public filehandle of the server's namespace. that represents the public filehandle of the server's namespace.
This filehandle may be different from the "root" filehandle that may This filehandle may be different from the "root" filehandle that may
be associated with some other directory on the server. be associated with some other directory on the server.
PUTPUBFH also clears the current stateid. PUTPUBFH also clears the current stateid.
The public filehandle represents the concepts embodied in RFC 2054 The public filehandle represents the concepts embodied in RFC 2054
[45], RFC 2055 [46], and RFC 2224 [56]. The intent for NFSv4.1 is [48], RFC 2055 [49], and RFC 2224 [60]. The intent for NFSv4.1 is
that the public filehandle (represented by the PUTPUBFH operation) be that the public filehandle (represented by the PUTPUBFH operation) be
used as a method of providing WebNFS server compatibility with NFSv3. used as a method of providing WebNFS server compatibility with NFSv3.
The public filehandle and the root filehandle (represented by the The public filehandle and the root filehandle (represented by the
PUTROOTFH operation) SHOULD be equivalent. If the public and root PUTROOTFH operation) SHOULD be equivalent. If the public and root
filehandles are not equivalent, then the directory corresponding to filehandles are not equivalent, then the directory corresponding to
the public filehandle MUST be a descendant of the directory the public filehandle MUST be a descendant of the directory
corresponding to the root filehandle. corresponding to the root filehandle.
See Section 16.2.3.1.1 for more details on the current filehandle. See Section 16.2.3.1.1 for more details on the current filehandle.
skipping to change at page 487, line 37 skipping to change at page 495, line 37
See Section 16.2.3.1.2 for more details on the current stateid. See Section 16.2.3.1.2 for more details on the current stateid.
18.20.4. IMPLEMENTATION 18.20.4. IMPLEMENTATION
This operation is used in an NFS request to set the context for file This operation is used in an NFS request to set the context for file
accessing operations that follow in the same COMPOUND request. accessing operations that follow in the same COMPOUND request.
With the NFSv3 public filehandle, the client is able to specify With the NFSv3 public filehandle, the client is able to specify
whether the pathname provided in the LOOKUP should be evaluated as whether the pathname provided in the LOOKUP should be evaluated as
either an absolute path relative to the server's root or relative to either an absolute path relative to the server's root or relative to
the public filehandle. RFC 2224 [56] contains further discussion of the public filehandle. RFC 2224 [60] contains further discussion of
the functionality. With NFSv4.1, that type of specification is not the functionality. With NFSv4.1, that type of specification is not
directly available in the LOOKUP operation. The reason for this is directly available in the LOOKUP operation. The reason for this is
because the component separators needed to specify absolute vs. because the component separators needed to specify absolute vs.
relative are not allowed in NFSv4. Therefore, the client is relative are not allowed in NFSv4. Therefore, the client is
responsible for constructing its request such that the use of either responsible for constructing its request such that the use of either
PUTROOTFH or PUTPUBFH signifies absolute or relative evaluation of an PUTROOTFH or PUTPUBFH signifies absolute or relative evaluation of an
NFS URL, respectively. NFS URL, respectively.
Note that there are warnings mentioned in RFC 2224 [56] with respect Note that there are warnings mentioned in RFC 2224 [60] with respect
to the use of absolute evaluation and the restrictions the server may to the use of absolute evaluation and the restrictions the server may
place on that evaluation with respect to how much of its namespace place on that evaluation with respect to how much of its namespace
has been made available. These same warnings apply to NFSv4.1. It has been made available. These same warnings apply to NFSv4.1. It
is likely, therefore, that because of server implementation details, is likely, therefore, that because of server implementation details,
an NFSv3 absolute public filehandle look up may behave differently an NFSv3 absolute public filehandle look up may behave differently
than an NFSv4.1 absolute resolution. than an NFSv4.1 absolute resolution.
There is a form of security negotiation as described in RFC 2755 [57] There is a form of security negotiation as described in RFC 2755 [61]
that uses the public filehandle and an overloading of the pathname. that uses the public filehandle and an overloading of the pathname.
This method is not available with NFSv4.1 as filehandles are not This method is not available with NFSv4.1 as filehandles are not
overloaded with special meaning and therefore do not provide the same overloaded with special meaning and therefore do not provide the same
framework as NFSv3. Clients should therefore use the security framework as NFSv3. Clients should therefore use the security
negotiation mechanisms described in Section 2.6. negotiation mechanisms described in Section 2.6.
18.21. Operation 24: PUTROOTFH - Set Root Filehandle 18.21. Operation 24: PUTROOTFH - Set Root Filehandle
18.21.1. ARGUMENTS 18.21.1. ARGUMENTS
skipping to change at page 524, line 33 skipping to change at page 532, line 33
and the client has no need for this value and will ignore it and the client has no need for this value and will ignore it
EXCHANGE_ID MAY be sent in a COMPOUND procedure that starts with EXCHANGE_ID MAY be sent in a COMPOUND procedure that starts with
SEQUENCE. However, when a client communicates with a server for the SEQUENCE. However, when a client communicates with a server for the
first time, it will not have a session, so using SEQUENCE will not be first time, it will not have a session, so using SEQUENCE will not be
possible. If EXCHANGE_ID is sent without a preceding SEQUENCE, then possible. If EXCHANGE_ID is sent without a preceding SEQUENCE, then
it MUST be the only operation in the COMPOUND procedure's request. it MUST be the only operation in the COMPOUND procedure's request.
If it is not, the server MUST return NFS4ERR_NOT_ONLY_OP. If it is not, the server MUST return NFS4ERR_NOT_ONLY_OP.
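
As a rough illustration of the placement rule just stated, the check below models a COMPOUND as a list of operation names. The function name and the string-based representation are assumptions made for this sketch; it covers only the rule described here and no other validation a server performs.

```python
from typing import List


def check_exchange_id_placement(ops: List[str]) -> str:
    """Return the status implied by the EXCHANGE_ID placement rule alone."""
    if "EXCHANGE_ID" not in ops:
        return "NFS4_OK"  # rule does not apply
    if ops[0] == "SEQUENCE":
        return "NFS4_OK"  # EXCHANGE_ID may follow a leading SEQUENCE
    if len(ops) == 1:
        return "NFS4_OK"  # no SEQUENCE: EXCHANGE_ID is the only operation
    return "NFS4ERR_NOT_ONLY_OP"
```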
The eia_clientowner field is composed of a co_verifier field and a The eia_clientowner field is composed of a co_verifier field and a
co_ownerid string. As noted in s Section 2.4, the co_ownerid co_ownerid string. As noted in Section 2.4, the co_ownerid
describes the client, and the co_verifier is the incarnation of the identifies the client, and the co_verifier specifies a particular
client. An EXCHANGE_ID sent with a new incarnation of the client incarnation of that client. An EXCHANGE_ID sent with a new
will lead to the server removing lock state of the old incarnation. incarnation of the client will lead to the server removing lock state
Whereas an EXCHANGE_ID sent with the current incarnation and of the old incarnation. On the other hand, an EXCHANGE_ID sent with
co_ownerid will result in an error or an update of the client ID's the current incarnation and co_ownerid will, when it does not result
properties, depending on the arguments to EXCHANGE_ID. in an unrelated error, potentially update an existing client ID's
properties, or simply return information about the existing
client_id. The latter would happen when this operation is done to
the same server using different network addresses as part of creating
trunked connections.
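
The roles of co_ownerid and co_verifier can be sketched with a deliberately simplified server-side table. This is an assumption-laden illustration: the ClientTable class is hypothetical, and it ignores confirmation via CREATE_SESSION, principal checks, and the EXCHANGE_ID flags, all of which affect real server behavior.

```python
from typing import Dict, List, Tuple


class ClientTable:
    """Hypothetical, simplified record of client owners seen by a server."""

    def __init__(self) -> None:
        # co_ownerid -> (co_verifier, client ID)
        self._records: Dict[bytes, Tuple[bytes, int]] = {}
        self._next_id = 1
        # Client IDs whose lock state was removed (old incarnations).
        self.discarded: List[int] = []

    def exchange_id(self, co_ownerid: bytes, co_verifier: bytes) -> int:
        record = self._records.get(co_ownerid)
        if record is not None:
            verifier, client_id = record
            if verifier == co_verifier:
                # Same incarnation, e.g. reached over another network
                # address when trunking: return the existing client ID.
                return client_id
            # New incarnation: the old incarnation's lock state goes away.
            self.discarded.append(client_id)
        # Never reuse a client ID across incarnations.
        client_id = self._next_id
        self._next_id += 1
        self._records[co_ownerid] = (co_verifier, client_id)
        return client_id
```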
A server MUST NOT provide the same client ID to two different A server MUST NOT provide the same client ID to two different
incarnations of an eia_clientowner. incarnations of an eia_clientowner.
In addition to the client ID and sequence ID, the server returns a In addition to the client ID and sequence ID, the server returns a
server owner (eir_server_owner) and server scope (eir_server_scope). server owner (eir_server_owner) and server scope (eir_server_scope).
The former field is used in connection with network trunking as The former field is used in connection with network trunking as
described in Section 2.10.5. The latter field is used to allow described in Section 2.10.5. The latter field is used to allow
clients to determine when client IDs sent by one server may be clients to determine when client IDs sent by one server may be
recognized by another in the event of file system migration (see recognized by another in the event of file system migration (see
Section 11.10.9 of the current document). Section 11.11.9 of the current document).
The client ID returned by EXCHANGE_ID is only unique relative to the The client ID returned by EXCHANGE_ID is only unique relative to the
combination of eir_server_owner.so_major_id and eir_server_scope. combination of eir_server_owner.so_major_id and eir_server_scope.
Thus, if two servers return the same client ID, the onus is on the Thus, if two servers return the same client ID, the onus is on the
client to distinguish the client IDs on the basis of client to distinguish the client IDs on the basis of
eir_server_owner.so_major_id and eir_server_scope. In the event two eir_server_owner.so_major_id and eir_server_scope. In the event two
different servers claim matching server_owner.so_major_id and different servers claim matching server_owner.so_major_id and
eir_server_scope, the client can use the verification techniques eir_server_scope, the client can use the verification techniques
discussed in Section 2.10.5.1 to determine if the servers are discussed in Section 2.10.5.1 to determine if the servers are
distinct. If they are distinct, then the client will need to note distinct. If they are distinct, then the client will need to note
skipping to change at page 526, line 47 skipping to change at page 534, line 51
derived from the SSV, and the derivation is via the hash derived from the SSV, and the derivation is via the hash
algorithm. The selection of an encryption algorithm with a algorithm. The selection of an encryption algorithm with a
key length that exceeded the length of the output of the key length that exceeded the length of the output of the
hash algorithm would require padding, and thus weaken the hash algorithm would require padding, and thus weaken the
use of the encryption algorithm. use of the encryption algorithm.
+ hash length SHOULD be <= SSV length. This is because the + hash length SHOULD be <= SSV length. This is because the
SSV is a key used to derive subkeys via an HMAC, and it is SSV is a key used to derive subkeys via an HMAC, and it is
recommended that the key used as input to an HMAC be at recommended that the key used as input to an HMAC be at
least as long as the length of the HMAC's hash algorithm's least as long as the length of the HMAC's hash algorithm's
output (see Section 3 of [59]). output (see Section 3 of [51]).
+ key length SHOULD be <= SSV length. This is a transitive + key length SHOULD be <= SSV length. This is a transitive
result of the above two invariants. result of the above two invariants.
+ key length SHOULD be >= hash length / 2. This is because + key length SHOULD be >= hash length / 2. This is because
the subkey derivation is via an HMAC and it is recommended the subkey derivation is via an HMAC and it is recommended
that if the HMAC has to be truncated, it should not be that if the HMAC has to be truncated, it should not be
truncated to less than half the hash length (see Section 4 truncated to less than half the hash length (see Section 4
of RFC2104 [59]). of RFC2104 [51]).
* Number of concurrent versions of the SSV the client and server * Number of concurrent versions of the SSV the client and server
will support (see Section 2.10.9). This property is will support (see Section 2.10.9). This property is
represented by spi_window in the EXCHANGE_ID results. The represented by spi_window in the EXCHANGE_ID results. The
property may be updated by subsequent EXCHANGE_ID operations. property may be updated by subsequent EXCHANGE_ID operations.
o The client's implementation ID as represented by the o The client's implementation ID as represented by the
eia_client_impl_id field of the arguments. The property may be eia_client_impl_id field of the arguments. The property may be
updated by subsequent EXCHANGE_ID requests. updated by subsequent EXCHANGE_ID requests.
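
The length relationships among the encryption key, hash output, and SSV listed above can be checked mechanically. The helper below is a sketch with a hypothetical name; all lengths are in the same unit (bytes here). The first check is inferred from the padding rationale in the surrounding text rather than from a rule visible in this excerpt, so treat it as an assumption.

```python
from typing import List


def ssv_length_violations(key_len: int, hash_len: int, ssv_len: int) -> List[str]:
    """Return descriptions of the length relationships that do not hold."""
    violations = []
    if key_len > hash_len:
        # Assumed from the rationale: a longer key would require padding.
        violations.append("key length > hash length")
    if hash_len > ssv_len:
        violations.append("hash length > SSV length")
    if key_len > ssv_len:
        violations.append("key length > SSV length")
    if 2 * key_len < hash_len:
        # HMAC output should not be truncated below half the hash length.
        violations.append("key length < hash length / 2")
    return violations
```

For example, a 32-byte key with a 32-byte hash and a 32-byte SSV yields no violations, while a 16-byte key paired with a 64-byte hash falls below half the hash length.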
skipping to change at page 531, line 31 skipping to change at page 539, line 33
and the server agrees. and the server agrees.
The SP4_SSV protection parameters also have: The SP4_SSV protection parameters also have:
ssp_hash_algs: ssp_hash_algs:
This is the set of algorithms the client supports for the purpose This is the set of algorithms the client supports for the purpose
of computing the digests needed for the internal SSV GSS mechanism of computing the digests needed for the internal SSV GSS mechanism
and for the SET_SSV operation. Each algorithm is specified as an and for the SET_SSV operation. Each algorithm is specified as an
object identifier (OID). The REQUIRED algorithms for a server are object identifier (OID). The REQUIRED algorithms for a server are
id-sha1, id-sha224, id-sha256, id-sha384, and id-sha512 [25]. The id-sha1, id-sha224, id-sha256, id-sha384, and id-sha512 [25].
algorithm the server selects among the set is indicated in
Due to known weaknesses in id-sha1, it is RECOMMENDED that the
client specify at least one algorithm within ssp_hash_algs other
than id-sha1.
The algorithm the server selects among the set is indicated in
spi_hash_alg, a field of spr_ssv_prot_info. The field spi_hash_alg, a field of spr_ssv_prot_info. The field
spi_hash_alg is an index into the array ssp_hash_algs. If the spi_hash_alg is an index into the array ssp_hash_algs. Because of
server does not support any of the offered algorithms, it returns the known weaknesses in id-sha1, it is RECOMMENDED that it not be
NFS4ERR_HASH_ALG_UNSUPP. If ssp_hash_algs is empty, the server selected by the server as long as ssp_hash_algs contains any other
MUST return NFS4ERR_INVAL. supported algorithm.
If the server does not support any of the offered algorithms, it
returns NFS4ERR_HASH_ALG_UNSUPP. If ssp_hash_algs is empty, the
server MUST return NFS4ERR_INVAL.
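
The server-side selection described in the newer text might look like the sketch below. It is illustrative only: algorithms are represented by their names rather than by encoded OIDs, the function name is hypothetical, and picking the first acceptable entry in the client's order is an assumed policy that the text does not mandate.

```python
from typing import List, Optional, Tuple

# Algorithms the text lists as REQUIRED for a server.
SERVER_SUPPORTED = ("id-sha1", "id-sha224", "id-sha256", "id-sha384", "id-sha512")


def select_hash_alg(ssp_hash_algs: List[str],
                    supported=SERVER_SUPPORTED) -> Tuple[str, Optional[int]]:
    """Return (status, spi_hash_alg), where spi_hash_alg indexes ssp_hash_algs."""
    if not ssp_hash_algs:
        return ("NFS4ERR_INVAL", None)
    candidates = [i for i, alg in enumerate(ssp_hash_algs) if alg in supported]
    if not candidates:
        return ("NFS4ERR_HASH_ALG_UNSUPP", None)
    # Avoid id-sha1 whenever the client offered another supported algorithm.
    preferred = [i for i in candidates if ssp_hash_algs[i] != "id-sha1"]
    chosen = (preferred or candidates)[0]
    return ("NFS4_OK", chosen)
```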
ssp_encr_algs: ssp_encr_algs:
This is the set of algorithms the client supports for the purpose This is the set of algorithms the client supports for the purpose
of providing privacy protection for the internal SSV GSS of providing privacy protection for the internal SSV GSS
mechanism. Each algorithm is specified as an OID. The REQUIRED mechanism. Each algorithm is specified as an OID. The REQUIRED
algorithm for a server is id-aes256-CBC. The RECOMMENDED algorithm for a server is id-aes256-CBC. The RECOMMENDED
algorithms are id-aes192-CBC and id-aes128-CBC [26]. The selected algorithms are id-aes192-CBC and id-aes128-CBC [26]. The selected
algorithm is returned in spi_encr_alg, an index into algorithm is returned in spi_encr_alg, an index into
ssp_encr_algs. If the server does not support any of the offered ssp_encr_algs. If the server does not support any of the offered
skipping to change at page 542, line 46 skipping to change at page 551, line 17
If CREATE_SESSION4_FLAG_CONN_RDMA is set in csa_flags, and if If CREATE_SESSION4_FLAG_CONN_RDMA is set in csa_flags, and if
the connection over which the CREATE_SESSION operation arrived the connection over which the CREATE_SESSION operation arrived
is currently in non-RDMA mode but has the capability to operate is currently in non-RDMA mode but has the capability to operate
in RDMA mode, then the client is requesting that the server in RDMA mode, then the client is requesting that the server
"step up" to RDMA mode on the connection. If the server "step up" to RDMA mode on the connection. If the server
agrees, it sets CREATE_SESSION4_FLAG_CONN_RDMA in the result agrees, it sets CREATE_SESSION4_FLAG_CONN_RDMA in the result
field csr_flags. If CREATE_SESSION4_FLAG_CONN_RDMA is not set field csr_flags. If CREATE_SESSION4_FLAG_CONN_RDMA is not set
in csa_flags, then CREATE_SESSION4_FLAG_CONN_RDMA MUST NOT be in csa_flags, then CREATE_SESSION4_FLAG_CONN_RDMA MUST NOT be
set in csr_flags. Note that once the server agrees to step up, set in csr_flags. Note that once the server agrees to step up,
it and the client MUST exchange all future traffic on the it and the client MUST exchange all future traffic on the
connection with RPC RDMA framing and not Record Marking ([31]). connection with RPC RDMA framing and not Record Marking ([32]).
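
The flag rule above reduces to a simple mask operation. In this sketch the function name and the server_agrees parameter are hypothetical, and the numeric flag value is what the protocol's XDR description is understood to define; verify it against the XDR before relying on it.

```python
# Assumed value; check against the protocol's XDR definition.
CREATE_SESSION4_FLAG_CONN_RDMA = 0x00000004


def negotiate_rdma(csa_flags: int, server_agrees: bool) -> int:
    """Return the RDMA-related bit of csr_flags for a CREATE_SESSION reply."""
    csr_flags = 0
    # The server may set the flag only if the client requested step-up.
    if (csa_flags & CREATE_SESSION4_FLAG_CONN_RDMA) and server_agrees:
        csr_flags |= CREATE_SESSION4_FLAG_CONN_RDMA
    return csr_flags
```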
csa_fore_chan_attrs, csa_back_chan_attrs: csa_fore_chan_attrs, csa_back_chan_attrs:
The csa_fore_chan_attrs and csa_back_chan_attrs fields apply to The csa_fore_chan_attrs and csa_back_chan_attrs fields apply to
attributes of the fore channel (which conveys requests originating attributes of the fore channel (which conveys requests originating
from the client to the server), and the backchannel (the channel from the client to the server), and the backchannel (the channel
that conveys callback requests originating from the server to the that conveys callback requests originating from the server to the
client), respectively. The results are in corresponding client), respectively. The results are in corresponding
structures called csr_fore_chan_attrs and csr_back_chan_attrs. structures called csr_fore_chan_attrs and csr_back_chan_attrs.
The results establish attributes for each channel, and on all The results establish attributes for each channel, and on all
skipping to change at page 584, line 43 skipping to change at page 593, line 43
remains set on all SEQUENCE replies until the loss of all such remains set on all SEQUENCE replies until the loss of all such
locks has been acknowledged by use of FREE_STATEID. locks has been acknowledged by use of FREE_STATEID.
SEQ4_STATUS_LEASE_MOVED SEQ4_STATUS_LEASE_MOVED
When set, indicates that responsibility for lease renewal has been When set, indicates that responsibility for lease renewal has been
transferred to one or more new servers. This condition will transferred to one or more new servers. This condition will
continue until the client receives an NFS4ERR_MOVED error and the continue until the client receives an NFS4ERR_MOVED error and the
server receives the subsequent GETATTR for the fs_locations or server receives the subsequent GETATTR for the fs_locations or
fs_locations_info attribute for an access to each file system for fs_locations_info attribute for an access to each file system for
which a lease has been moved to a new server. See which a lease has been moved to a new server. See
Section 11.10.9.1. Section 11.11.9.2.
SEQ4_STATUS_RESTART_RECLAIM_NEEDED SEQ4_STATUS_RESTART_RECLAIM_NEEDED
When set, indicates that due to server restart, the client must When set, indicates that due to server restart, the client must
reclaim locking state. Until the client sends a global reclaim locking state. Until the client sends a global
RECLAIM_COMPLETE (Section 18.51), every SEQUENCE operation will RECLAIM_COMPLETE (Section 18.51), every SEQUENCE operation will
return SEQ4_STATUS_RESTART_RECLAIM_NEEDED. return SEQ4_STATUS_RESTART_RECLAIM_NEEDED.
SEQ4_STATUS_BACKCHANNEL_FAULT SEQ4_STATUS_BACKCHANNEL_FAULT
The server has encountered an unrecoverable fault with the The server has encountered an unrecoverable fault with the
backchannel (e.g., it has lost track of the sequence ID for a slot backchannel (e.g., it has lost track of the sequence ID for a slot
skipping to change at page 587, line 31 skipping to change at page 596, line 31
This operation is used to update the SSV for a client ID. Before This operation is used to update the SSV for a client ID. Before
SET_SSV is called the first time on a client ID, the SSV is zero. SET_SSV is called the first time on a client ID, the SSV is zero.
The SSV is the key used for the SSV GSS mechanism (Section 2.10.9). The SSV is the key used for the SSV GSS mechanism (Section 2.10.9).
SET_SSV MUST be preceded by a SEQUENCE operation in the same SET_SSV MUST be preceded by a SEQUENCE operation in the same
COMPOUND. It MUST NOT be used if the client did not opt for SP4_SSV COMPOUND. It MUST NOT be used if the client did not opt for SP4_SSV
state protection when the client ID was created (see Section 18.35); state protection when the client ID was created (see Section 18.35);
the server returns NFS4ERR_INVAL in that case. the server returns NFS4ERR_INVAL in that case.
The field ssa_digest is computed as the output of the HMAC (RFC 2104 The field ssa_digest is computed as the output of the HMAC (RFC 2104
[59]) using the subkey derived from the SSV4_SUBKEY_MIC_I2T and [51]) using the subkey derived from the SSV4_SUBKEY_MIC_I2T and
current SSV as the key (see Section 2.10.9 for a description of current SSV as the key (see Section 2.10.9 for a description of
subkeys), and an XDR encoded value of data type ssa_digest_input4. subkeys), and an XDR encoded value of data type ssa_digest_input4.
The field sdi_seqargs is equal to the arguments of the SEQUENCE The field sdi_seqargs is equal to the arguments of the SEQUENCE
operation for the COMPOUND procedure that SET_SSV is within. operation for the COMPOUND procedure that SET_SSV is within.
The argument ssa_ssv is XORed with the current SSV to produce the new The argument ssa_ssv is XORed with the current SSV to produce the new
SSV. The argument ssa_ssv SHOULD be generated randomly. SSV. The argument ssa_ssv SHOULD be generated randomly.
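The SSV update and digest computation described above can be sketched as follows. This is a minimal illustration, not the normative procedure: the SSV length, the choice of SHA-256 as the HMAC hash, and the helper names next_ssv and ssv_digest are assumptions made for the example. The subkey derivation of Section 2.10.9 and the XDR encoding of ssa_digest_input4 are not reproduced; both are passed in as opaque bytes.

```python
import hashlib
import hmac
import secrets

SSV_LEN = 32  # assumed SSV length in bytes; the real length is negotiated


def next_ssv(current_ssv: bytes, ssa_ssv: bytes) -> bytes:
    """XOR ssa_ssv into the current SSV to produce the new SSV."""
    if len(current_ssv) != len(ssa_ssv):
        raise ValueError("ssa_ssv must be the same length as the current SSV")
    return bytes(a ^ b for a, b in zip(current_ssv, ssa_ssv))


def ssv_digest(subkey: bytes, xdr_input: bytes) -> bytes:
    """HMAC over an already XDR-encoded input, keyed by a derived subkey.

    SHA-256 is an assumption here; the hash actually used is negotiated.
    """
    return hmac.new(subkey, xdr_input, hashlib.sha256).digest()


# Before the first SET_SSV on a client ID, the SSV is zero.
initial_ssv = bytes(SSV_LEN)
# ssa_ssv SHOULD be generated randomly.
ssa_ssv = secrets.token_bytes(SSV_LEN)
new_ssv = next_ssv(initial_ssv, ssa_ssv)
```

Because the initial SSV is all zeros, the first SET_SSV simply installs ssa_ssv as the SSV; later calls mix the new value into the existing one.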
In the response, ssr_digest is the output of the HMAC using the In the response, ssr_digest is the output of the HMAC using the
subkey derived from SSV4_SUBKEY_MIC_T2I and new SSV as the key, and subkey derived from SSV4_SUBKEY_MIC_T2I and new SSV as the key, and
skipping to change at page 596, line 47 skipping to change at page 605, line 47
These two may be done in any order as long as all necessary lock These two may be done in any order as long as all necessary lock
reclaims have been done before issuing either of them. reclaims have been done before issuing either of them.
Any locks not reclaimed at the point at which RECLAIM_COMPLETE is Any locks not reclaimed at the point at which RECLAIM_COMPLETE is
done become non-reclaimable. The client MUST NOT attempt to reclaim done become non-reclaimable. The client MUST NOT attempt to reclaim
them, either during the current server instance or in any subsequent them, either during the current server instance or in any subsequent
server instance, or on another server to which responsibility for server instance, or on another server to which responsibility for
that file system is transferred. If the client were to do so, it that file system is transferred. If the client were to do so, it
would be violating the protocol by representing itself as owning would be violating the protocol by representing itself as owning
locks that it does not own, and so has no right to reclaim. See locks that it does not own, and so has no right to reclaim. See
Section 8.4.3 of [62] for a discussion of edge conditions related to Section 8.4.3 of [65] for a discussion of edge conditions related to
lock reclaim. lock reclaim.
By sending a RECLAIM_COMPLETE, the client indicates readiness to By sending a RECLAIM_COMPLETE, the client indicates readiness to
proceed to do normal non-reclaim locking operations. The client proceed to do normal non-reclaim locking operations. The client
should be aware that such operations may temporarily result in should be aware that such operations may temporarily result in
NFS4ERR_GRACE errors until the server is ready to terminate its grace NFS4ERR_GRACE errors until the server is ready to terminate its grace
period. period.
18.51.4. IMPLEMENTATION 18.51.4. IMPLEMENTATION
skipping to change at page 597, line 39 skipping to change at page 606, line 39
When a RECLAIM_COMPLETE is sent, the client effectively acknowledges When a RECLAIM_COMPLETE is sent, the client effectively acknowledges
any locks not yet reclaimed as lost. This allows the server to re- any locks not yet reclaimed as lost. This allows the server to re-
enable the client to recover locks if the occurrence of edge enable the client to recover locks if the occurrence of edge
conditions, as described in Section 8.4.3, had caused the server to conditions, as described in Section 8.4.3, had caused the server to
disable the client's ability to recover locks. disable the client's ability to recover locks.
Because previous descriptions of RECLAIM_COMPLETE were not Because previous descriptions of RECLAIM_COMPLETE were not
sufficiently explicit about the circumstances in which use of sufficiently explicit about the circumstances in which use of
RECLAIM_COMPLETE with rca_one_fs set to TRUE was appropriate, there RECLAIM_COMPLETE with rca_one_fs set to TRUE was appropriate, there
have been cases in which it has been misused by clients, and cases in have been cases in which it has been misused by clients who have issued
which servers have, in various ways, not responded to such misuse as RECLAIM_COMPLETE with rca_one_fs set to TRUE when it should not have
described above. While clients SHOULD NOT misuse this feature and been. There have also been cases in which servers have, in various
servers SHOULD respond to such misuse as described above, ways, not responded to such misuse as described above, either
implementers need to be aware of the following considerations as they ignoring the rca_one_fs setting (treating the operation as a global
make necessary tradeoffs between interoperability with existing RECLAIM_COMPLETE) or ignoring the entire operation.
implementations and proper support for facilities to allow lock
recovery in the event of file system migration. While clients SHOULD NOT misuse this feature and servers SHOULD
respond to such misuse as described above, implementers need to be
aware of the following considerations as they make necessary
tradeoffs between interoperability with existing implementations and
proper support for facilities to allow lock recovery in the event of
file system migration.
o When servers have no support for becoming the destination server o When servers have no support for becoming the destination server
of a file system subject to migration, there is no possibility of of a file system subject to migration, there is no possibility of
a per-fs RECLAIM_COMPLETE being done legitimately and occurrences a per-fs RECLAIM_COMPLETE being done legitimately and occurrences
of it SHOULD be ignored. However, the negative consequences of of it SHOULD be ignored. However, the negative consequences of
accepting such mistaken use are quite limited as long as the accepting such mistaken use are quite limited as long as the
client does not issue it before all necessary reclaims are done. client does not issue it before all necessary reclaims are done.
o When a server might become the destination for a file system being o When a server might become the destination for a file system being
migrated, inappropriate use of per-fs RECLAIM_COMPLETE is more migrated, inappropriate use of per-fs RECLAIM_COMPLETE is more
skipping to change at page 616, line 17 skipping to change at page 625, line 17
RCA4_TYPE_MASK_DIR_DLG RCA4_TYPE_MASK_DIR_DLG
The client is to return directory delegations. The client is to return directory delegations.
RCA4_TYPE_MASK_FILE_LAYOUT RCA4_TYPE_MASK_FILE_LAYOUT
The client is to return layouts of type LAYOUT4_NFSV4_1_FILES. The client is to return layouts of type LAYOUT4_NFSV4_1_FILES.
RCA4_TYPE_MASK_BLK_LAYOUT RCA4_TYPE_MASK_BLK_LAYOUT
See [44] for a description. See [47] for a description.
RCA4_TYPE_MASK_OBJ_LAYOUT_MIN to RCA4_TYPE_MASK_OBJ_LAYOUT_MAX RCA4_TYPE_MASK_OBJ_LAYOUT_MIN to RCA4_TYPE_MASK_OBJ_LAYOUT_MAX
See [43] for a description. See [46] for a description.
RCA4_TYPE_MASK_OTHER_LAYOUT_MIN to RCA4_TYPE_MASK_OTHER_LAYOUT_MAX RCA4_TYPE_MASK_OTHER_LAYOUT_MIN to RCA4_TYPE_MASK_OTHER_LAYOUT_MAX
This range is reserved for telling the client to recall layouts of This range is reserved for telling the client to recall layouts of
experimental or site-specific layout types (see Section 3.3.13). experimental or site-specific layout types (see Section 3.3.13).
When a bit is set in the type mask that corresponds to an undefined When a bit is set in the type mask that corresponds to an undefined
type of recallable object, NFS4ERR_INVAL MUST be returned. When a type of recallable object, NFS4ERR_INVAL MUST be returned. When a
bit is set that corresponds to a defined type of object but the bit is set that corresponds to a defined type of object but the
client does not support an object of the type, NFS4ERR_INVAL MUST NOT client does not support an object of the type, NFS4ERR_INVAL MUST NOT
skipping to change at page 629, line 7 skipping to change at page 638, line 7
the attacker. With integrity protection, this attack is the attacker. With integrity protection, this attack is
mitigated. mitigated.
Relative to previous NFS versions, NFSv4.1 has additional security Relative to previous NFS versions, NFSv4.1 has additional security
considerations for pNFS (see Sections 12.9 and 13.12), locking and considerations for pNFS (see Sections 12.9 and 13.12), locking and
session state (see Section 2.10.8.3), and state recovery during grace session state (see Section 2.10.8.3), and state recovery during grace
period (see Section 8.4.2.1.1). With respect to locking and session period (see Section 8.4.2.1.1). With respect to locking and session
state, if SP4_SSV state protection is being used, Section 2.10.10 has state, if SP4_SSV state protection is being used, Section 2.10.10 has
specific security considerations for the NFSv4.1 client and server. specific security considerations for the NFSv4.1 client and server.
The use of the multi-server bamespace features described in Security considerations for lock reclaim differ between the two
different situations in which state reclaim is to be done. The
server failure situation is discussed in Section 8.4.2.1.1 while the
per-fs state reclaim done in support of migration/replication is
discussed in Section 11.11.9.1.
The use of the multi-server namespace features described in
Section 11 raises the possibility that requests to determine the set Section 11 raises the possibility that requests to determine the set
of network addresses corresponding to a given server might be of network addresses corresponding to a given server might be
interfered with or have their responses modified in flight. In light interfered with or have their responses modified in flight. In light
of this possibility, the following considerations should be noted: of this possibility, the following considerations should be noted:
o When DNS is used to convert server names to addresses and DNSSEC o When DNS is used to convert server names to addresses and DNSSEC
[29] is not available, the validity of the network addresses [29] is not available, the validity of the network addresses
returned cannot be relied upon. However, when the client uses returned generally cannot be relied upon. However, when combined
RPCSEC_GSS to access the designated server, it is possible for with a trusted resolver, DNS over TLS [30] and DNS over HTTPS
mutual authentication to discover invalid server addresses [34] can also be relied upon to provide valid address resolutions.
provided, as long as the RPCSEC_GSS implementation used does not
use insecure DNS queries to canonicalize the hostname components In situations in which the validity of the provided addresses
of the service principal names, as explained in [28]. cannot be relied upon and the client uses RPCSEC_GSS to access the
designated server, it is possible for mutual authentication to
discover invalid server addresses as long as the RPCSEC_GSS
implementation used does not use insecure DNS queries to
canonicalize the hostname components of the service principal
names, as explained in [28].
o The fetching of attributes containing file system location o The fetching of attributes containing file system location
information SHOULD be performed using RPCSEC_GSS with integrity information SHOULD be performed using integrity protection. It is
protection. It is important to note here that a client making a important to note here that a client making a request of this sort
request of this sort without using RPCSEC_GSS including integrity without using integrity protection needs to be aware of the negative
protection needs to be aware of the negative consequences of doing consequences of doing so, which can lead to invalid host names or
so, which can lead to invalid host names or network addresses network addresses being returned. These include cases in which
being returned. These include cases in which the client is the client is directed to a server under the control of an
directed a server under the control of an attacker, who might get attacker, who might get access to data written or provide
access to data written or provide incorrect values for data read. incorrect values for data read. In light of this, the client
In light of this, the client needs to recognize that using such needs to recognize that using such returned location information
returned location information to access an NFSv4 server without to access an NFSv4 server without use of RPCSEC_GSS (i.e. by
use of RPCSEC_GSS (i.e. by using AUTH_SYS) poses dangers as it using AUTH_SYS) poses dangers as it can result in the client
can result in the client interacting with such an attacker- interacting with such an attacker-controlled server, without any
controlled server, without any authentication facilities to verify authentication facilities to verify the server's identity.
the server's identity.
o Despite the fact that it is a requirement that "implementations" o Despite the fact that it is a requirement that implementations
provide "support" for use of RPCSEC_GSS, it cannot be assumed that provide "support" for use of RPCSEC_GSS, it cannot be assumed that
use of RPCSEC_GSS is always available between any particular use of RPCSEC_GSS is always available between any particular
client-server pair. client-server pair.
o When a client has the network addresses of a server but not the o When a client has the network addresses of a server but not the
associated host names, that would interfere with its ability to associated host names, that would interfere with its ability to
use RPCSEC_GSS. use RPCSEC_GSS.
In light of the above, a server SHOULD present file system location In light of the above, a server SHOULD present file system location
entries that correspond to file systems on other servers using a host entries that correspond to file systems on other servers using a host
name. This would allow the client to interrogate the fs_locations on name. This would allow the client to interrogate the fs_locations on
the destination server to obtain trunking information (as well as the destination server to obtain trunking information (as well as
replica information) using RPCSEC_GSS with integrity, validating the replica information) using integrity protection, validating the name
name provided while assuring that the response has not been modified provided while assuring that the response has not been modified in
in flight. flight.
When RPCSEC_GSS is not available on a server, the client needs to be When RPCSEC_GSS is not available on a server, the client needs to be
aware of the fact that the location entries are subject to aware of the fact that the location entries are subject to
modification in flight and so cannot be relied upon. In the case of modification in flight and so cannot be relied upon. In the case of
a client being directed to another server after NFS4ERR_MOVED, this a client being directed to another server after NFS4ERR_MOVED, this
could vitiate the authentication provided by the use of RPCSEC_GSS on could vitiate the authentication provided by the use of RPCSEC_GSS on
the destination. Even when RPCSEC_GSS authentication is available on the designated destination server. Even when RPCSEC_GSS
the destination, the server might validly represent itself as the authentication is available on the destination, the server might
server to which the client was erroneously directed. Without a way still properly authenticate as the server to which the client was
to decide whether the server is a valid one, the client can only erroneously directed. Without a way to decide whether the server is
determine, using RPCSEC_GSS, that the server corresponds to the name a valid one, the client can only determine, using RPCSEC_GSS, that
provided, with no basis for trusting that server. As a result, the the server corresponds to the name provided, with no basis for
client SHOULD NOT use such unverified location entries as a basis for trusting that server. As a result, the client SHOULD NOT use such
migration, even though RPCSEC_GSS might be available on the unverified location entries as a basis for migration, even though
destination. RPCSEC_GSS might be available on the destination.
When a file system location attribute is fetched upon connecting with When a file system location attribute is fetched upon connecting with
an NFS server, it SHOULD, as stated above, be done using RPCSEC_GSS an NFS server, it SHOULD, as stated above, be done with integrity
with integrity protection. When this is not possible, it is generally protection. When this is not possible, it is generally best for the
best for the client to ignore trunking and replica information or client to ignore trunking and replica information or simply not fetch
simply not fetch the location information for these purposes. the location information for these purposes.
When location information cannot be verified, it can be subjected to When location information cannot be verified, it can be subjected to
additional filtering to prevent the client from being inappropriately additional filtering to prevent the client from being inappropriately
directed. For example, if a range of network addresses can be directed. For example, if a range of network addresses can be
determined that assures that the servers and clients using AUTH_SYS determined that assures that the servers and clients using AUTH_SYS
are subject to the appropriate set of constraints (e.g. physical are subject to the appropriate set of constraints (e.g. physical
network isolation, administrative controls on the operating systems network isolation, administrative controls on the operating systems
used), then network addresses in the appropriate range can be used used), then network addresses in the appropriate range can be used
with others discarded or restricted in their use of AUTH_SYS. with others discarded or restricted in their use of AUTH_SYS.
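The address-range filtering described above might look like the following on a client. This is a sketch under stated assumptions: the TRUSTED_NETWORKS list (documentation address ranges are used here) and the function name filter_locations are hypothetical, and the policy shown simply discards addresses outside the trusted ranges rather than restricting their use.

```python
import ipaddress

# Hypothetical ranges within which AUTH_SYS use is considered acceptable
# (for example, a physically isolated storage network).
TRUSTED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def filter_locations(addresses, trusted=TRUSTED_NETWORKS):
    """Keep only location addresses that fall inside a trusted range."""
    kept = []
    for text in addresses:
        try:
            addr = ipaddress.ip_address(text)
        except ValueError:
            # Not a literal network address; it cannot be checked, so drop it.
            continue
        if any(addr in net for net in trusted):
            kept.append(text)
    return kept


candidates = ["192.0.2.10", "198.51.100.7", "2001:db8::1", "not-an-address"]
usable = filter_locations(candidates)
```

An implementation could instead retain out-of-range addresses and require RPCSEC_GSS for them, which corresponds to the "restricted in their use of AUTH_SYS" alternative.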
To summarize considerations regarding the use of RPCSEC_GSS in To summarize considerations regarding the use of RPCSEC_GSS in
fetching location information, we need to consider the following fetching location information, we need to consider the following
possibilities for requests to interrogate location information, with possibilities for requests to interrogate location information, with
interrogation approaches on the referring and destination servers interrogation approaches on the referring and destination servers
arrived at separately: arrived at separately:
o The use of RPCSEC_GSS with integrity protection is RECOMMENDED in o The use of integrity protection is RECOMMENDED in all cases, since
all cases, since the absence of integrity protection exposes the the absence of integrity protection exposes the client to the
client to the possibility of the results being modified in possibility of the results being modified in transit.
transit.
o The use of requests issued without RPCSEC_GSS (i.e. using AUTH_SYS o The use of requests issued without RPCSEC_GSS (i.e. using AUTH_SYS
which has no provision to avoid modification of data in flight), which has no provision to avoid modification of data in flight),
while undesirable and a potential security exposure, may not be while undesirable and a potential security exposure, may not be
avoidable in all cases. Where the use of the returned information avoidable in all cases. Where the use of the returned information
cannot be avoided, it is made subject to filtering as described cannot be avoided, it is made subject to filtering as described
above to eliminate the possibility that the client would treat an above to eliminate the possibility that the client would treat an
invalid address as if it were an NFSv4 server. The specifics will invalid address as if it were an NFSv4 server. The specifics will
vary depending on the degree of network isolation and whether the vary depending on the degree of network isolation and whether the
request is to the referring or destination servers. request is to the referring or destination servers.
Even if such requests are not interfered with in flight, it is
possible for a compromised server to direct the client to use
inappropriate servers, such as those under the control of the
attacker. It is not clear that being directed to such servers
represents a greater threat to the client than the damage that could
be done by the compromised server itself. However, it is possible
that some sorts of transient server compromises might be taken
advantage of to direct a client to a server capable of doing greater
damage over a longer time. One useful step to guard against this
possibility is to issue requests to fetch location data using
RPCSEC_GSS, even if no mapping to an RPCSEC_GSS principal is
available. In this case, RPCSEC_GSS would not be used, as it
typically is, to identify the client principal to the server, but
rather to make sure (via RPCSEC_GSS mutual authentication) that the
server being contacted is the one intended.
Similar considerations apply if the threat to be avoided is the
redirection of client traffic to inappropriate (i.e. poorly
performing) servers. In both cases, there is no reason for the
information returned to depend on the identity of the client
principal requesting it, while the validity of the server
information, which has the capability to affect all client
principals, is of considerable importance.
22. IANA Considerations 22. IANA Considerations
This section uses terms that are defined in [58]. This section uses terms that are defined in [62].
22.1. IANA Actions Neeeded 22.1. IANA Actions Needed
This update does not require any modification of or additions to This update does not require any modification of or additions to
registry entries or registry rules associated with NFSv4.1. However, registry entries or registry rules associated with NFSv4.1. However,
since this document is intended to obsolete RFC5661, it will be since this document is intended to obsolete RFC5661, it will be
necessary for IANA to update all registry entries and registry rules necessary for IANA to update all registry entries and registry rules
references that point to RFC5661 to point to this document instead. references that point to RFC5661 to point to this document instead.
Previous actions by IANA related to NFSv4.1 are listed in the Previous actions by IANA related to NFSv4.1 are listed in the
remaining subsections of Section 22. remaining subsections of Section 22.
skipping to change at page 631, line 47 skipping to change at page 641, line 37
attributes as needed, they are encouraged to register the attributes attributes as needed, they are encouraged to register the attributes
with IANA. with IANA.
Such registered named attributes are presumed to apply to all minor Such registered named attributes are presumed to apply to all minor
versions of NFSv4, including those defined subsequently to the versions of NFSv4, including those defined subsequently to the
registration. If the named attribute is intended to be limited to registration. If the named attribute is intended to be limited to
specific minor versions, this will be clearly stated in the specific minor versions, this will be clearly stated in the
registry's assignment. registry's assignment.
All assignments to the registry are made on a First Come First Served All assignments to the registry are made on a First Come First Served
basis, per Section 4.1 of [58]. The policy for each assignment is basis, per Section 4.1 of [62]. The policy for each assignment is
Specification Required, per Section 4.1 of [58]. Specification Required, per Section 4.1 of [62].
Under the NFSv4.1 specification, the name of a named attribute can in Under the NFSv4.1 specification, the name of a named attribute can in
theory be up to 2^32 - 1 bytes in length, but in practice NFSv4.1 theory be up to 2^32 - 1 bytes in length, but in practice NFSv4.1
clients and servers will be unable to handle a string that long. clients and servers will be unable to handle a string that long.
IANA should reject any assignment request with a named attribute that IANA should reject any assignment request with a named attribute that
exceeds 128 UTF-8 characters. To give the IESG the flexibility to exceeds 128 UTF-8 characters. To give the IESG the flexibility to
set up bases of assignment of Experimental Use and Standards Action, set up bases of assignment of Experimental Use and Standards Action,
the prefixes of "EXPE" and "STDS" are Reserved. The named attribute the prefixes of "EXPE" and "STDS" are Reserved. The named attribute
with a zero-length name is Reserved. with a zero-length name is Reserved.
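The name checks described above can be sketched as a small validation routine. The function name check_named_attribute and its return values are hypothetical, and the sketch assumes that "128 UTF-8 characters" means 128 characters (code points) rather than 128 encoded bytes; the text does not say which is intended.

```python
RESERVED_PREFIXES = ("EXPE", "STDS")
MAX_NAME_CHARS = 128  # assumed to count characters, not encoded bytes


def check_named_attribute(name: str) -> str:
    """Classify a proposed named attribute name as reserved, reject, or ok."""
    if name == "":
        return "reserved"  # the zero-length name is Reserved
    if len(name) > MAX_NAME_CHARS:
        return "reject"  # longer than IANA should accept
    if name.startswith(RESERVED_PREFIXES):
        return "reserved"  # prefixes held back for the IESG
    return "ok"


results = [check_named_attribute(n) for n in ("", "EXPEfoo", "x" * 129, "acl_ext")]
```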
skipping to change at page 633, line 11 skipping to change at page 642, line 48
The potential exists for new notification types to be added to the The potential exists for new notification types to be added to the
CB_NOTIFY_DEVICEID operation (see Section 20.12). This can be done CB_NOTIFY_DEVICEID operation (see Section 20.12). This can be done
via changes to the operations that register notifications, or by via changes to the operations that register notifications, or by
adding new operations to NFSv4. This requires a new minor version of adding new operations to NFSv4. This requires a new minor version of
NFSv4, and requires a Standards Track document from the IETF. NFSv4, and requires a Standards Track document from the IETF.
Another way to add a notification is to specify a new layout type Another way to add a notification is to specify a new layout type
(see Section 22.5). (see Section 22.5).
Hence, all assignments to the registry are made on a Standards Action Hence, all assignments to the registry are made on a Standards Action
basis per Section 4.1 of [58], with Expert Review required. basis per Section 4.1 of [62], with Expert Review required.
The registry is a list of assignments, each containing five fields The registry is a list of assignments, each containing five fields
per assignment. per assignment.