draft-ietf-nfsv4-rfc5661sesqui-msns-04.txt   rfc8881.txt 
NFSv4 D. Noveck, Ed. Internet Engineering Task Force (IETF) D. Noveck, Ed.
Internet-Draft NetApp Request for Comments: 8881 NetApp
Obsoletes: 5661 (if approved) C. Lever Obsoletes: 5661 C. Lever
Intended status: Standards Track ORACLE Category: Standards Track ORACLE
Expires: July 31, 2020 January 28, 2020 ISSN: 2070-1721 August 2020
Network File System (NFS) Version 4 Minor Version 1 Protocol Network File System (NFS) Version 4 Minor Version 1 Protocol
draft-ietf-nfsv4-rfc5661sesqui-msns-04
Abstract Abstract
This document describes the Network File System (NFS) version 4 minor This document describes the Network File System (NFS) version 4 minor
version 1, including features retained from the base protocol (NFS version 1, including features retained from the base protocol (NFS
version 4 minor version 0, which is specified in RFC 7530) and version 4 minor version 0, which is specified in RFC 7530) and
protocol extensions made subsequently. The later minor version has protocol extensions made subsequently. The later minor version has
no dependencies on NFS version 4 minor version 0, and is considered a no dependencies on NFS version 4 minor version 0, and is considered a
separate protocol. separate protocol.
This document obsoletes RFC5661. It substantially revises the This document obsoletes RFC 5661. It substantially revises the
treatment of features relating to multi-server namespace, superseding treatment of features relating to multi-server namespace, superseding
the description of those features appearing in RFC5661. the description of those features appearing in RFC 5661.
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This is an Internet Standards Track document.
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months This document is a product of the Internet Engineering Task Force
and may be updated, replaced, or obsoleted by other documents at any (IETF). It represents the consensus of the IETF community. It has
time. It is inappropriate to use Internet-Drafts as reference received public review and has been approved for publication by the
material or to cite them other than as "work in progress." Internet Engineering Steering Group (IESG). Further information on
Internet Standards is available in Section 2 of RFC 7841.
This Internet-Draft will expire on July 31, 2020. Information about the current status of this document, any errata,
and how to provide feedback on it may be obtained at
https://www.rfc-editor.org/info/rfc8881.
Copyright Notice Copyright Notice
Copyright (c) 2020 IETF Trust and the persons identified as the Copyright (c) 2020 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 2, line 25 skipping to change at line 66
modifications of such material outside the IETF Standards Process. modifications of such material outside the IETF Standards Process.
Without obtaining an adequate license from the person(s) controlling Without obtaining an adequate license from the person(s) controlling
the copyright in such materials, this document may not be modified the copyright in such materials, this document may not be modified
outside the IETF Standards Process, and derivative works of it may outside the IETF Standards Process, and derivative works of it may
not be created outside the IETF Standards Process, except to format not be created outside the IETF Standards Process, except to format
it for publication as an RFC or to translate it into languages other it for publication as an RFC or to translate it into languages other
than English. than English.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 7 1. Introduction
1.1. Introduction to this Update . . . . . . . . . . . . . . . 7 1.1. Introduction to This Update
1.2. The NFS Version 4 Minor Version 1 Protocol . . . . . . . 9 1.2. The NFS Version 4 Minor Version 1 Protocol
1.3. Requirements Language . . . . . . . . . . . . . . . . . . 10 1.3. Requirements Language
1.4. Scope of This Document . . . . . . . . . . . . . . . . . 10 1.4. Scope of This Document
1.5. NFSv4 Goals . . . . . . . . . . . . . . . . . . . . . . . 10 1.5. NFSv4 Goals
1.6. NFSv4.1 Goals . . . . . . . . . . . . . . . . . . . . . . 11 1.6. NFSv4.1 Goals
1.7. General Definitions . . . . . . . . . . . . . . . . . . . 11 1.7. General Definitions
1.8. Overview of NFSv4.1 Features . . . . . . . . . . . . . . 14 1.8. Overview of NFSv4.1 Features
1.9. Differences from NFSv4.0 . . . . . . . . . . . . . . . . 18 1.9. Differences from NFSv4.0
2. Core Infrastructure . . . . . . . . . . . . . . . . . . . . . 19 2. Core Infrastructure
2.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 19 2.1. Introduction
2.2. RPC and XDR . . . . . . . . . . . . . . . . . . . . . . . 19 2.2. RPC and XDR
2.3. COMPOUND and CB_COMPOUND . . . . . . . . . . . . . . . . 22 2.3. COMPOUND and CB_COMPOUND
2.4. Client Identifiers and Client Owners . . . . . . . . . . 23 2.4. Client Identifiers and Client Owners
2.5. Server Owners . . . . . . . . . . . . . . . . . . . . . . 29 2.5. Server Owners
2.6. Security Service Negotiation . . . . . . . . . . . . . . 29 2.6. Security Service Negotiation
2.7. Minor Versioning . . . . . . . . . . . . . . . . . . . . 35 2.7. Minor Versioning
2.8. Non-RPC-Based Security Services . . . . . . . . . . . . . 37 2.8. Non-RPC-Based Security Services
2.9. Transport Layers . . . . . . . . . . . . . . . . . . . . 38 2.9. Transport Layers
2.10. Session . . . . . . . . . . . . . . . . . . . . . . . . . 40 2.10. Session
3. Protocol Constants and Data Types . . . . . . . . . . . . . . 87 3. Protocol Constants and Data Types
3.1. Basic Constants . . . . . . . . . . . . . . . . . . . . . 87 3.1. Basic Constants
3.2. Basic Data Types . . . . . . . . . . . . . . . . . . . . 88 3.2. Basic Data Types
3.3. Structured Data Types . . . . . . . . . . . . . . . . . . 90 3.3. Structured Data Types
4. Filehandles . . . . . . . . . . . . . . . . . . . . . . . . . 98 4. Filehandles
4.1. Obtaining the First Filehandle . . . . . . . . . . . . . 99 4.1. Obtaining the First Filehandle
4.2. Filehandle Types . . . . . . . . . . . . . . . . . . . . 100 4.2. Filehandle Types
4.3. One Method of Constructing a Volatile Filehandle . . . . 102 4.3. One Method of Constructing a Volatile Filehandle
4.4. Client Recovery from Filehandle Expiration . . . . . . . 103 4.4. Client Recovery from Filehandle Expiration
5. File Attributes . . . . . . . . . . . . . . . . . . . . . . . 104 5. File Attributes
5.1. REQUIRED Attributes . . . . . . . . . . . . . . . . . . . 105 5.1. REQUIRED Attributes
5.2. RECOMMENDED Attributes . . . . . . . . . . . . . . . . . 105 5.2. RECOMMENDED Attributes
5.3. Named Attributes . . . . . . . . . . . . . . . . . . . . 106 5.3. Named Attributes
5.4. Classification of Attributes . . . . . . . . . . . . . . 107 5.4. Classification of Attributes
5.5. Set-Only and Get-Only Attributes . . . . . . . . . . . . 108 5.5. Set-Only and Get-Only Attributes
5.6. REQUIRED Attributes - List and Definition References . . 108 5.6. REQUIRED Attributes - List and Definition References
5.7. RECOMMENDED Attributes - List and Definition References . 109 5.7. RECOMMENDED Attributes - List and Definition References
5.8. Attribute Definitions . . . . . . . . . . . . . . 111 5.8. Attribute Definitions
5.9. Interpreting owner and owner_group . . . . . . . . . . . 120 5.9. Interpreting owner and owner_group
5.10. Character Case Attributes . . . . . . . . . . . . . . . . 122 5.10. Character Case Attributes
5.11. Directory Notification Attributes . . . . . . . . . . . . 122 5.11. Directory Notification Attributes
5.12. pNFS Attribute Definitions . . . . . . . . . . . . . . . 123 5.12. pNFS Attribute Definitions
5.13. Retention Attributes . . . . . . . . . . . . . . . . . . 124 5.13. Retention Attributes
6. Access Control Attributes . . . . . . . . . . . . . . . . . . 127 6. Access Control Attributes
6.1. Goals . . . . . . . . . . . . . . . . . . . . . . . . . . 127 6.1. Goals
6.2. File Attributes Discussion . . . . . . . . . . . . . . . 128 6.2. File Attributes Discussion
6.3. Common Methods . . . . . . . . . . . . . . . . . . . . . 145 6.3. Common Methods
6.4. Requirements . . . . . . . . . . . . . . . . . . . . . . 147 6.4. Requirements
7. Single-Server Namespace . . . . . . . . . . . . . . . . . . . 154 7. Single-Server Namespace
7.1. Server Exports . . . . . . . . . . . . . . . . . . . . . 154 7.1. Server Exports
7.2. Browsing Exports . . . . . . . . . . . . . . . . . . . . 154 7.2. Browsing Exports
7.3. Server Pseudo File System . . . . . . . . . . . . . . . . 155 7.3. Server Pseudo File System
7.4. Multiple Roots . . . . . . . . . . . . . . . . . . . . . 155 7.4. Multiple Roots
7.5. Filehandle Volatility . . . . . . . . . . . . . . . . . . 156 7.5. Filehandle Volatility
7.6. Exported Root . . . . . . . . . . . . . . . . . . . . . . 156 7.6. Exported Root
7.7. Mount Point Crossing . . . . . . . . . . . . . . . . . . 156 7.7. Mount Point Crossing
7.8. Security Policy and Namespace Presentation . . . . . . . 157 7.8. Security Policy and Namespace Presentation
8. State Management . . . . . . . . . . . . . . . . . . . . . . 158 8. State Management
8.1. Client and Session ID . . . . . . . . . . . . . . . . . . 159 8.1. Client and Session ID
8.2. Stateid Definition . . . . . . . . . . . . . . . . . . . 159 8.2. Stateid Definition
8.3. Lease Renewal . . . . . . . . . . . . . . . . . . . . . . 168 8.3. Lease Renewal
8.4. Crash Recovery . . . . . . . . . . . . . . . . . . . . . 170 8.4. Crash Recovery
8.5. Server Revocation of Locks . . . . . . . . . . . . . . . 181 8.5. Server Revocation of Locks
8.6. Short and Long Leases . . . . . . . . . . . . . . . . . . 182 8.6. Short and Long Leases
8.7. Clocks, Propagation Delay, and Calculating Lease 8.7. Clocks, Propagation Delay, and Calculating Lease Expiration
Expiration . . . . . . . . . . . . . . . . . . . . . . . 183
8.8. Obsolete Locking Infrastructure from NFSv4.0 . . . . . . 183 8.8. Obsolete Locking Infrastructure from NFSv4.0
9. File Locking and Share Reservations . . . . . . . . . . . . . 184 9. File Locking and Share Reservations
9.1. Opens and Byte-Range Locks . . . . . . . . . . . . . . . 184 9.1. Opens and Byte-Range Locks
9.2. Lock Ranges . . . . . . . . . . . . . . . . . . . . . . . 188 9.2. Lock Ranges
9.3. Upgrading and Downgrading Locks . . . . . . . . . . . . . 189 9.3. Upgrading and Downgrading Locks
9.4. Stateid Seqid Values and Byte-Range Locks . . . . . . . . 189 9.4. Stateid Seqid Values and Byte-Range Locks
9.5. Issues with Multiple Open-Owners . . . . . . . . . . . . 189 9.5. Issues with Multiple Open-Owners
9.6. Blocking Locks . . . . . . . . . . . . . . . . . . . . . 190 9.6. Blocking Locks
9.7. Share Reservations . . . . . . . . . . . . . . . . . . . 191 9.7. Share Reservations
9.8. OPEN/CLOSE Operations . . . . . . . . . . . . . . . . . . 192 9.8. OPEN/CLOSE Operations
9.9. Open Upgrade and Downgrade . . . . . . . . . . . . . . . 193 9.9. Open Upgrade and Downgrade
9.10. Parallel OPENs . . . . . . . . . . . . . . . . . . . . . 194 9.10. Parallel OPENs
9.11. Reclaim of Open and Byte-Range Locks . . . . . . . . . . 194 9.11. Reclaim of Open and Byte-Range Locks
10. Client-Side Caching . . . . . . . . . . . . . . . . . . . . . 195 10. Client-Side Caching
10.1. Performance Challenges for Client-Side Caching . . . . . 195 10.1. Performance Challenges for Client-Side Caching
10.2. Delegation and Callbacks . . . . . . . . . . . . . . . . 196 10.2. Delegation and Callbacks
10.3. Data Caching . . . . . . . . . . . . . . . . . . . . . . 201 10.3. Data Caching
10.4. Open Delegation . . . . . . . . . . . . . . . . . . . . 205 10.4. Open Delegation
10.5. Data Caching and Revocation . . . . . . . . . . . . . . 216 10.5. Data Caching and Revocation
10.6. Attribute Caching . . . . . . . . . . . . . . . . . . . 218 10.6. Attribute Caching
10.7. Data and Metadata Caching and Memory Mapped Files . . . 220 10.7. Data and Metadata Caching and Memory Mapped Files
10.8. Name and Directory Caching without Directory Delegations 222 10.8. Name and Directory Caching without Directory Delegations
10.9. Directory Delegations . . . . . . . . . . . . . . . . . 224 10.9. Directory Delegations
11. Multi-Server Namespace . . . . . . . . . . . . . . . . . . . 228 11. Multi-Server Namespace
11.1. Terminology . . . . . . . . . . . . . . . . . . . . . . 228 11.1. Terminology
11.2. File System Location Attributes . . . . . . . . . . . . 232 11.2. File System Location Attributes
11.3. File System Presence or Absence . . . . . . . . . . . . 233 11.3. File System Presence or Absence
11.4. Getting Attributes for an Absent File System . . . . . . 234 11.4. Getting Attributes for an Absent File System
11.5. Uses of File System Location Information . . . . . . . . 236 11.5. Uses of File System Location Information
11.6. Trunking without File System Location Information . . . 246 11.6. Trunking without File System Location Information
11.7. Users and Groups in a Multi-server Namespace . . . . . . 246 11.7. Users and Groups in a Multi-Server Namespace
11.8. Additional Client-Side Considerations . . . . . . . . . 248 11.8. Additional Client-Side Considerations
11.9. Overview of File Access Transitions . . . . . . . . . . 248 11.9. Overview of File Access Transitions
11.10. Effecting Network Endpoint Transitions . . . . . . . . . 249 11.10. Effecting Network Endpoint Transitions
11.11. Effecting File System Transitions . . . . . . . . . . . 250 11.11. Effecting File System Transitions
11.12. Transferring State upon Migration . . . . . . . . . . . 260 11.12. Transferring State upon Migration
11.13. Client Responsibilities when Access is Transitioned . . 261 11.13. Client Responsibilities When Access Is Transitioned
11.14. Server Responsibilities Upon Migration . . . . . . . . . 271 11.14. Server Responsibilities Upon Migration
11.15. Effecting File System Referrals . . . . . . . . . . . . 277 11.15. Effecting File System Referrals
11.16. The Attribute fs_locations . . . . . . . . . . . . . . . 284 11.16. The Attribute fs_locations
11.17. The Attribute fs_locations_info . . . . . . . . . . . . 287 11.17. The Attribute fs_locations_info
11.18. The Attribute fs_status . . . . . . . . . . . . . . . . 300 11.18. The Attribute fs_status
12. Parallel NFS (pNFS) . . . . . . . . . . . . . . . . . . . . . 304 12. Parallel NFS (pNFS)
12.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 304 12.1. Introduction
12.2. pNFS Definitions . . . . . . . . . . . . . . . . . . . . 305 12.2. pNFS Definitions
12.3. pNFS Operations . . . . . . . . . . . . . . . . . . . . 311 12.3. pNFS Operations
12.4. pNFS Attributes . . . . . . . . . . . . . . . . . . . . 312 12.4. pNFS Attributes
12.5. Layout Semantics . . . . . . . . . . . . . . . . . . . . 312 12.5. Layout Semantics
12.6. pNFS Mechanics . . . . . . . . . . . . . . . . . . . . . 327 12.6. pNFS Mechanics
12.7. Recovery . . . . . . . . . . . . . . . . . . . . . . . . 329 12.7. Recovery
12.8. Metadata and Storage Device Roles . . . . . . . . . . . 334 12.8. Metadata and Storage Device Roles
12.9. Security Considerations for pNFS . . . . . . . . . . . . 334 12.9. Security Considerations for pNFS
13. NFSv4.1 as a Storage Protocol in pNFS: the File Layout Type . 336 13. NFSv4.1 as a Storage Protocol in pNFS: the File Layout Type
13.1. Client ID and Session Considerations . . . . . . . . . . 336 13.1. Client ID and Session Considerations
13.2. File Layout Definitions . . . . . . . . . . . . . . . . 339 13.2. File Layout Definitions
13.3. File Layout Data Types . . . . . . . . . . . . . . . . . 339 13.3. File Layout Data Types
13.4. Interpreting the File Layout . . . . . . . . . . . . . . 343 13.4. Interpreting the File Layout
13.5. Data Server Multipathing . . . . . . . . . . . . . . . . 351 13.5. Data Server Multipathing
13.6. Operations Sent to NFSv4.1 Data Servers . . . . . . . . 352 13.6. Operations Sent to NFSv4.1 Data Servers
13.7. COMMIT through Metadata Server . . . . . . . . . . . . . 354 13.7. COMMIT through Metadata Server
13.8. The Layout Iomode . . . . . . . . . . . . . . . . . . . 355 13.8. The Layout Iomode
13.9. Metadata and Data Server State Coordination . . . . . . 356 13.9. Metadata and Data Server State Coordination
13.10. Data Server Component File Size . . . . . . . . . . . . 359 13.10. Data Server Component File Size
13.11. Layout Revocation and Fencing . . . . . . . . . . . . . 359 13.11. Layout Revocation and Fencing
13.12. Security Considerations for the File Layout Type . . . . 360 13.12. Security Considerations for the File Layout Type
14. Internationalization . . . . . . . . . . . . . . . . . . . . 361 14. Internationalization
14.1. Stringprep Profile for the utf8str_cs Type . . . . . . . 362 14.1. Stringprep Profile for the utf8str_cs Type
14.2. Stringprep Profile for the utf8str_cis Type . . . . . . 364 14.2. Stringprep Profile for the utf8str_cis Type
14.3. Stringprep Profile for the utf8str_mixed Type . . . . . 365 14.3. Stringprep Profile for the utf8str_mixed Type
14.4. UTF-8 Capabilities . . . . . . . . . . . . . . . . . . . 367 14.4. UTF-8 Capabilities
14.5. UTF-8 Related Errors . . . . . . . . . . . . . . . . . . 367 14.5. UTF-8 Related Errors
15. Error Values . . . . . . . . . . . . . . . . . . . . . . . . 368 15. Error Values
15.1. Error Definitions . . . . . . . . . . . . . . . . . . . 368 15.1. Error Definitions
15.2. Operations and Their Valid Errors . . . . . . . . . . . 390 15.2. Operations and Their Valid Errors
15.3. Callback Operations and Their Valid Errors . . . . . . . 406 15.3. Callback Operations and Their Valid Errors
15.4. Errors and the Operations That Use Them . . . . . . . . 409 15.4. Errors and the Operations That Use Them
16. NFSv4.1 Procedures . . . . . . . . . . . . . . . . . . . . . 423 16. NFSv4.1 Procedures
16.1. Procedure 0: NULL - No Operation . . . . . . . . . . . . 423 16.1. Procedure 0: NULL - No Operation
16.2. Procedure 1: COMPOUND - Compound Operations . . . . . . 424 16.2. Procedure 1: COMPOUND - Compound Operations
17. Operations: REQUIRED, RECOMMENDED, or OPTIONAL . . . . . . . 435 17. Operations: REQUIRED, RECOMMENDED, or OPTIONAL
18. NFSv4.1 Operations . . . . . . . . . . . . . . . . . . . . . 438 18. NFSv4.1 Operations
18.1. Operation 3: ACCESS - Check Access Rights . . . . . . . 438 18.1. Operation 3: ACCESS - Check Access Rights
18.2. Operation 4: CLOSE - Close File . . . . . . . . . . . . 444 18.2. Operation 4: CLOSE - Close File
18.3. Operation 5: COMMIT - Commit Cached Data . . . . . . . . 445 18.3. Operation 5: COMMIT - Commit Cached Data
18.4. Operation 6: CREATE - Create a Non-Regular File Object . 448 18.4. Operation 6: CREATE - Create a Non-Regular File Object
18.5. Operation 7: DELEGPURGE - Purge Delegations Awaiting 18.5. Operation 7: DELEGPURGE - Purge Delegations Awaiting
Recovery . . . . . . . . . . . . . . . . . . . . . . . . 451 Recovery
18.6. Operation 8: DELEGRETURN - Return Delegation . . . . . . 452 18.6. Operation 8: DELEGRETURN - Return Delegation
18.7. Operation 9: GETATTR - Get Attributes . . . . . . . . . 452 18.7. Operation 9: GETATTR - Get Attributes
18.8. Operation 10: GETFH - Get Current Filehandle . . . . . . 454 18.8. Operation 10: GETFH - Get Current Filehandle
18.9. Operation 11: LINK - Create Link to a File . . . . . . . 455 18.9. Operation 11: LINK - Create Link to a File
18.10. Operation 12: LOCK - Create Lock . . . . . . . . . . . . 458 18.10. Operation 12: LOCK - Create Lock
18.11. Operation 13: LOCKT - Test for Lock . . . . . . . . . . 463 18.11. Operation 13: LOCKT - Test for Lock
18.12. Operation 14: LOCKU - Unlock File . . . . . . . . . . . 464 18.12. Operation 14: LOCKU - Unlock File
18.13. Operation 15: LOOKUP - Lookup Filename . . . . . . . . . 466 18.13. Operation 15: LOOKUP - Lookup Filename
18.14. Operation 16: LOOKUPP - Lookup Parent Directory . . . . 468 18.14. Operation 16: LOOKUPP - Lookup Parent Directory
18.15. Operation 17: NVERIFY - Verify Difference in Attributes 469 18.15. Operation 17: NVERIFY - Verify Difference in Attributes
18.16. Operation 18: OPEN - Open a Regular File . . . . . . . . 470 18.16. Operation 18: OPEN - Open a Regular File
18.17. Operation 19: OPENATTR - Open Named Attribute Directory 490 18.17. Operation 19: OPENATTR - Open Named Attribute Directory
18.18. Operation 21: OPEN_DOWNGRADE - Reduce Open File Access . 492 18.18. Operation 21: OPEN_DOWNGRADE - Reduce Open File Access
18.19. Operation 22: PUTFH - Set Current Filehandle . . . . . . 493 18.19. Operation 22: PUTFH - Set Current Filehandle
18.20. Operation 23: PUTPUBFH - Set Public Filehandle . . . . 494 18.20. Operation 23: PUTPUBFH - Set Public Filehandle
18.21. Operation 24: PUTROOTFH - Set Root Filehandle . . . . . 496 18.21. Operation 24: PUTROOTFH - Set Root Filehandle
18.22. Operation 25: READ - Read from File . . . . . . . . . . 497 18.22. Operation 25: READ - Read from File
18.23. Operation 26: READDIR - Read Directory . . . . . . . . . 499 18.23. Operation 26: READDIR - Read Directory
18.24. Operation 27: READLINK - Read Symbolic Link . . . . . . 503 18.24. Operation 27: READLINK - Read Symbolic Link
18.25. Operation 28: REMOVE - Remove File System Object . . . . 504 18.25. Operation 28: REMOVE - Remove File System Object
18.26. Operation 29: RENAME - Rename Directory Entry . . . . . 507 18.26. Operation 29: RENAME - Rename Directory Entry
18.27. Operation 31: RESTOREFH - Restore Saved Filehandle . . . 510 18.27. Operation 31: RESTOREFH - Restore Saved Filehandle
18.28. Operation 32: SAVEFH - Save Current Filehandle . . . . . 511 18.28. Operation 32: SAVEFH - Save Current Filehandle
18.29. Operation 33: SECINFO - Obtain Available Security . . . 512 18.29. Operation 33: SECINFO - Obtain Available Security
18.30. Operation 34: SETATTR - Set Attributes . . . . . . . . . 516 18.30. Operation 34: SETATTR - Set Attributes
18.31. Operation 37: VERIFY - Verify Same Attributes . . . . . 519 18.31. Operation 37: VERIFY - Verify Same Attributes
18.32. Operation 38: WRITE - Write to File . . . . . . . . . . 520 18.32. Operation 38: WRITE - Write to File
18.33. Operation 40: BACKCHANNEL_CTL - Backchannel Control . . 525 18.33. Operation 40: BACKCHANNEL_CTL - Backchannel Control
18.34. Operation 41: BIND_CONN_TO_SESSION - Associate 18.34. Operation 41: BIND_CONN_TO_SESSION - Associate Connection
Connection with Session . . . . . . . . . . . . . . . . 526 with Session
18.35. Operation 42: EXCHANGE_ID - Instantiate Client ID . . . 529 18.35. Operation 42: EXCHANGE_ID - Instantiate Client ID
18.36. Operation 43: CREATE_SESSION - Create New Session and 18.36. Operation 43: CREATE_SESSION - Create New Session and
Confirm Client ID . . . . . . . . . . . . . . . . . . . 548 Confirm Client ID
18.37. Operation 44: DESTROY_SESSION - Destroy a Session . . . 558 18.37. Operation 44: DESTROY_SESSION - Destroy a Session
18.38. Operation 45: FREE_STATEID - Free Stateid with No Locks 560 18.38. Operation 45: FREE_STATEID - Free Stateid with No Locks
18.39. Operation 46: GET_DIR_DELEGATION - Get a Directory 18.39. Operation 46: GET_DIR_DELEGATION - Get a Directory
Delegation . . . . . . . . . . . . . . . . . . . . . . . 561 Delegation
18.40. Operation 47: GETDEVICEINFO - Get Device Information . . 565 18.40. Operation 47: GETDEVICEINFO - Get Device Information
18.41. Operation 48: GETDEVICELIST - Get All Device Mappings 18.41. Operation 48: GETDEVICELIST - Get All Device Mappings for
for a File System . . . . . . . . . . . . . . . . . . . 568 a File System
18.42. Operation 49: LAYOUTCOMMIT - Commit Writes Made Using a 18.42. Operation 49: LAYOUTCOMMIT - Commit Writes Made Using a
Layout . . . . . . . . . . . . . . . . . . . . . . . . . 569 Layout
18.43. Operation 50: LAYOUTGET - Get Layout Information . . . . 573 18.43. Operation 50: LAYOUTGET - Get Layout Information
18.44. Operation 51: LAYOUTRETURN - Release Layout Information 583 18.44. Operation 51: LAYOUTRETURN - Release Layout Information
18.45. Operation 52: SECINFO_NO_NAME - Get Security on Unnamed 18.45. Operation 52: SECINFO_NO_NAME - Get Security on Unnamed
Object . . . . . . . . . . . . . . . . . . . . . . . . . 588 Object
18.46. Operation 53: SEQUENCE - Supply Per-Procedure Sequencing 18.46. Operation 53: SEQUENCE - Supply Per-Procedure Sequencing
and Control . . . . . . . . . . . . . . . . . . . . . . 589 and Control
18.47. Operation 54: SET_SSV - Update SSV for a Client ID . . . 595 18.47. Operation 54: SET_SSV - Update SSV for a Client ID
18.48. Operation 55: TEST_STATEID - Test Stateids for Validity 597 18.48. Operation 55: TEST_STATEID - Test Stateids for Validity
18.49. Operation 56: WANT_DELEGATION - Request Delegation . . . 599 18.49. Operation 56: WANT_DELEGATION - Request Delegation
18.50. Operation 57: DESTROY_CLIENTID - Destroy a Client ID . . 603 18.50. Operation 57: DESTROY_CLIENTID - Destroy a Client ID
18.51. Operation 58: RECLAIM_COMPLETE - Indicates Reclaims 18.51. Operation 58: RECLAIM_COMPLETE - Indicates Reclaims
Finished . . . . . . . . . . . . . . . . . . . . . . . . 604 Finished
18.52. Operation 10044: ILLEGAL - Illegal Operation . . . . . . 607 18.52. Operation 10044: ILLEGAL - Illegal Operation
19. NFSv4.1 Callback Procedures . . . . . . . . . . . . . . . . . 608 19. NFSv4.1 Callback Procedures
19.1. Procedure 0: CB_NULL - No Operation . . . . . . . . . . 608 19.1. Procedure 0: CB_NULL - No Operation
19.2. Procedure 1: CB_COMPOUND - Compound Operations . . . . . 608 19.2. Procedure 1: CB_COMPOUND - Compound Operations
20. NFSv4.1 Callback Operations . . . . . . . . . . . . . . . . . 613 20. NFSv4.1 Callback Operations
20.1. Operation 3: CB_GETATTR - Get Attributes . . . . . . . . 613 20.1. Operation 3: CB_GETATTR - Get Attributes
20.2. Operation 4: CB_RECALL - Recall a Delegation . . . . . . 614 20.2. Operation 4: CB_RECALL - Recall a Delegation
20.3. Operation 5: CB_LAYOUTRECALL - Recall Layout from Client 615 20.3. Operation 5: CB_LAYOUTRECALL - Recall Layout from Client
20.4. Operation 6: CB_NOTIFY - Notify Client of Directory 20.4. Operation 6: CB_NOTIFY - Notify Client of Directory
Changes . . . . . . . . . . . . . . . . . . . . . . . . 618 Changes
20.5. Operation 7: CB_PUSH_DELEG - Offer Previously Requested 20.5. Operation 7: CB_PUSH_DELEG - Offer Previously Requested
Delegation to Client . . . . . . . . . . . . . . . . . . 622 Delegation to Client
20.6. Operation 8: CB_RECALL_ANY - Keep Any N Recallable 20.6. Operation 8: CB_RECALL_ANY - Keep Any N Recallable Objects
Objects . . . . . . . . . . . . . . . . . . . . . . . . 623
20.7. Operation 9: CB_RECALLABLE_OBJ_AVAIL - Signal Resources 20.7. Operation 9: CB_RECALLABLE_OBJ_AVAIL - Signal Resources
for Recallable Objects . . . . . . . . . . . . . . . . . 626 for Recallable Objects
20.8. Operation 10: CB_RECALL_SLOT - Change Flow Control 20.8. Operation 10: CB_RECALL_SLOT - Change Flow Control Limits
Limits . . . . . . . . . . . . . . . . . . . . . . . . . 627
20.9. Operation 11: CB_SEQUENCE - Supply Backchannel 20.9. Operation 11: CB_SEQUENCE - Supply Backchannel Sequencing
Sequencing and Control . . . . . . . . . . . . . . . . . 628 and Control
20.10. Operation 12: CB_WANTS_CANCELLED - Cancel Pending 20.10. Operation 12: CB_WANTS_CANCELLED - Cancel Pending
Delegation Wants . . . . . . . . . . . . . . . . . . . . 631 Delegation Wants
20.11. Operation 13: CB_NOTIFY_LOCK - Notify Client of Possible 20.11. Operation 13: CB_NOTIFY_LOCK - Notify Client of Possible
Lock Availability . . . . . . . . . . . . . . . . . . . 632 Lock Availability
20.12. Operation 14: CB_NOTIFY_DEVICEID - Notify Client of 20.12. Operation 14: CB_NOTIFY_DEVICEID - Notify Client of Device
Device ID Changes . . . . . . . . . . . . . . . . . . . 633 ID Changes
20.13. Operation 10044: CB_ILLEGAL - Illegal Callback Operation 635 20.13. Operation 10044: CB_ILLEGAL - Illegal Callback Operation
21. Security Considerations . . . . . . . . . . . . . . . . . . . 636 21. Security Considerations
22. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 640 22. IANA Considerations
22.1. IANA Actions Needed . . . . . . . . . . . . . . . . . . 641 22.1. IANA Actions
22.2. Named Attribute Definitions . . . . . . . . . . . . . . 641 22.2. Named Attribute Definitions
22.3. Device ID Notifications . . . . . . . . . . . . . . . . 642 22.3. Device ID Notifications
22.4. Object Recall Types . . . . . . . . . . . . . . . . . . 644 22.4. Object Recall Types
22.5. Layout Types . . . . . . . . . . . . . . . . . . . . . . 645 22.5. Layout Types
22.6. Path Variable Definitions . . . . . . . . . . . . . . . 648 22.6. Path Variable Definitions
23. References . . . . . . . . . . . . . . . . . . . . . . . . . 652 23. References
23.1. Normative References . . . . . . . . . . . . . . . . . . 652 23.1. Normative References
23.2. Informative References . . . . . . . . . . . . . . . . . 655 23.2. Informative References
Appendix A. Need for this Update . . . . . . . . . . . . . . . . 659 Appendix A. The Need for This Update
Appendix B. Changes in this Update . . . . . . . . . . . . . . . 661 Appendix B. Changes in This Update
B.1. Revisions Made to Section 11 of RFC5661 . . . . . . . . . 661 B.1. Revisions Made to Section 11 of RFC 5661
B.2. Revisions Made to Operations in RFC5661 . . . . . . . . . 664 B.2. Revisions Made to Operations in RFC 5661
B.3. Revisions Made to Error Definitions in RFC5661 . . . . . 666 B.3. Revisions Made to Error Definitions in RFC 5661
B.4. Other Revisions Made to RFC5661 . . . . . . . . . . . . . 667 B.4. Other Revisions Made to RFC 5661
Appendix C. Security Issues that Need to be Addressed . . . . . 668 Appendix C. Security Issues That Need to Be Addressed
Appendix D. Acknowledgments . . . . . . . . . . . . . . . . . . 670 Acknowledgments
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 673 Authors' Addresses
1. Introduction 1. Introduction
1.1. Introduction to this Update 1.1. Introduction to This Update
Two important features previously defined in minor version 0 but Two important features previously defined in minor version 0 but
never fully addressed in minor version 1 are trunking, the never fully addressed in minor version 1 are trunking, which is the
simultaneous use of multiple connections between a client and server, simultaneous use of multiple connections between a client and server,
potentially to different network addresses, and transparent state potentially to different network addresses, and Transparent State
migration, which allows a file system to be transferred between Migration, which allows a file system to be transferred between
servers in a way that provides to the client the ability to maintain servers in a way that provides to the client the ability to maintain
its existing locking state across the transfer. its existing locking state across the transfer.
The revised description of the NFS version 4 minor version 1 The revised description of the NFS version 4 minor version 1
(NFSv4.1) protocol presented in this update is necessary to enable (NFSv4.1) protocol presented in this update is necessary to enable
full use of these features together with other multi-server namespace full use of these features together with other multi-server namespace
features. This document is in the form of an updated description of features. This document is in the form of an updated description of
the NFSv4.1 protocol previously defined in RFC5661 [65]. RFC5661 is the NFSv4.1 protocol previously defined in RFC 5661 [66]. RFC 5661
obsoleted by this document. However, the update has a limited scope is obsoleted by this document. However, the update has a limited
and is focused on enabling full use of trunking and transparent state scope and is focused on enabling full use of trunking and Transparent
migration. The need for these changes is discussed in Appendix A. State Migration. The need for these changes is discussed in
Appendix B describes the specific changes made to arrive at the Appendix A. Appendix B describes the specific changes made to arrive
current text. at the current text.
This limited-scope update replaces the current NFSv4.1 RFC with the This limited-scope update replaces the current NFSv4.1 RFC with the
intention of providing an authoritative and complete specification, intention of providing an authoritative and complete specification,
the motivation for which is discussed in [35], addressing the issues the motivation for which is discussed in [36], addressing the issues
within the scope of the update. However, it will not address issues within the scope of the update. However, it will not address issues
that are known but outside of this limited scope as could expected by that are known but outside of this limited scope as could be expected
a full update of the protocol. Below are some areas which are known by a full update of the protocol. Below are some areas that are
to need addressing in a future update of the protocol. known to need addressing in a future update of the protocol:
o Work needs to be done with regard to RFC8178 [66] which * Work needs to be done with regard to RFC 8178 [67], which
establishes NFSv4-wide versioning rules. As RFC5661 is currently establishes NFSv4-wide versioning rules. As RFC 5661 is currently
inconsistent with that document, changes are needed in order to inconsistent with that document, changes are needed in order to
arrive at a situation in which there would be no need for RFC8178 arrive at a situation in which there would be no need for RFC 8178
to update the NFSv4.1 specification. to update the NFSv4.1 specification.
o Work needs to be done with regard to RFC8434 [69], which * Work needs to be done with regard to RFC 8434 [70], which
establishes the requirements for pNFS layout types, which are not establishes the requirements for parallel NFS (pNFS) layout types,
clearly defined in RFC5661. When that work is done and the which are not clearly defined in RFC 5661. When that work is done
resulting documents approved, the new NFSv4.1 specification and the resulting documents approved, the new NFSv4.1
document will provide a clear set of requirements for layout types specification document will provide a clear set of requirements
and a description of the file layout type that conforms to those for layout types and a description of the file layout type that
requirements. Other layout types will have their own conforms to those requirements. Other layout types will have
specification documents that conforms to those requirements as their own specification documents that conform to those
well. requirements as well.
o Work needs to be done to address many errata reports relevant to * Work needs to be done to address many errata reports relevant to
RFC 5661, other than errata report 2006 [63], which is addressed RFC 5661, other than errata report 2006 [64], which is addressed
in this document. Addressing that report was not deferrable in this document. Addressing that report was not deferrable
because of the interaction of the changes suggested there and the because of the interaction of the changes suggested there and the
newly described handling of state and session migration. newly described handling of state and session migration.
The errata reports that have been deferred and that will need to The errata reports that have been deferred and that will need to
be addressed in a later document include reports currently be addressed in a later document include reports currently
assigned a range of statuses in the errata reporting system assigned a range of statuses in the errata reporting system,
including reports marked Accepted and those marked Hold For including reports marked Accepted and those marked Hold For
Document Update because the change was too minor to address Document Update because the change was too minor to address
immediately. immediately.
In addition, there is a set of other reports, including at least In addition, there is a set of other reports, including at least
one in state Rejected, which will need to be addressed in a later one in state Rejected, that will need to be addressed in a later
document. This will involve making changes to consensus decisions document. This will involve making changes to consensus decisions
reflected in RFC 5661, in situation in which the working group has reflected in RFC 5661, in situations in which the working group
decided that the treatment in RFC 5661 is incorrect, and needs to has decided that the treatment in RFC 5661 is incorrect and needs
be revised to reflect the working group's new consensus and ensure to be revised to reflect the working group's new consensus and to
compatibility with existing implementations that do not follow the ensure compatibility with existing implementations that do not
handling described in in RFC 5661. follow the handling described in RFC 5661.
Note that it is expected that all such errata reports will remain Note that it is expected that all such errata reports will remain
relevant to implementers and the authors of an eventual relevant to implementors and the authors of an eventual
rfc5661bis, despite the fact that this document, when approved, rfc5661bis, despite the fact that this document obsoletes RFC 5661
will obsolete RFC 5661 [65]. [66].
o There is a need for a new approach to the description of * There is a need for a new approach to the description of
internationalization since the current internationalization internationalization since the current internationalization
section (Section 14) has never been implemented and does not meet section (Section 14) has never been implemented and does not meet
the needs of the NFSv4 protocol. Possible solutions are to create the needs of the NFSv4 protocol. Possible solutions are to create
a new internationalization section modeled on that in [67] or to a new internationalization section modeled on that in [68] or to
create a new document describing internationalization for all create a new document describing internationalization for all
NFSv4 minor versions and reference that document in the RFCs NFSv4 minor versions and reference that document in the RFCs
defining both NFSv4.0 and NFSv4.1. defining both NFSv4.0 and NFSv4.1.
o There is a need for a revised treatment of security in NFSv4.1. * There is a need for a revised treatment of security in NFSv4.1.
The issues with the existing treatment are discussed in The issues with the existing treatment are discussed in
Appendix C. Appendix C.
Until the above work is done, there will not be a consistent set of Until the above work is done, there will not be a consistent set of
documents providing a description of the NFSv4.1 protocol and any documents that provides a description of the NFSv4.1 protocol, and
full description would involve documents updating other documents any full description would involve documents updating other documents
within the specification. The updates applied by RFC8434 [69] and within the specification. The updates applied by RFC 8434 [70] and
RFC8178 [66] to RFC5661 also apply to this specification, and will RFC 8178 [67] to RFC 5661 also apply to this specification, and will
apply to any subsequent v4.1 specification until that work is done. apply to any subsequent v4.1 specification until that work is done.
1.2. The NFS Version 4 Minor Version 1 Protocol 1.2. The NFS Version 4 Minor Version 1 Protocol
The NFS version 4 minor version 1 (NFSv4.1) protocol is the second The NFS version 4 minor version 1 (NFSv4.1) protocol is the second
minor version of the NFS version 4 (NFSv4) protocol. The first minor minor version of the NFS version 4 (NFSv4) protocol. The first minor
version, NFSv4.0, is now described in RFC 7530 [67]. It generally version, NFSv4.0, is now described in RFC 7530 [68]. It generally
follows the guidelines for minor versioning that are listed in follows the guidelines for minor versioning that are listed in
Section 10 of RFC 3530. However, it diverges from guidelines 11 ("a Section 10 of RFC 3530 [37]. However, it diverges from guidelines 11
client and server that support minor version X must support minor ("a client and server that support minor version X must support minor
versions 0 through X-1") and 12 ("no new features may be introduced versions 0 through X-1") and 12 ("no new features may be introduced
as mandatory in a minor version"). These divergences are due to the as mandatory in a minor version"). These divergences are due to the
introduction of the sessions model for managing non-idempotent introduction of the sessions model for managing non-idempotent
operations and the RECLAIM_COMPLETE operation. These two new operations and the RECLAIM_COMPLETE operation. These two new
features are infrastructural in nature and simplify implementation of features are infrastructural in nature and simplify implementation of
existing and other new features. Making them anything but REQUIRED existing and other new features. Making them anything but REQUIRED
would add undue complexity to protocol definition and implementation. would add undue complexity to protocol definition and implementation.
NFSv4.1 accordingly updates the minor versioning guidelines NFSv4.1 accordingly updates the minor versioning guidelines
(Section 2.7). (Section 2.7).
skipping to change at page 10, line 25 skipping to change at line 448
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [1]. document are to be interpreted as described in RFC 2119 [1].
1.4. Scope of This Document 1.4. Scope of This Document
This document describes the NFSv4.1 protocol. With respect to This document describes the NFSv4.1 protocol. With respect to
NFSv4.0, this document does not: NFSv4.0, this document does not:
o describe the NFSv4.0 protocol, except where needed to contrast * describe the NFSv4.0 protocol, except where needed to contrast
with NFSv4.1. with NFSv4.1.
o modify the specification of the NFSv4.0 protocol. * modify the specification of the NFSv4.0 protocol.
o clarify the NFSv4.0 protocol. * clarify the NFSv4.0 protocol.
1.5. NFSv4 Goals 1.5. NFSv4 Goals
The NFSv4 protocol is a further revision of the NFS protocol defined The NFSv4 protocol is a further revision of the NFS protocol defined
already by NFSv3 [37]. It retains the essential characteristics of already by NFSv3 [38]. It retains the essential characteristics of
previous versions: easy recovery; independence of transport previous versions: easy recovery; independence of transport
protocols, operating systems, and file systems; simplicity; and good protocols, operating systems, and file systems; simplicity; and good
performance. NFSv4 has the following goals: performance. NFSv4 has the following goals:
o Improved access and good performance on the Internet * Improved access and good performance on the Internet
The protocol is designed to transit firewalls easily, perform well The protocol is designed to transit firewalls easily, perform well
where latency is high and bandwidth is low, and scale to very where latency is high and bandwidth is low, and scale to very
large numbers of clients per server. large numbers of clients per server.
o Strong security with negotiation built into the protocol * Strong security with negotiation built into the protocol
The protocol builds on the work of the ONCRPC working group in The protocol builds on the work of the ONCRPC working group in
supporting the RPCSEC_GSS protocol. Additionally, the NFSv4.1 supporting the RPCSEC_GSS protocol. Additionally, the NFSv4.1
protocol provides a mechanism to allow clients and servers the protocol provides a mechanism to allow clients and servers the
ability to negotiate security and require clients and servers to ability to negotiate security and require clients and servers to
support a minimal set of security schemes. support a minimal set of security schemes.
o Good cross-platform interoperability * Good cross-platform interoperability
The protocol features a file system model that provides a useful, The protocol features a file system model that provides a useful,
common set of features that does not unduly favor one file system common set of features that does not unduly favor one file system
or operating system over another. or operating system over another.
o Designed for protocol extensions * Designed for protocol extensions
The protocol is designed to accept standard extensions within a The protocol is designed to accept standard extensions within a
framework that enables and encourages backward compatibility. framework that enables and encourages backward compatibility.
1.6. NFSv4.1 Goals 1.6. NFSv4.1 Goals
NFSv4.1 has the following goals, within the framework established by NFSv4.1 has the following goals, within the framework established by
the overall NFSv4 goals. the overall NFSv4 goals.
o To correct significant structural weaknesses and oversights * To correct significant structural weaknesses and oversights
discovered in the base protocol. discovered in the base protocol.
o To add clarity and specificity to areas left unaddressed or not * To add clarity and specificity to areas left unaddressed or not
addressed in sufficient detail in the base protocol. However, as addressed in sufficient detail in the base protocol. However, as
stated in Section 1.4, it is not a goal to clarify the NFSv4.0 stated in Section 1.4, it is not a goal to clarify the NFSv4.0
protocol in the NFSv4.1 specification. protocol in the NFSv4.1 specification.
o To add specific features based on experience with the existing * To add specific features based on experience with the existing
protocol and recent industry developments. protocol and recent industry developments.
o To provide protocol support to take advantage of clustered server * To provide protocol support to take advantage of clustered server
deployments including the ability to provide scalable parallel deployments including the ability to provide scalable parallel
access to files distributed among multiple servers. access to files distributed among multiple servers.
1.7. General Definitions 1.7. General Definitions
The following definitions provide an appropriate context for the The following definitions provide an appropriate context for the
reader. reader.
Byte: In this document, a byte is an octet, i.e., a datum exactly 8 Byte: In this document, a byte is an octet, i.e., a datum exactly 8
bits in length. bits in length.
skipping to change at page 13, line 14 skipping to change at line 579
Server: The Server is the entity responsible for coordinating client Server: The Server is the entity responsible for coordinating client
access to a set of file systems and is identified by a server access to a set of file systems and is identified by a server
owner. A server can span multiple network addresses. owner. A server can span multiple network addresses.
Server Owner: The server owner identifies the server to the client. Server Owner: The server owner identifies the server to the client.
The server owner consists of a major identifier and a minor The server owner consists of a major identifier and a minor
identifier. When the client has two connections each to a peer identifier. When the client has two connections each to a peer
with the same major identifier, the client assumes that both peers with the same major identifier, the client assumes that both peers
are the same server (the server namespace is the same via each are the same server (the server namespace is the same via each
connection) and that lock state is sharable across both connection) and that lock state is shareable across both
connections. When each peer has both the same major and minor connections. When each peer has both the same major and minor
identifiers, the client assumes that each connection might be identifiers, the client assumes that each connection might be
associable with the same session. associable with the same session.
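The inference a client draws from the Server Owner definition can be sketched as follows. This is a minimal illustration under stated assumptions, not protocol text: the ServerOwner class, its field names (major_id, minor_id), and both helper functions are hypothetical names introduced here for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerOwner:
    """Hypothetical stand-in for the server owner reported by a peer."""
    major_id: bytes  # major identifier
    minor_id: int    # minor identifier


def same_server(a: ServerOwner, b: ServerOwner) -> bool:
    """Equal major identifiers: the client assumes both peers are the
    same server and that lock state is shareable across connections."""
    return a.major_id == b.major_id


def might_share_session(a: ServerOwner, b: ServerOwner) -> bool:
    """Equal major and minor identifiers: each connection might be
    associable with the same session."""
    return a.major_id == b.major_id and a.minor_id == b.minor_id
```

Note that the second check yields only a possibility ("might be associable"), so a client would still need the protocol's own mechanisms to confirm that a connection can be bound to a session.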
Stable Storage: Stable storage is storage from which data stored by Stable Storage: Stable storage is storage from which data stored by
an NFSv4.1 server can be recovered without data loss from multiple an NFSv4.1 server can be recovered without data loss from multiple
power failures (including cascading power failures, that is, power failures (including cascading power failures, that is,
several power failures in quick succession), operating system several power failures in quick succession), operating system
failures, and/or hardware failure of components other than the failures, and/or hardware failure of components other than the
storage medium itself (such as disk, nonvolatile RAM, flash storage medium itself (such as disk, nonvolatile RAM, flash
skipping to change at page 16, line 21 skipping to change at line 726
filehandles. filehandles.
1.8.3.2. File Attributes 1.8.3.2. File Attributes
The NFSv4.1 protocol has a rich and extensible file object attribute The NFSv4.1 protocol has a rich and extensible file object attribute
structure, which is divided into REQUIRED, RECOMMENDED, and named structure, which is divided into REQUIRED, RECOMMENDED, and named
attributes (see Section 5). attributes (see Section 5).
Several (but not all) of the REQUIRED attributes are derived from the Several (but not all) of the REQUIRED attributes are derived from the
attributes of NFSv3 (see the definition of the fattr3 data type in attributes of NFSv3 (see the definition of the fattr3 data type in
[37]). An example of a REQUIRED attribute is the file object's type [38]). An example of a REQUIRED attribute is the file object's type
(Section 5.8.1.2) so that regular files can be distinguished from (Section 5.8.1.2) so that regular files can be distinguished from
directories (also known as folders in some operating environments) directories (also known as folders in some operating environments)
and other types of objects. REQUIRED attributes are discussed in and other types of objects. REQUIRED attributes are discussed in
Section 5.1. Section 5.1.
An example of three RECOMMENDED attributes are acl, sacl, and dacl. An example of three RECOMMENDED attributes are acl, sacl, and dacl.
These attributes define an Access Control List (ACL) on a file object These attributes define an Access Control List (ACL) on a file object
(Section 6). An ACL provides directory and file access control (Section 6). An ACL provides directory and file access control
beyond the model used in NFSv3. The ACL definition allows for beyond the model used in NFSv3. The ACL definition allows for
specification of specific sets of permissions for individual users specification of specific sets of permissions for individual users
skipping to change at page 16, line 50 skipping to change at line 755
application-specific data with a regular file or directory. NFSv4.1 application-specific data with a regular file or directory. NFSv4.1
modifies named attributes relative to NFSv4.0 by tightening the modifies named attributes relative to NFSv4.0 by tightening the
allowed operations in order to prevent the development of non- allowed operations in order to prevent the development of non-
interoperable implementations. Named attributes are discussed in interoperable implementations. Named attributes are discussed in
Section 5.3. Section 5.3.
1.8.3.3. Multi-Server Namespace 1.8.3.3. Multi-Server Namespace
NFSv4.1 contains a number of features to allow implementation of NFSv4.1 contains a number of features to allow implementation of
namespaces that cross server boundaries and that allow and facilitate namespaces that cross server boundaries and that allow and facilitate
a non-disruptive transfer of support for individual file systems a nondisruptive transfer of support for individual file systems
between servers. They are all based upon attributes that allow one between servers. They are all based upon attributes that allow one
file system to specify alternate, additional, and new location file system to specify alternate, additional, and new location
information that specifies how the client may access that file information that specifies how the client may access that file
system. system.
These attributes can be used to provide for individual active file These attributes can be used to provide for individual active file
systems: systems:
o Alternate network addresses to access the current file system * Alternate network addresses to access the current file system
instance. instance.
o The locations of alternate file system instances or replicas to be * The locations of alternate file system instances or replicas to be
used in the event that the current file system instance becomes used in the event that the current file system instance becomes
unavailable. unavailable.
These file system location attributes may be used together with the These file system location attributes may be used together with the
concept of absent file systems, in which a position in the server concept of absent file systems, in which a position in the server
namespace is associated with locations on other servers without there namespace is associated with locations on other servers without there
being any corresponding file system instance on the current server. being any corresponding file system instance on the current server.
For example, For example,
o These attributes may be used with absent file systems to implement * These attributes may be used with absent file systems to implement
referrals whereby one server may direct the client to a file referrals whereby one server may direct the client to a file
system provided by another server. This allows extensive multi- system provided by another server. This allows extensive multi-
server namespaces to be constructed. server namespaces to be constructed.
o These attributes may be provided when a previously present file * These attributes may be provided when a previously present file
system becomes absent. This allows non-disruptive migration of system becomes absent. This allows nondisruptive migration of
file systems to alternate servers. file systems to alternate servers.
1.8.4. Locking Facilities 1.8.4. Locking Facilities
As mentioned previously, NFSv4.1 is a single protocol that includes As mentioned previously, NFSv4.1 is a single protocol that includes
locking facilities. These locking facilities include support for locking facilities. These locking facilities include support for
many types of locks including a number of sorts of recallable locks. many types of locks including a number of sorts of recallable locks.
Recallable locks such as delegations allow the client to be assured Recallable locks such as delegations allow the client to be assured
that certain events will not occur so long as that lock is held. that certain events will not occur so long as that lock is held.
When circumstances change, the lock is recalled via a callback When circumstances change, the lock is recalled via a callback
request. The assurances provided by delegations allow more extensive request. The assurances provided by delegations allow more extensive
caching to be done safely when circumstances allow it. caching to be done safely when circumstances allow it.
The types of locks are: The types of locks are:
o Share reservations as established by OPEN operations. * Share reservations as established by OPEN operations.
o Byte-range locks. * Byte-range locks.
o File delegations, which are recallable locks that assure the * File delegations, which are recallable locks that assure the
holder that inconsistent opens and file changes cannot occur so holder that inconsistent opens and file changes cannot occur so
long as the delegation is held. long as the delegation is held.
o Directory delegations, which are recallable locks that assure the * Directory delegations, which are recallable locks that assure the
holder that inconsistent directory modifications cannot occur so holder that inconsistent directory modifications cannot occur so
long as the delegation is held. long as the delegation is held.
o Layouts, which are recallable objects that assure the holder that * Layouts, which are recallable objects that assure the holder that
direct access to the file data may be performed directly by the direct access to the file data may be performed directly by the
client and that no change to the data's location that is client and that no change to the data's location that is
inconsistent with that access may be made so long as the layout is inconsistent with that access may be made so long as the layout is
held. held.
All locks for a given client are tied together under a single client- All locks for a given client are tied together under a single client-
wide lease. All requests made on sessions associated with the client wide lease. All requests made on sessions associated with the client
renew that lease. When the client's lease is not promptly renewed, renew that lease. When the client's lease is not promptly renewed,
the client's locks are subject to revocation. In the event of server the client's locks are subject to revocation. In the event of server
restart, clients have the opportunity to safely reclaim their locks restart, clients have the opportunity to safely reclaim their locks
within a special grace period. within a special grace period.
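The client-wide lease described above can be sketched as a single timestamp that every request refreshes. This is a simplified illustration only: the ClientLease class, its method names, and the explicit clock values passed as arguments are assumptions made for the example, and the lease duration is an arbitrary input rather than a value taken from the specification.

```python
class ClientLease:
    """Hypothetical model of one client-wide lease covering all locks."""

    def __init__(self, lease_seconds: float, now: float) -> None:
        self.lease_seconds = lease_seconds
        self.last_renewed = now

    def on_request(self, now: float) -> None:
        # Any request made on a session associated with the client
        # renews the lease.
        self.last_renewed = now

    def locks_subject_to_revocation(self, now: float) -> bool:
        # When the lease has not been renewed within the lease period,
        # the client's locks become subject to revocation.
        return now - self.last_renewed > self.lease_seconds
```

The sketch reports only that locks are subject to revocation; whether and when a server actually revokes them is governed by the rest of the specification.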
1.9. Differences from NFSv4.0 1.9. Differences from NFSv4.0
The following summarizes the major differences between minor version The following summarizes the major differences between minor version
1 and the base protocol: 1 and the base protocol:
o Implementation of the sessions model (Section 2.10). * Implementation of the sessions model (Section 2.10).
o Parallel access to data (Section 12). * Parallel access to data (Section 12).
o Addition of the RECLAIM_COMPLETE operation to better structure the * Addition of the RECLAIM_COMPLETE operation to better structure the
lock reclamation process (Section 18.51). lock reclamation process (Section 18.51).
o Enhanced delegation support as follows. * Enhanced delegation support as follows.
* Delegations on directories and other file types in addition to - Delegations on directories and other file types in addition to
regular files (Section 18.39, Section 18.49). regular files (Section 18.39, Section 18.49).
* Operations to optimize acquisition of recalled or denied - Operations to optimize acquisition of recalled or denied
delegations (Section 18.49, Section 20.5, Section 20.7). delegations (Section 18.49, Section 20.5, Section 20.7).
* Notifications of changes to files and directories - Notifications of changes to files and directories
(Section 18.39, Section 20.4). (Section 18.39, Section 20.4).
* A method to allow a server to indicate that it is recalling one - A method to allow a server to indicate that it is recalling one
or more delegations for resource management reasons, and thus a or more delegations for resource management reasons, and thus a
method to allow the client to pick which delegations to return method to allow the client to pick which delegations to return
(Section 20.6). (Section 20.6).
o Attributes can be set atomically during exclusive file create via * Attributes can be set atomically during exclusive file create via
the OPEN operation (see the new EXCLUSIVE4_1 creation method in the OPEN operation (see the new EXCLUSIVE4_1 creation method in
Section 18.16). Section 18.16).
o Open files can be preserved if removed and the hard link count * Open files can be preserved if removed and the hard link count
("hard link" is defined in an Open Group [6] standard) goes to ("hard link" is defined in an Open Group [6] standard) goes to
zero, thus obviating the need for clients to rename deleted files zero, thus obviating the need for clients to rename deleted files
to partially hidden names -- colloquially called "silly rename" to partially hidden names -- colloquially called "silly rename"
(see the new OPEN4_RESULT_PRESERVE_UNLINKED reply flag in (see the new OPEN4_RESULT_PRESERVE_UNLINKED reply flag in
Section 18.16). Section 18.16).
o Improved compatibility with Microsoft Windows for Access Control * Improved compatibility with Microsoft Windows for Access Control
Lists (Section 6.2.3, Section 6.2.2, Section 6.4.3.2). Lists (Section 6.2.3, Section 6.2.2, Section 6.4.3.2).
o Data retention (Section 5.13). * Data retention (Section 5.13).
o Identification of the implementation of the NFS client and server * Identification of the implementation of the NFS client and server
(Section 18.35). (Section 18.35).
o Support for notification of the availability of byte-range locks * Support for notification of the availability of byte-range locks
(see the new OPEN4_RESULT_MAY_NOTIFY_LOCK reply flag in (see the new OPEN4_RESULT_MAY_NOTIFY_LOCK reply flag in
Section 18.16 and see Section 20.11). Section 18.16 and see Section 20.11).
o In NFSv4.1, LIPKEY and SPKM-3 are not required security mechanisms * In NFSv4.1, LIPKEY and SPKM-3 are not required security mechanisms
[38]. [39].
2. Core Infrastructure 2. Core Infrastructure
2.1. Introduction 2.1. Introduction
NFSv4.1 relies on core infrastructure common to nearly every NFSv4.1 relies on core infrastructure common to nearly every
operation. This core infrastructure is described in the remainder of operation. This core infrastructure is described in the remainder of
this section. this section.
2.2. RPC and XDR 2.2. RPC and XDR
skipping to change at page 20, line 12 skipping to change at line 908
forms of RPC authentication, AUTH_SYS, had no strong authentication forms of RPC authentication, AUTH_SYS, had no strong authentication
and required a host-based authentication approach. NFSv4.1 also and required a host-based authentication approach. NFSv4.1 also
depends on RPC for basic security services and mandates RPC support depends on RPC for basic security services and mandates RPC support
for a user-based authentication model. The user-based authentication for a user-based authentication model. The user-based authentication
model has user principals authenticated by a server, and in turn the model has user principals authenticated by a server, and in turn the
server authenticated by user principals. RPC provides some basic server authenticated by user principals. RPC provides some basic
security services that are used by NFSv4.1. security services that are used by NFSv4.1.
2.2.1.1. RPC Security Flavors 2.2.1.1. RPC Security Flavors
As described in Section 7.2 ("Authentication") of [3], RPC security As described in "Authentication", Section 7 of [3], RPC security is
is encapsulated in the RPC header, via a security or authentication encapsulated in the RPC header, via a security or authentication
flavor, and information specific to the specified security flavor. flavor, and information specific to the specified security flavor.
Every RPC header conveys information used to identify and Every RPC header conveys information used to identify and
authenticate a client and server. As discussed in Section 2.2.1.1.1, authenticate a client and server. As discussed in Section 2.2.1.1.1,
some security flavors provide additional security services. some security flavors provide additional security services.
NFSv4.1 clients and servers MUST implement RPCSEC_GSS. (This NFSv4.1 clients and servers MUST implement RPCSEC_GSS. (This
requirement to implement is not a requirement to use.) Other requirement to implement is not a requirement to use.) Other
flavors, such as AUTH_NONE and AUTH_SYS, MAY be implemented as well. flavors, such as AUTH_NONE and AUTH_SYS, MAY be implemented as well.
2.2.1.1.1. RPCSEC_GSS and Security Services 2.2.1.1.1. RPCSEC_GSS and Security Services
skipping to change at page 21, line 52 skipping to change at line 995
------------------------------------------------------------------ ------------------------------------------------------------------
390003 krb5 1.2.840.113554.1.2.2 rpc_gss_svc_none yes yes 390003 krb5 1.2.840.113554.1.2.2 rpc_gss_svc_none yes yes
390004 krb5i 1.2.840.113554.1.2.2 rpc_gss_svc_integrity yes yes 390004 krb5i 1.2.840.113554.1.2.2 rpc_gss_svc_integrity yes yes
390005 krb5p 1.2.840.113554.1.2.2 rpc_gss_svc_privacy no yes 390005 krb5p 1.2.840.113554.1.2.2 rpc_gss_svc_privacy no yes
Note that the number and name of the pseudo flavor are presented here Note that the number and name of the pseudo flavor are presented here
as a mapping aid to the implementor. Because the NFSv4.1 protocol as a mapping aid to the implementor. Because the NFSv4.1 protocol
includes a method to negotiate security and it understands the GSS- includes a method to negotiate security and it understands the GSS-
API mechanism, the pseudo flavor is not needed. The pseudo flavor is API mechanism, the pseudo flavor is not needed. The pseudo flavor is
needed for the NFSv3 since the security negotiation is done via the needed for the NFSv3 since the security negotiation is done via the
MOUNT protocol as described in [39]. MOUNT protocol as described in [40].
At the time NFSv4.1 was specified, the Advanced Encryption Standard At the time NFSv4.1 was specified, the Advanced Encryption Standard
(AES) with HMAC-SHA1 was a REQUIRED algorithm set for Kerberos V5. (AES) with HMAC-SHA1 was a REQUIRED algorithm set for Kerberos V5.
In contrast, when NFSv4.0 was specified, weaker algorithm sets were In contrast, when NFSv4.0 was specified, weaker algorithm sets were
REQUIRED for Kerberos V5, and were REQUIRED in the NFSv4.0 REQUIRED for Kerberos V5, and were REQUIRED in the NFSv4.0
specification, because the Kerberos V5 specification at the time did specification, because the Kerberos V5 specification at the time did
not specify stronger algorithms. The NFSv4.1 specification does not not specify stronger algorithms. The NFSv4.1 specification does not
specify REQUIRED algorithms for Kerberos V5, and instead, the specify REQUIRED algorithms for Kerberos V5, and instead, the
implementor is expected to track the evolution of the Kerberos V5 implementor is expected to track the evolution of the Kerberos V5
standard if and when stronger algorithms are specified. standard if and when stronger algorithms are specified.
skipping to change at page 24, line 38 skipping to change at line 1126
Client identification is encapsulated in the following client owner Client identification is encapsulated in the following client owner
data type: data type:
struct client_owner4 { struct client_owner4 {
verifier4 co_verifier; verifier4 co_verifier;
opaque co_ownerid<NFS4_OPAQUE_LIMIT>; opaque co_ownerid<NFS4_OPAQUE_LIMIT>;
}; };
The first field, co_verifier, is a client incarnation verifier, The first field, co_verifier, is a client incarnation verifier,
allowing the server to distinguish successive incarnations (e.g. allowing the server to distinguish successive incarnations (e.g.,
reboots) of the same client. The server will start the process of reboots) of the same client. The server will start the process of
canceling the client's leased state if co_verifier is different than canceling the client's leased state if co_verifier is different than
what the server has previously recorded for the identified client (as what the server has previously recorded for the identified client (as
specified in the co_ownerid field). specified in the co_ownerid field).
The second field, co_ownerid, is a variable length string that The second field, co_ownerid, is a variable length string that
uniquely defines the client so that subsequent instances of the same uniquely defines the client so that subsequent instances of the same
client bear the same co_ownerid with a different verifier. client bear the same co_ownerid with a different verifier.
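As a non-normative illustration of how the two fields work together, the following Python sketch classifies an incoming client_owner4 against what a server has recorded. The ClientOwner4 class, the classify_owner function, and the in-memory records table are hypothetical names introduced for this example only; they are not defined by the protocol.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ClientOwner4:
    """Illustrative stand-in for the client_owner4 XDR type."""
    co_verifier: bytes  # client incarnation verifier
    co_ownerid: bytes   # opaque string identifying the client


def classify_owner(records: Dict[bytes, bytes], owner: ClientOwner4) -> str:
    """Compare an incoming owner with the recorded verifier for its co_ownerid.

    `records` is a hypothetical server table mapping co_ownerid to the
    co_verifier most recently recorded for that client.
    """
    recorded = records.get(owner.co_ownerid)
    if recorded is None:
        # No record for this co_ownerid: a client the server has not seen.
        return "new_client"
    if recorded == owner.co_verifier:
        # Same owner string and same verifier: same incarnation.
        return "same_incarnation"
    # Same owner string, different verifier: a new incarnation (e.g., a
    # reboot). This is the case in which the server starts the process of
    # canceling the client's leased state.
    return "new_incarnation"
```

The sketch covers only the comparison; the checks a server makes before it actually discards state are not modeled.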
There are several considerations for how the client generates the There are several considerations for how the client generates the
co_ownerid string: co_ownerid string:
o The string should be unique so that multiple clients do not * The string should be unique so that multiple clients do not
present the same string. The consequences of two clients present the same string. The consequences of two clients
presenting the same string range from one client getting an error presenting the same string range from one client getting an error
to one client having its leased state abruptly and unexpectedly to one client having its leased state abruptly and unexpectedly
cancelled. cancelled.
o The string should be selected so that subsequent incarnations * The string should be selected so that subsequent incarnations
(e.g., restarts) of the same client cause the client to present (e.g., restarts) of the same client cause the client to present
the same string. The implementor is cautioned from an approach the same string. The implementor is cautioned from an approach
that requires the string to be recorded in a local file because that requires the string to be recorded in a local file because
this precludes the use of the implementation in an environment this precludes the use of the implementation in an environment
where there is no local disk and all file access is from an where there is no local disk and all file access is from an
NFSv4.1 server. NFSv4.1 server.
o The string should be the same for each server network address that * The string should be the same for each server network address that
the client accesses. This way, if a server has multiple the client accesses. This way, if a server has multiple
interfaces, the client can trunk traffic over multiple network interfaces, the client can trunk traffic over multiple network
paths as described in Section 2.10.5. (Note: the precise opposite paths as described in Section 2.10.5. (Note: the precise opposite
was advised in the NFSv4.0 specification [36].) was advised in the NFSv4.0 specification [37].)
o The algorithm for generating the string should not assume that the * The algorithm for generating the string should not assume that the
client's network address will not change, unless the client client's network address will not change, unless the client
implementation knows it is using statically assigned network implementation knows it is using statically assigned network
addresses. This includes changes between client incarnations and addresses. This includes changes between client incarnations and
even changes while the client is still running in its current even changes while the client is still running in its current
incarnation. Thus, with dynamic address assignment, if the client incarnation. Thus, with dynamic address assignment, if the client
includes just the client's network address in the co_ownerid includes just the client's network address in the co_ownerid
string, there is a real risk that after the client gives up the string, there is a real risk that after the client gives up the
network address, another client, using a similar algorithm for network address, another client, using a similar algorithm for
generating the co_ownerid string, would generate a conflicting generating the co_ownerid string, would generate a conflicting
co_ownerid string. co_ownerid string.
Given the above considerations, an example of a well-generated Given the above considerations, an example of a well-generated
co_ownerid string is one that includes: co_ownerid string is one that includes:
o If applicable, the client's statically assigned network address. * If applicable, the client's statically assigned network address.
o Additional information that tends to be unique, such as one or * Additional information that tends to be unique, such as one or
more of: more of:
* The client machine's serial number (for privacy reasons, it is - The client machine's serial number (for privacy reasons, it is
best to perform some one-way function on the serial number). best to perform some one-way function on the serial number).
* A Media Access Control (MAC) address (again, a one-way function - A Media Access Control (MAC) address (again, a one-way function
should be performed). should be performed).
* The timestamp of when the NFSv4.1 software was first installed - The timestamp of when the NFSv4.1 software was first installed
on the client (though this is subject to the previously on the client (though this is subject to the previously
mentioned caution about using information that is stored in a mentioned caution about using information that is stored in a
file, because the file might only be accessible over NFSv4.1). file, because the file might only be accessible over NFSv4.1).
* A true random number. However, since this number ought to be - A true random number. However, since this number ought to be
the same between client incarnations, this shares the same the same between client incarnations, this shares the same
problem as that of using the timestamp of the software problem as that of using the timestamp of the software
installation. installation.
o For a user-level NFSv4.1 client, it should contain additional * For a user-level NFSv4.1 client, it should contain additional
information to distinguish the client from other user-level information to distinguish the client from other user-level
clients running on the same host, such as a process identifier or clients running on the same host, such as a process identifier or
other unique sequence. other unique sequence.
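The considerations above can be sketched as follows. This is a minimal, non-normative Python example: the make_co_ownerid function, its parameters, the "key=value" layout, and the choice of SHA-256 as the one-way function are all assumptions made for illustration. The 1024-byte limit is the value of NFS4_OPAQUE_LIMIT in the protocol's XDR description, which is not reproduced in this excerpt.

```python
import hashlib

# Assumed value of NFS4_OPAQUE_LIMIT from the protocol's XDR description.
NFS4_OPAQUE_LIMIT = 1024


def _one_way(value: str) -> str:
    """Hide a hardware identifier behind a one-way function (SHA-256 here)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def make_co_ownerid(serial: str, mac: str,
                    static_addr: str = "", instance_tag: str = "") -> bytes:
    """Build a co_ownerid from stable, host-specific inputs.

    serial, mac   -- identifiers that tend to be unique; both are hashed.
    static_addr   -- the client's address, only if statically assigned.
    instance_tag  -- extra discriminator for a user-level client (e.g., a
                     process identifier); empty for a kernel client.
    The server's network address is deliberately not an input, so the same
    string is presented to every server address.
    """
    parts = []
    if static_addr:
        parts.append("addr=" + static_addr)
    parts.append("serial=" + _one_way(serial))
    parts.append("mac=" + _one_way(mac.lower()))
    if instance_tag:
        parts.append("inst=" + instance_tag)
    ownerid = "/".join(parts).encode("utf-8")
    if len(ownerid) > NFS4_OPAQUE_LIMIT:
        raise ValueError("co_ownerid exceeds NFS4_OPAQUE_LIMIT")
    return ownerid
```

Because every input is derived from the machine rather than read from a local file, the same string is produced after a restart, and the sketch also works on a diskless client.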
The client ID is assigned by the server (the eir_clientid result from The client ID is assigned by the server (the eir_clientid result from
EXCHANGE_ID) and should be chosen so that it will not conflict with a EXCHANGE_ID) and should be chosen so that it will not conflict with a
client ID previously assigned by the server. This applies across client ID previously assigned by the server. This applies across
server restarts. server restarts.
In the event of a server restart, a client may find out that its In the event of a server restart, a client may find out that its
skipping to change at page 27, line 26 skipping to change at line 1258
To facilitate upgrade from NFSv4.0 to NFSv4.1, a server may compare a To facilitate upgrade from NFSv4.0 to NFSv4.1, a server may compare a
value of data type client_owner4 in an EXCHANGE_ID with a value of value of data type client_owner4 in an EXCHANGE_ID with a value of
data type nfs_client_id4 that was established using the SETCLIENTID data type nfs_client_id4 that was established using the SETCLIENTID
operation of NFSv4.0. A server that does so will allow an upgraded operation of NFSv4.0. A server that does so will allow an upgraded
client to avoid waiting until the lease (i.e., the lease established client to avoid waiting until the lease (i.e., the lease established
by the NFSv4.0 instance client) expires. This requires that the by the NFSv4.0 instance client) expires. This requires that the
value of data type client_owner4 be constructed the same way as the value of data type client_owner4 be constructed the same way as the
value of data type nfs_client_id4. If the latter's contents included value of data type nfs_client_id4. If the latter's contents included
the server's network address (per the recommendations of the NFSv4.0 the server's network address (per the recommendations of the NFSv4.0
specification [36]), and the NFSv4.1 client does not wish to use a specification [37]), and the NFSv4.1 client does not wish to use a
client ID that prevents trunking, it should send two EXCHANGE_ID client ID that prevents trunking, it should send two EXCHANGE_ID
operations. The first EXCHANGE_ID will have a client_owner4 equal to operations. The first EXCHANGE_ID will have a client_owner4 equal to
the nfs_client_id4. This will clear the state created by the NFSv4.0 the nfs_client_id4. This will clear the state created by the NFSv4.0
client. The second EXCHANGE_ID will not have the server's network client. The second EXCHANGE_ID will not have the server's network
address. The state created for the second EXCHANGE_ID will not have address. The state created for the second EXCHANGE_ID will not have
to wait for lease expiration, because there will be no state to to wait for lease expiration, because there will be no state to
expire. expire.
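The upgrade sequence described above can be summarized in a short, non-normative Python sketch. The plan_exchange_ids function and its parameters are hypothetical; it only returns the owner strings to send, in order, and performs no network I/O.

```python
from typing import List


def plan_exchange_ids(v40_id: bytes, v41_ownerid: bytes,
                      v40_id_has_server_addr: bool,
                      want_trunking: bool = True) -> List[bytes]:
    """Return the owner strings for successive EXCHANGE_ID operations.

    v40_id       -- the identifier the NFSv4.0 instance used in SETCLIENTID.
    v41_ownerid  -- an owner string that omits the server's network address.
    """
    if v40_id_has_server_addr and want_trunking:
        # The first EXCHANGE_ID matches the NFSv4.0 identity and clears its
        # state. The second establishes the identity that allows trunking.
        return [v40_id, v41_ownerid]
    # Otherwise one EXCHANGE_ID, constructed the same way as the NFSv4.0
    # identifier, is enough.
    return [v40_id]
```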
2.4.2. Server Release of Client ID 2.4.2. Server Release of Client ID
skipping to change at page 28, line 24 skipping to change at line 1304
has no state, or that has state but the lease has expired, the server has no state, or that has state but the lease has expired, the server
MUST allow the EXCHANGE_ID and confirm the new client ID if followed MUST allow the EXCHANGE_ID and confirm the new client ID if followed
by the appropriate CREATE_SESSION. by the appropriate CREATE_SESSION.
When the server gets an EXCHANGE_ID for a new incarnation of a client When the server gets an EXCHANGE_ID for a new incarnation of a client
owner that currently has an old incarnation with state and an owner that currently has an old incarnation with state and an
unexpired lease, the server is allowed to dispose of the state of the unexpired lease, the server is allowed to dispose of the state of the
previous incarnation of the client owner if one of the following is previous incarnation of the client owner if one of the following is
true: true:
o The principal that created the client ID for the client owner is * The principal that created the client ID for the client owner is
the same as the principal that is sending the EXCHANGE_ID the same as the principal that is sending the EXCHANGE_ID
operation. Note that if the client ID was created with operation. Note that if the client ID was created with
SP4_MACH_CRED state protection (Section 18.35), the principal MUST SP4_MACH_CRED state protection (Section 18.35), the principal MUST
be based on RPCSEC_GSS authentication, the RPCSEC_GSS service used be based on RPCSEC_GSS authentication, the RPCSEC_GSS service used
MUST be integrity or privacy, and the same GSS mechanism and MUST be integrity or privacy, and the same GSS mechanism and
principal MUST be used as that used when the client ID was principal MUST be used as that used when the client ID was
created. created.
o The client ID was established with SP4_SSV protection * The client ID was established with SP4_SSV protection
(Section 18.35, Section 2.10.8.3) and the client sends the (Section 18.35, Section 2.10.8.3) and the client sends the
EXCHANGE_ID with the security flavor set to RPCSEC_GSS using the EXCHANGE_ID with the security flavor set to RPCSEC_GSS using the
GSS SSV mechanism (Section 2.10.9). GSS SSV mechanism (Section 2.10.9).
o The client ID was established with SP4_SSV protection, and under * The client ID was established with SP4_SSV protection, and under
the conditions described herein, the EXCHANGE_ID was sent with the conditions described herein, the EXCHANGE_ID was sent with
SP4_MACH_CRED state protection. Because the SSV might not persist SP4_MACH_CRED state protection. Because the SSV might not persist
across client and server restart, and because the first time a across client and server restart, and because the first time a
client sends EXCHANGE_ID to a server it does not have an SSV, the client sends EXCHANGE_ID to a server it does not have an SSV, the
client MAY send the subsequent EXCHANGE_ID without an SSV client MAY send the subsequent EXCHANGE_ID without an SSV
RPCSEC_GSS handle. Instead, as with SP4_MACH_CRED protection, the RPCSEC_GSS handle. Instead, as with SP4_MACH_CRED protection, the
principal MUST be based on RPCSEC_GSS authentication, the principal MUST be based on RPCSEC_GSS authentication, the
RPCSEC_GSS service used MUST be integrity or privacy, and the same RPCSEC_GSS service used MUST be integrity or privacy, and the same
GSS mechanism and principal MUST be used as that used when the GSS mechanism and principal MUST be used as that used when the
client ID was created. client ID was created.
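One literal reading of the three conditions above is sketched below in Python. This is non-normative: the Cred and ClientRecord classes, the string constants, and the "SSV" mechanism label are hypothetical stand-ins, and a real server would compare GSS mechanism OIDs and authenticated principals rather than strings.

```python
from dataclasses import dataclass
from typing import Optional

SSV_MECH = "SSV"  # hypothetical label for the GSS SSV mechanism


@dataclass(frozen=True)
class Cred:
    flavor: str                        # e.g., "RPCSEC_GSS" or "AUTH_SYS"
    principal: str
    gss_mech: Optional[str] = None
    gss_service: Optional[str] = None  # "none", "integrity", or "privacy"


@dataclass(frozen=True)
class ClientRecord:
    protection: str                    # "SP4_NONE", "SP4_MACH_CRED", "SP4_SSV"
    creator: Cred                      # credential that created the client ID


def _strong_match(creator: Cred, sender: Cred) -> bool:
    """RPCSEC_GSS with integrity or privacy, same mechanism and principal."""
    return (sender.flavor == "RPCSEC_GSS"
            and sender.gss_service in ("integrity", "privacy")
            and sender.gss_mech == creator.gss_mech
            and sender.principal == creator.principal)


def may_dispose_old_state(record: ClientRecord, sender: Cred,
                          requested_protection: str) -> bool:
    """Return True if any one of the three listed conditions holds."""
    # Condition 1: same principal; SP4_MACH_CRED adds the stricter checks.
    if sender.principal == record.creator.principal:
        if (record.protection != "SP4_MACH_CRED"
                or _strong_match(record.creator, sender)):
            return True
    if record.protection == "SP4_SSV":
        # Condition 2: EXCHANGE_ID sent with RPCSEC_GSS using the SSV mechanism.
        if sender.flavor == "RPCSEC_GSS" and sender.gss_mech == SSV_MECH:
            return True
        # Condition 3: no SSV handle; fall back to machine-credential checks.
        # Under this literal reading it requires the same principal, so
        # condition 1 already covers it; it is kept to mirror the list.
        if (requested_protection == "SP4_MACH_CRED"
                and _strong_match(record.creator, sender)):
            return True
    return False
```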
skipping to change at page 34, line 51 skipping to change at line 1617
but not bFH's policy. The server returns NFS4ERR_WRONGSEC on the but not bFH's policy. The server returns NFS4ERR_WRONGSEC on the
RENAME operation. RENAME operation.
To prevent a client from an endless sequence of a request containing To prevent a client from an endless sequence of a request containing
LINK or RENAME, followed by a request containing SECINFO_NO_NAME or LINK or RENAME, followed by a request containing SECINFO_NO_NAME or
SECINFO, the server MUST detect when the security policies of the SECINFO, the server MUST detect when the security policies of the
current and saved filehandles have no mutually acceptable security current and saved filehandles have no mutually acceptable security
tuple, and MUST NOT return NFS4ERR_WRONGSEC from LINK or RENAME in tuple, and MUST NOT return NFS4ERR_WRONGSEC from LINK or RENAME in
that situation. Instead the server MUST do one of two things: that situation. Instead the server MUST do one of two things:
o The server can return NFS4ERR_XDEV. * The server can return NFS4ERR_XDEV.
o The server can allow the security policy of the current filehandle * The server can allow the security policy of the current filehandle
to override that of the saved filehandle, and so return NFS4_OK. to override that of the saved filehandle, and so return NFS4_OK.
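The detection rule above amounts to a set intersection over security tuples. The following non-normative Python sketch shows one way a server might apply it. The check_link_rename_security function, the tuple representation, and the prefer_xdev switch are hypothetical, and the numeric error values are those assigned in the protocol's XDR description rather than in this excerpt. The sketch assumes the request has already satisfied the current filehandle's policy when the override branch is taken.

```python
from typing import FrozenSet, Tuple

# Status codes as assigned in the protocol's XDR description.
NFS4_OK = 0
NFS4ERR_XDEV = 18
NFS4ERR_WRONGSEC = 10016

# Illustrative security tuple: (flavor, mechanism, service).
SecTuple = Tuple[str, str, str]


def check_link_rename_security(request: SecTuple,
                               current_policy: FrozenSet[SecTuple],
                               saved_policy: FrozenSet[SecTuple],
                               prefer_xdev: bool = True) -> int:
    """Pick the status for LINK or RENAME based on both filehandles' policies."""
    if not (current_policy & saved_policy):
        # No mutually acceptable tuple exists, so NFS4ERR_WRONGSEC would
        # send the client into an endless SECINFO loop. Return NFS4ERR_XDEV,
        # or let the current filehandle's policy override the saved one.
        return NFS4ERR_XDEV if prefer_xdev else NFS4_OK
    if request in current_policy and request in saved_policy:
        return NFS4_OK
    # A mutually acceptable tuple exists, so the client can find it.
    return NFS4ERR_WRONGSEC
```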
2.7. Minor Versioning 2.7. Minor Versioning
To address the requirement of an NFS protocol that can evolve as the To address the requirement of an NFS protocol that can evolve as the
need arises, the NFSv4.1 protocol contains the rules and framework to need arises, the NFSv4.1 protocol contains the rules and framework to
allow for future minor changes or versioning. allow for future minor changes or versioning.
The base assumption with respect to minor versioning is that any The base assumption with respect to minor versioning is that any
future accepted minor version will be documented in one or more future accepted minor version will be documented in one or more
Standards Track RFCs. Minor version 0 of the NFSv4 protocol is Standards Track RFCs. Minor version 0 of the NFSv4 protocol is
represented by [36], and minor version 1 is represented by this RFC. represented by [37], and minor version 1 is represented by this RFC.
The COMPOUND and CB_COMPOUND procedures support the encoding of the The COMPOUND and CB_COMPOUND procedures support the encoding of the
minor version being requested by the client. minor version being requested by the client.
The following items represent the basic rules for the development of The following items represent the basic rules for the development of
minor versions. Note that a future minor version may modify or add minor versions. Note that a future minor version may modify or add
to the following rules as part of the minor version definition. to the following rules as part of the minor version definition.
1. Procedures are not added or deleted. 1. Procedures are not added or deleted.
To maintain the general RPC model, NFSv4 minor versions will not To maintain the general RPC model, NFSv4 minor versions will not
skipping to change at page 38, line 22 skipping to change at line 1782
this specification to specify heuristics for detecting intrusion via this specification to specify heuristics for detecting intrusion via
alarms. alarms.
2.9. Transport Layers 2.9. Transport Layers
2.9.1. REQUIRED and RECOMMENDED Properties of Transports 2.9.1. REQUIRED and RECOMMENDED Properties of Transports
NFSv4.1 works over Remote Direct Memory Access (RDMA) and non-RDMA- NFSv4.1 works over Remote Direct Memory Access (RDMA) and non-RDMA-
based transports with the following attributes: based transports with the following attributes:
o The transport supports reliable delivery of data, which NFSv4.1 * The transport supports reliable delivery of data, which NFSv4.1
requires but neither NFSv4.1 nor RPC has facilities for ensuring requires but neither NFSv4.1 nor RPC has facilities for ensuring
[40]. [41].
o The transport delivers data in the order it was sent. Ordered * The transport delivers data in the order it was sent. Ordered
delivery simplifies detection of transmit errors, and simplifies delivery simplifies detection of transmit errors, and simplifies
the sending of arbitrary sized requests and responses via the the sending of arbitrary sized requests and responses via the
record marking protocol [3]. record marking protocol [3].
Where an NFSv4.1 implementation supports operation over the IP Where an NFSv4.1 implementation supports operation over the IP
network protocol, any transport used between NFS and IP MUST be among network protocol, any transport used between NFS and IP MUST be among
the IETF-approved congestion control transport protocols. At the the IETF-approved congestion control transport protocols. At the
time this document was written, the only two transports that had the time this document was written, the only two transports that had the
above attributes were TCP and the Stream Control Transmission above attributes were TCP and the Stream Control Transmission
Protocol (SCTP). To enhance the possibilities for interoperability, Protocol (SCTP). To enhance the possibilities for interoperability,
skipping to change at page 39, line 24 skipping to change at line 1831
2. This will improve performance for the WAN environment by 2. This will improve performance for the WAN environment by
eliminating the need for connection setup handshakes. eliminating the need for connection setup handshakes.
3. The NFSv4.1 callback model differs from NFSv4.0, and requires the 3. The NFSv4.1 callback model differs from NFSv4.0, and requires the
client and server to maintain a client-created backchannel (see client and server to maintain a client-created backchannel (see
Section 2.10.3.1) for the server to use. Section 2.10.3.1) for the server to use.
In order to reduce congestion, if a connection-oriented transport is In order to reduce congestion, if a connection-oriented transport is
used, and the request is not the NULL procedure: used, and the request is not the NULL procedure:
o A requester MUST NOT retry a request unless the connection the * A requester MUST NOT retry a request unless the connection the
request was sent over was lost before the reply was received. request was sent over was lost before the reply was received.
o A replier MUST NOT silently drop a request, even if the request is * A replier MUST NOT silently drop a request, even if the request is
a retry. (The silent drop behavior of RPCSEC_GSS [4] does not a retry. (The silent drop behavior of RPCSEC_GSS [4] does not
apply because this behavior happens at the RPCSEC_GSS layer, a apply because this behavior happens at the RPCSEC_GSS layer, a
lower layer in the request processing.) Instead, the replier lower layer in the request processing.) Instead, the replier
SHOULD return an appropriate error (see Section 2.10.6.1), or it SHOULD return an appropriate error (see Section 2.10.6.1), or it
MAY disconnect the connection. MAY disconnect the connection.
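The requester-side rule above reduces to a small predicate. This non-normative Python sketch uses a hypothetical function name and parameters, and covers only the prohibition stated in the first item of the list.

```python
def retry_prohibited(connection_oriented: bool,
                     is_null_procedure: bool,
                     connection_lost_before_reply: bool) -> bool:
    """Return True when the rule above forbids retrying the request."""
    if not connection_oriented or is_null_procedure:
        # The rule applies only to non-NULL requests sent over a
        # connection-oriented transport.
        return False
    # A retry is permitted only if the connection that carried the request
    # was lost before the reply was received.
    return not connection_lost_before_reply
```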
When sending a reply, the replier MUST send the reply to the same When sending a reply, the replier MUST send the reply to the same
full network address (e.g., if using an IP-based transport, the full network address (e.g., if using an IP-based transport, the
source port of the requester is part of the full network address) source port of the requester is part of the full network address)
from which the requester sent the request. If using a connection- from which the requester sent the request. If using a connection-
skipping to change at page 40, line 5 skipping to change at line 1860
reply. If a connection is established with the same source and reply. If a connection is established with the same source and
destination full network address as the dropped connection, then the destination full network address as the dropped connection, then the
replier MUST NOT send the reply until the requester retries the replier MUST NOT send the reply until the requester retries the
request. The reason for this prohibition is that the requester MAY request. The reason for this prohibition is that the requester MAY
retry a request over a different connection (provided that connection retry a request over a different connection (provided that connection
is associated with the original request's session). is associated with the original request's session).
When using RDMA transports, there are other reasons for not When using RDMA transports, there are other reasons for not
tolerating retries over the same connection: tolerating retries over the same connection:
o RDMA transports use "credits" to enforce flow control, where a * RDMA transports use "credits" to enforce flow control, where a
credit is a right to a peer to transmit a message. If one peer credit is a right to a peer to transmit a message. If one peer
were to retransmit a request (or reply), it would consume an were to retransmit a request (or reply), it would consume an
additional credit. If the replier retransmitted a reply, it would additional credit. If the replier retransmitted a reply, it would
certainly result in an RDMA connection loss, since the requester certainly result in an RDMA connection loss, since the requester
would typically only post a single receive buffer for each would typically only post a single receive buffer for each
request. If the requester retransmitted a request, the additional request. If the requester retransmitted a request, the additional
credit consumed on the server might lead to RDMA connection credit consumed on the server might lead to RDMA connection
failure unless the client accounted for it and decreased its failure unless the client accounted for it and decreased its
available credit, leading to wasted resources. available credit, leading to wasted resources.
o RDMA credits present a new issue to the reply cache in NFSv4.1. * RDMA credits present a new issue to the reply cache in NFSv4.1.
The reply cache may be used when a connection within a session is The reply cache may be used when a connection within a session is
lost, such as after the client reconnects. Credit information is lost, such as after the client reconnects. Credit information is
a dynamic property of the RDMA connection, and stale values must a dynamic property of the RDMA connection, and stale values must
not be replayed from the cache. This implies that the reply cache not be replayed from the cache. This implies that the reply cache
contents must not be blindly used when replies are sent from it, contents must not be blindly used when replies are sent from it,
and credit information appropriate to the channel must be and credit information appropriate to the channel must be
refreshed by the RPC layer. refreshed by the RPC layer.
In addition, as described in Section 2.10.6.2, while a session is In addition, as described in Section 2.10.6.2, while a session is
active, the NFSv4.1 requester MUST NOT stop waiting for a reply. active, the NFSv4.1 requester MUST NOT stop waiting for a reply.
2.9.3. Ports 2.9.3. Ports
Historically, NFSv3 servers have listened over TCP port 2049. The Historically, NFSv3 servers have listened over TCP port 2049. The
registered port 2049 [41] for the NFS protocol should be the default registered port 2049 [42] for the NFS protocol should be the default
configuration. NFSv4.1 clients SHOULD NOT use the RPC binding configuration. NFSv4.1 clients SHOULD NOT use the RPC binding
protocols as described in [42]. protocols as described in [43].
2.10. Session 2.10. Session
NFSv4.1 clients and servers MUST support and MUST use the session NFSv4.1 clients and servers MUST support and MUST use the session
feature as described in this section. feature as described in this section.
2.10.1. Motivation and Overview 2.10.1. Motivation and Overview
Previous versions and minor versions of NFS have suffered from the Previous versions and minor versions of NFS have suffered from the
following: following:
o Lack of support for Exactly Once Semantics (EOS). This includes * Lack of support for Exactly Once Semantics (EOS). This includes
lack of support for EOS through server failure and recovery. lack of support for EOS through server failure and recovery.
o Limited callback support, including no support for sending * Limited callback support, including no support for sending
callbacks through firewalls, and races between replies to normal callbacks through firewalls, and races between replies to normal
requests and callbacks. requests and callbacks.
o Limited trunking over multiple network paths. * Limited trunking over multiple network paths.
o Requiring machine credentials for fully secure operation. * Requiring machine credentials for fully secure operation.
Through the introduction of a session, NFSv4.1 addresses the above Through the introduction of a session, NFSv4.1 addresses the above
shortfalls with practical solutions: shortfalls with practical solutions:
o EOS is enabled by a reply cache with a bounded size, making it * EOS is enabled by a reply cache with a bounded size, making it
feasible to keep the cache in persistent storage and enable EOS feasible to keep the cache in persistent storage and enable EOS
through server failure and recovery. One reason that previous through server failure and recovery. One reason that previous
revisions of NFS did not support EOS was because some EOS revisions of NFS did not support EOS was because some EOS
approaches often limited parallelism. As will be explained in approaches often limited parallelism. As will be explained in
Section 2.10.6, NFSv4.1 supports both EOS and unlimited Section 2.10.6, NFSv4.1 supports both EOS and unlimited
parallelism. parallelism.
o The NFSv4.1 client (defined in Section 1.7, Paragraph 2) creates * The NFSv4.1 client (defined in Section 1.7) creates transport
transport connections and provides them to the server to use for connections and provides them to the server to use for sending
sending callback requests, thus solving the firewall issue callback requests, thus solving the firewall issue
(Section 18.34). Races between responses from client requests and (Section 18.34). Races between responses from client requests and
callbacks caused by the requests are detected via the session's callbacks caused by the requests are detected via the session's
sequencing properties that are a consequence of EOS sequencing properties that are a consequence of EOS
(Section 2.10.6.3). (Section 2.10.6.3).
o The NFSv4.1 client can associate an arbitrary number of * The NFSv4.1 client can associate an arbitrary number of
connections with the session, and thus provide trunking connections with the session, and thus provide trunking
(Section 2.10.5). (Section 2.10.5).
o The NFSv4.1 client and server produce a session key independent of * The NFSv4.1 client and server produce a session key independent of
client and server machine credentials which can be used to compute client and server machine credentials which can be used to compute
a digest for protecting critical session management operations a digest for protecting critical session management operations
(Section 2.10.8.3). (Section 2.10.8.3).
o The NFSv4.1 client can also create secure RPCSEC_GSS contexts for * The NFSv4.1 client can also create secure RPCSEC_GSS contexts for
use by the session's backchannel that do not require the server to use by the session's backchannel that do not require the server to
authenticate to a client machine principal (Section 2.10.8.2). authenticate to a client machine principal (Section 2.10.8.2).
A session is a dynamically created, long-lived server object created A session is a dynamically created, long-lived server object created
by a client and used over time from one or more transport by a client and used over time from one or more transport
connections. Its function is to maintain the server's state relative connections. Its function is to maintain the server's state relative
to the connection(s) belonging to a client instance. This state is to the connection(s) belonging to a client instance. This state is
entirely independent of the connection itself, and indeed the state entirely independent of the connection itself, and indeed the state
exists whether or not the connection exists. A client may have one exists whether or not the connection exists. A client may have one
or more sessions associated with it so that client-associated state or more sessions associated with it so that client-associated state
skipping to change at page 44, line 8 skipping to change at line 2055
The backchannel is used for callback requests from server to client, The backchannel is used for callback requests from server to client,
and carries CB_COMPOUND requests and responses. Whether or not there and carries CB_COMPOUND requests and responses. Whether or not there
is a backchannel is decided by the client; however, many features of is a backchannel is decided by the client; however, many features of
NFSv4.1 require a backchannel. NFSv4.1 servers MUST support NFSv4.1 require a backchannel. NFSv4.1 servers MUST support
backchannels. backchannels.
Each session has resources for each channel, including separate reply Each session has resources for each channel, including separate reply
caches (see Section 2.10.6.1). Note that even the backchannel caches (see Section 2.10.6.1). Note that even the backchannel
requires a reply cache (or, at least, a slot table in order to detect requires a reply cache (or, at least, a slot table in order to detect
retries) because some callback operations are nonidempotent. retries) because some callback operations are non-idempotent.
2.10.3.1. Association of Connections, Channels, and Sessions 2.10.3.1. Association of Connections, Channels, and Sessions
Each channel is associated with zero or more transport connections Each channel is associated with zero or more transport connections
(whether of the same transport protocol or different transport (whether of the same transport protocol or different transport
protocols). A connection can be associated with one channel or both protocols). A connection can be associated with one channel or both
channels of a session; the client and server negotiate whether a channels of a session; the client and server negotiate whether a
connection will carry traffic for one channel or both channels via connection will carry traffic for one channel or both channels via
the CREATE_SESSION (Section 18.36) and the BIND_CONN_TO_SESSION the CREATE_SESSION (Section 18.36) and the BIND_CONN_TO_SESSION
(Section 18.34) operations. When a session is created via (Section 18.34) operations. When a session is created via
skipping to change at page 45, line 21 skipping to change at line 2117
The use of such compatible values does not imply that a value The use of such compatible values does not imply that a value
generated by one server will always be accepted by another. In most generated by one server will always be accepted by another. In most
cases, it will not. However, a server will not inadvertently accept cases, it will not. However, a server will not inadvertently accept
a value generated by another server. When it does accept it, it will a value generated by another server. When it does accept it, it will
be because it is recognized as valid and carrying the same meaning as be because it is recognized as valid and carrying the same meaning as
on another server of the same scope. on another server of the same scope.
When servers are of the same server scope, this compatibility of When servers are of the same server scope, this compatibility of
values applies to the following identifiers: values applies to the following identifiers:
o Filehandle values. A filehandle value accepted by two servers of * Filehandle values. A filehandle value accepted by two servers of
the same server scope denotes the same object. A WRITE operation the same server scope denotes the same object. A WRITE operation
sent to one server is reflected immediately in a READ sent to the sent to one server is reflected immediately in a READ sent to the
other. other.
o Server owner values. When the server scope values are the same, * Server owner values. When the server scope values are the same,
server owner values may be validly compared. In cases where the server owner values may be validly compared. In cases where the
server scope values are different, server owner values are treated server scope values are different, server owner values are treated
as different even if they contain identical strings of bytes. as different even if they contain identical strings of bytes.
The coordination among servers required to provide such compatibility The coordination among servers required to provide such compatibility
can be quite minimal, and limited to a simple partition of the ID can be quite minimal, and limited to a simple partition of the ID
space. The recognition of common values requires additional space. The recognition of common values requires additional
implementation, but this can be tailored to the specific situations implementation, but this can be tailored to the specific situations
in which that recognition is desired. in which that recognition is desired.
Clients will have occasion to compare the server scope values of Clients will have occasion to compare the server scope values of
multiple servers under a number of circumstances, each of which will multiple servers under a number of circumstances, each of which will
be discussed under the appropriate functional section: be discussed under the appropriate functional section:
o When server owner values received in response to EXCHANGE_ID * When server owner values received in response to EXCHANGE_ID
operations sent to multiple network addresses are compared for the operations sent to multiple network addresses are compared for the
purpose of determining the validity of various forms of trunking, purpose of determining the validity of various forms of trunking,
as described in Section 11.5.2. . as described in Section 11.5.2.
o When network or server reconfiguration causes the same network * When network or server reconfiguration causes the same network
address to possibly be directed to different servers, with the address to possibly be directed to different servers, with the
necessity for the client to determine when lock reclaim should be necessity for the client to determine when lock reclaim should be
attempted, as described in Section 8.4.2.1. attempted, as described in Section 8.4.2.1.
When two replies from EXCHANGE_ID, each from two different server When two replies from EXCHANGE_ID, each from two different server
network addresses, have the same server scope, there are a number of network addresses, have the same server scope, there are a number of
ways a client can validate that the common server scope is due to two ways a client can validate that the common server scope is due to two
servers cooperating in a group. servers cooperating in a group.
o If both EXCHANGE_ID requests were sent with RPCSEC_GSS ([4], [9], * If both EXCHANGE_ID requests were sent with RPCSEC_GSS ([4], [9],
[27]) authentication and the server principal is the same for both [27]) authentication and the server principal is the same for both
targets, the equality of server scope is validated. It is targets, the equality of server scope is validated. It is
RECOMMENDED that two servers intending to share the same server RECOMMENDED that two servers intending to share the same server
scope and server_owner major_id also share the same principal scope and server_owner major_id also share the same principal
name. In some cases, this simplifies the client's task of name. In some cases, this simplifies the client's task of
validating server scope. validating server scope.
o The client may accept the appearance of the second server in the * The client may accept the appearance of the second server in the
fs_locations or fs_locations_info attribute for a relevant file fs_locations or fs_locations_info attribute for a relevant file
system. For example, if there is a migration event for a system. For example, if there is a migration event for a
particular file system or there are locks to be reclaimed on a particular file system or there are locks to be reclaimed on a
particular file system, the attributes for that particular file particular file system, the attributes for that particular file
system may be used. The client sends the GETATTR request to the system may be used. The client sends the GETATTR request to the
first server for the fs_locations or fs_locations_info attribute first server for the fs_locations or fs_locations_info attribute
with RPCSEC_GSS authentication. It may need to do this in advance with RPCSEC_GSS authentication. It may need to do this in advance
of the need to verify the common server scope. If the client of the need to verify the common server scope. If the client
successfully authenticates the reply to GETATTR, and the GETATTR successfully authenticates the reply to GETATTR, and the GETATTR
request and reply containing the fs_locations or fs_locations_info request and reply containing the fs_locations or fs_locations_info
skipping to change at page 46, line 42 skipping to change at line 2184
system involved (e.g., a file system being migrated). system involved (e.g., a file system being migrated).
2.10.5. Trunking 2.10.5. Trunking
Trunking is the use of multiple connections between a client and Trunking is the use of multiple connections between a client and
server in order to increase the speed of data transfer. NFSv4.1 server in order to increase the speed of data transfer. NFSv4.1
supports two types of trunking: session trunking and client ID supports two types of trunking: session trunking and client ID
trunking. trunking.
In the context of a single server network address, it can be assumed In the context of a single server network address, it can be assumed
that all connections are accessing the same server and NFSv4.1 that all connections are accessing the same server, and NFSv4.1
servers MUST support both forms of trunking. When multiple servers MUST support both forms of trunking. When multiple
connections use a set of network addresses accessing the same server, connections use a set of network addresses to access the same server,
the server MUST support both forms of trunking. NFSv4.1 servers in a the server MUST support both forms of trunking. NFSv4.1 servers in a
clustered configuration MAY allow network addresses for different clustered configuration MAY allow network addresses for different
servers to use client ID trunking. servers to use client ID trunking.
Clients may use either form of trunking as long as they do not, when Clients may use either form of trunking as long as they do not, when
trunking between different server network addresses, violate the trunking between different server network addresses, violate the
servers' mandates as to the kinds of trunking to be allowed (see servers' mandates as to the kinds of trunking to be allowed (see
below). With regard to callback channels, the client MUST allow the below). With regard to callback channels, the client MUST allow the
server to choose among all callback channels valid for a given client server to choose among all callback channels valid for a given client
ID and MUST support trunking when the connections supporting the ID and MUST support trunking when the connections supporting the
skipping to change at page 48, line 39 skipping to change at line 2278
When doing client ID trunking, locking state is shared across When doing client ID trunking, locking state is shared across
sessions associated with that same client ID. This requires the sessions associated with that same client ID. This requires the
server to coordinate state across sessions and the client to be server to coordinate state across sessions and the client to be
able to associate the same locking state with multiple sessions. able to associate the same locking state with multiple sessions.
It is always possible that, as a result of various sorts of It is always possible that, as a result of various sorts of
reconfiguration events, eir_server_scope and eir_server_owner values reconfiguration events, eir_server_scope and eir_server_owner values
may be different on subsequent EXCHANGE_ID requests made to the same may be different on subsequent EXCHANGE_ID requests made to the same
network address. network address.
In most cases such reconfiguration events will be disruptive and In most cases, such reconfiguration events will be disruptive and
indicate that an IP address formerly connected to one server is now indicate that an IP address formerly connected to one server is now
connected to an entirely different one. connected to an entirely different one.
Some guidelines on client handling of such situations follow: Some guidelines on client handling of such situations follow:
o When eir_server_scope changes, the client has no assurance that * When eir_server_scope changes, the client has no assurance that
any id's it obtained previously (e.g. file handles) can be validly any IDs that it obtained previously (e.g., filehandles) can be
used on the new server, and, even if the new server accepts them, validly used on the new server, and, even if the new server
there is no assurance that this is not due to accident. Thus, it accepts them, there is no assurance that this is not due to
is best to treat all such state as lost/stale although a client accident. Thus, it is best to treat all such state as lost or
may assume that the probability of inadvertent acceptance is low stale, although a client may assume that the probability of
and treat this situation as within the next case. inadvertent acceptance is low and treat this situation as within
the next case.
o When eir_server_scope remains the same and * When eir_server_scope remains the same and
eir_server_owner.so_major_id changes, the client can use the eir_server_owner.so_major_id changes, the client can use the
filehandles it has, consider its locking state lost, and attempt filehandles it has, consider its locking state lost, and attempt
to reclaim or otherwise re-obtain its locks. It might find that to reclaim or otherwise re-obtain its locks. It might find that
its file handle is now stale. However, if NFS4ERR_STALE is not its filehandle is now stale. However, if NFS4ERR_STALE is not
returned, it can proceed to reclaim or otherwise re-obtain its returned, it can proceed to reclaim or otherwise re-obtain its
open locking state. open locking state.
o When eir_server_scope and eir_server_owner.so_major_id remain the * When eir_server_scope and eir_server_owner.so_major_id remain the
same, the client has to use the now-current values of same, the client has to use the now-current values of
eir_server_owner.so_minor_id in deciding on appropriate forms of eir_server_owner.so_minor_id in deciding on appropriate forms of
trunking. This may result in connections being dropped or new trunking. This may result in connections being dropped or new
sessions being created. sessions being created.
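
The three guidelines above amount to a small decision procedure. The following is a minimal Python sketch of it, not part of the protocol: the ServerIdentity type, the function name, and the returned labels are hypothetical, and only eir_server_scope, so_major_id, and so_minor_id come from the text.

```python
from typing import NamedTuple


class ServerIdentity(NamedTuple):
    """Hypothetical holder for values returned by one EXCHANGE_ID."""
    scope: bytes      # eir_server_scope
    major_id: bytes   # eir_server_owner.so_major_id
    minor_id: int     # eir_server_owner.so_minor_id


def classify_reconfiguration(old: ServerIdentity, new: ServerIdentity) -> str:
    """Pick the client's reaction when EXCHANGE_ID is repeated on one address."""
    if new.scope != old.scope:
        # No assurance that previously obtained IDs (e.g., filehandles)
        # are valid on the new server: best treated as lost or stale.
        return "treat-state-as-lost"
    if new.major_id != old.major_id:
        # Filehandles remain usable; locking state is considered lost and
        # the client attempts to reclaim or otherwise re-obtain its locks.
        return "keep-filehandles-reclaim-locks"
    # Scope and major ID are unchanged: the now-current so_minor_id
    # (new.minor_id) decides which forms of trunking are appropriate.
    return "reevaluate-trunking"
```

The sketch applies the stricter reading of the first case. A client that assumes inadvertent acceptance is unlikely could instead fall through to the second branch, as the text permits.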
2.10.5.1. Verifying Claims of Matching Server Identity 2.10.5.1. Verifying Claims of Matching Server Identity
When the server responds using two different connections claiming When the server responds using two different connections that claim
matching or partially matching eir_server_owner, eir_server_scope, matching or partially matching eir_server_owner, eir_server_scope,
and eir_clientid values, the client does not have to trust the and eir_clientid values, the client does not have to trust the
servers' claims. The client may verify these claims before trunking servers' claims. The client may verify these claims before trunking
traffic in the following ways: traffic in the following ways:
o For session trunking, clients SHOULD reliably verify if * For session trunking, clients SHOULD reliably verify if
connections between different network paths are in fact associated connections between different network paths are in fact associated
with the same NFSv4.1 server and usable on the same session, and with the same NFSv4.1 server and usable on the same session, and
servers MUST allow clients to perform reliable verification. When servers MUST allow clients to perform reliable verification. When
a client ID is created, the client SHOULD specify that a client ID is created, the client SHOULD specify that
BIND_CONN_TO_SESSION is to be verified according to the SP4_SSV or BIND_CONN_TO_SESSION is to be verified according to the SP4_SSV or
SP4_MACH_CRED (Section 18.35) state protection options. For SP4_MACH_CRED (Section 18.35) state protection options. For
SP4_SSV, reliable verification depends on a shared secret (the SP4_SSV, reliable verification depends on a shared secret (the
SSV) that is established via the SET_SSV (see Section 18.47) SSV) that is established via the SET_SSV (see Section 18.47)
operation. operation.
skipping to change at page 50, line 17 skipping to change at line 2351
not be verified by the client, so the client will know it cannot not be verified by the client, so the client will know it cannot
use the connection for trunking the specified session. use the connection for trunking the specified session.
If the client specified SP4_MACH_CRED state protection, the If the client specified SP4_MACH_CRED state protection, the
BIND_CONN_TO_SESSION operation will use RPCSEC_GSS integrity or BIND_CONN_TO_SESSION operation will use RPCSEC_GSS integrity or
privacy, using the same credential that was used when the client privacy, using the same credential that was used when the client
ID was created. Mutual authentication via RPCSEC_GSS assures the ID was created. Mutual authentication via RPCSEC_GSS assures the
client that the connection is associated with the correct session client that the connection is associated with the correct session
of the correct server. of the correct server.
o For client ID trunking, the client has at least two options for * For client ID trunking, the client has at least two options for
verifying that the same client ID obtained from two different verifying that the same client ID obtained from two different
EXCHANGE_ID operations came from the same server. The first EXCHANGE_ID operations came from the same server. The first
option is to use RPCSEC_GSS authentication when sending each option is to use RPCSEC_GSS authentication when sending each
EXCHANGE_ID operation. Each time an EXCHANGE_ID is sent with EXCHANGE_ID operation. Each time an EXCHANGE_ID is sent with
RPCSEC_GSS authentication, the client notes the principal name of RPCSEC_GSS authentication, the client notes the principal name of
the GSS target. If the EXCHANGE_ID results indicate that client the GSS target. If the EXCHANGE_ID results indicate that client
ID trunking is possible, and the GSS targets' principal names are ID trunking is possible, and the GSS targets' principal names are
the same, the servers are the same and client ID trunking is the same, the servers are the same and client ID trunking is
allowed. allowed.
skipping to change at page 51, line 14 skipping to change at line 2394
Each COMPOUND or CB_COMPOUND request that is sent with a leading Each COMPOUND or CB_COMPOUND request that is sent with a leading
SEQUENCE or CB_SEQUENCE operation MUST be executed by the receiver SEQUENCE or CB_SEQUENCE operation MUST be executed by the receiver
exactly once. This requirement holds regardless of whether the exactly once. This requirement holds regardless of whether the
request is sent with reply caching specified (see request is sent with reply caching specified (see
Section 2.10.6.1.3). The requirement holds even if the requester is Section 2.10.6.1.3). The requirement holds even if the requester is
sending the request over a session created between a pNFS data client sending the request over a session created between a pNFS data client
and pNFS data server. To understand the rationale for this and pNFS data server. To understand the rationale for this
requirement, divide the requests into three classifications: requirement, divide the requests into three classifications:
o Non-idempotent requests. * Non-idempotent requests.
o Idempotent modifying requests. * Idempotent modifying requests.
o Idempotent non-modifying requests. * Idempotent non-modifying requests.
An example of a non-idempotent request is RENAME. Obviously, if a An example of a non-idempotent request is RENAME. Obviously, if a
replier executes the same RENAME request twice, and the first replier executes the same RENAME request twice, and the first
execution succeeds, the re-execution will fail. If the replier execution succeeds, the re-execution will fail. If the replier
returns the result from the re-execution, this result is incorrect. returns the result from the re-execution, this result is incorrect.
Therefore, EOS is required for non-idempotent requests. Therefore, EOS is required for non-idempotent requests.
An example of an idempotent modifying request is a COMPOUND request An example of an idempotent modifying request is a COMPOUND request
containing a WRITE operation. Repeated execution of the same WRITE containing a WRITE operation. Repeated execution of the same WRITE
has the same effect as execution of that WRITE a single time. has the same effect as execution of that WRITE a single time.
skipping to change at page 52, line 33 skipping to change at line 2460
cannot be interpreted by the replier except to test for equality with cannot be interpreted by the replier except to test for equality with
previously sent requests. When consulting an RPC-based duplicate previously sent requests. When consulting an RPC-based duplicate
request cache, the opaqueness of the XID requires a computationally request cache, the opaqueness of the XID requires a computationally
expensive look up (often via a hash that includes XID and source expensive look up (often via a hash that includes XID and source
address). NFSv4.1 requests use a non-opaque slot ID, which is an address). NFSv4.1 requests use a non-opaque slot ID, which is an
index into a slot table, which is far more efficient. Second, index into a slot table, which is far more efficient. Second,
because RPC requests can be executed by the replier in any order, because RPC requests can be executed by the replier in any order,
there is no bound on the number of requests that may be outstanding there is no bound on the number of requests that may be outstanding
at any time. To achieve perfect EOS, using ONC RPC would require at any time. To achieve perfect EOS, using ONC RPC would require
storing all replies in the reply cache. XIDs are 32 bits; storing storing all replies in the reply cache. XIDs are 32 bits; storing
over four billion (2^32) replies in the reply cache is not practical. over four billion (2^(32)) replies in the reply cache is not
In practice, previous versions of NFS have chosen to store a fixed practical. In practice, previous versions of NFS have chosen to
number of replies in the cache, and to use a least recently used store a fixed number of replies in the cache, and to use a least
(LRU) approach to replacing cache entries with new entries when the recently used (LRU) approach to replacing cache entries with new
cache is full. In NFSv4.1, the number of outstanding requests is entries when the cache is full. In NFSv4.1, the number of
bounded by the size of the slot table, and a sequence ID per slot is outstanding requests is bounded by the size of the slot table, and a
used to tell the replier when it is safe to delete a cached reply. sequence ID per slot is used to tell the replier when it is safe to
delete a cached reply.
In the NFSv4.1 reply cache, when the requester sends a new request, In the NFSv4.1 reply cache, when the requester sends a new request,
it selects a slot ID in the range 0..N, where N is the replier's it selects a slot ID in the range 0..N, where N is the replier's
current maximum slot ID granted to the requester on the session over current maximum slot ID granted to the requester on the session over
which the request is to be sent. The value of N starts out as equal which the request is to be sent. The value of N starts out as equal
to ca_maxrequests - 1 (Section 18.36), but can be adjusted by the to ca_maxrequests - 1 (Section 18.36), but can be adjusted by the
response to SEQUENCE or CB_SEQUENCE as described later in this response to SEQUENCE or CB_SEQUENCE as described later in this
section. The slot ID must be unused by any of the requests that the section. The slot ID must be unused by any of the requests that the
requester has already active on the session. "Unused" here means the requester has already active on the session. "Unused" here means the
requester has no outstanding request for that slot ID. requester has no outstanding request for that slot ID.
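
As an illustration of the slot selection rule above, the following minimal Python sketch tracks which slot IDs are in use on the requester side. The class and method names are hypothetical. Picking the lowest unused slot is a choice made for this sketch, since the paragraph above only requires that the chosen slot be unused and within 0..N.

```python
class RequesterSlotTable:
    """Hypothetical requester-side bookkeeping for one channel of a session."""

    def __init__(self, ca_maxrequests: int) -> None:
        # N starts out equal to ca_maxrequests - 1; later adjustment of N
        # by SEQUENCE or CB_SEQUENCE responses is not modeled here.
        self.highest_slot_id = ca_maxrequests - 1
        self.in_use = set()

    def acquire(self):
        """Return an unused slot ID in 0..N, or None if every slot is busy."""
        for slot_id in range(self.highest_slot_id + 1):
            if slot_id not in self.in_use:
                self.in_use.add(slot_id)
                return slot_id
        return None

    def release(self, slot_id: int) -> None:
        """Mark the slot unused once its outstanding request has completed."""
        self.in_use.discard(slot_id)
```

A caller that receives None from acquire() has to wait for an outstanding request to complete before sending another request on that session.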
A slot contains a sequence ID and the cached reply corresponding to A slot contains a sequence ID and the cached reply corresponding to
the request sent with that sequence ID. The sequence ID is a 32-bit the request sent with that sequence ID. The sequence ID is a 32-bit
unsigned value, and is therefore in the range 0..0xFFFFFFFF (2^32 - unsigned value, and is therefore in the range 0..0xFFFFFFFF (2^(32) -
1). The first time a slot is used, the requester MUST specify a 1). The first time a slot is used, the requester MUST specify a
sequence ID of one (Section 18.36). Each time a slot is reused, the sequence ID of one (Section 18.36). Each time a slot is reused, the
request MUST specify a sequence ID that is one greater than that of request MUST specify a sequence ID that is one greater than that of
the previous request on the slot. If the previous sequence ID was the previous request on the slot. If the previous sequence ID was
0xFFFFFFFF, then the next request for the slot MUST have the sequence 0xFFFFFFFF, then the next request for the slot MUST have the sequence
ID set to zero (i.e., (2^32 - 1) + 1 mod 2^32). ID set to zero (i.e., (2^(32) - 1) + 1 mod 2^(32)).
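
The wraparound rule above is ordinary arithmetic modulo 2^32. A minimal Python sketch follows; the constant and function names are chosen here for illustration and do not appear in the protocol.

```python
SEQUENCE_ID_MODULUS = 1 << 32  # sequence IDs are 32-bit unsigned values
FIRST_SEQUENCE_ID = 1          # value required the first time a slot is used


def next_sequence_id(previous: int) -> int:
    """Return the sequence ID required when a slot is reused."""
    return (previous + 1) % SEQUENCE_ID_MODULUS
```

With these definitions, next_sequence_id(0xFFFFFFFF) evaluates to 0, matching the rule that the sequence ID following 0xFFFFFFFF is zero.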
The sequence ID accompanies the slot ID in each request. It is for The sequence ID accompanies the slot ID in each request. It is for
the critical check at the replier: it is used to efficiently determine the critical check at the replier: it is used to efficiently determine
whether a request using a certain slot ID is a retransmit or a new, whether a request using a certain slot ID is a retransmit or a new,
never-before-seen request. It is not feasible for the requester to never-before-seen request. It is not feasible for the requester to
assert that it is retransmitting to implement this, because for any assert that it is retransmitting to implement this, because for any
given request the requester cannot know whether the replier has seen given request the requester cannot know whether the replier has seen
it unless the replier actually replies. Of course, if the requester it unless the replier actually replies. Of course, if the requester
has seen the reply, the requester would not retransmit. has seen the reply, the requester would not retransmit.
The replier compares each received request's sequence ID with the The replier compares each received request's sequence ID with the
last one previously received for that slot ID, to see if the new last one previously received for that slot ID, to see if the new
request is: request is:
o A new request, in which the sequence ID is one greater than that * A new request, in which the sequence ID is one greater than that
previously seen in the slot (accounting for sequence wraparound). previously seen in the slot (accounting for sequence wraparound).
The replier proceeds to execute the new request, and the replier The replier proceeds to execute the new request, and the replier
MUST increase the slot's sequence ID by one. MUST increase the slot's sequence ID by one.
o A retransmitted request, in which the sequence ID is equal to that * A retransmitted request, in which the sequence ID is equal to that
currently recorded in the slot. If the original request has currently recorded in the slot. If the original request has
executed to completion, the replier returns the cached reply. See executed to completion, the replier returns the cached reply. See
Section 2.10.6.2 for direction on how the replier deals with Section 2.10.6.2 for direction on how the replier deals with
retries of requests that are still in progress. retries of requests that are still in progress.
o A misordered retry, in which the sequence ID is less than * A misordered retry, in which the sequence ID is less than
(accounting for sequence wraparound) that previously seen in the (accounting for sequence wraparound) that previously seen in the
slot. The replier MUST return NFS4ERR_SEQ_MISORDERED (as the slot. The replier MUST return NFS4ERR_SEQ_MISORDERED (as the
result from SEQUENCE or CB_SEQUENCE). result from SEQUENCE or CB_SEQUENCE).
o A misordered new request, in which the sequence ID is two or more * A misordered new request, in which the sequence ID is two or more
than (accounting for sequence wraparound) that previously seen in than (accounting for sequence wraparound) that previously seen in
the slot. Note that because the sequence ID MUST wrap around to the slot. Note that because the sequence ID MUST wrap around to
zero once it reaches 0xFFFFFFFF, a misordered new request and a zero once it reaches 0xFFFFFFFF, a misordered new request and a
misordered retry cannot be distinguished. Thus, the replier MUST misordered retry cannot be distinguished. Thus, the replier MUST
return NFS4ERR_SEQ_MISORDERED (as the result from SEQUENCE or return NFS4ERR_SEQ_MISORDERED (as the result from SEQUENCE or
CB_SEQUENCE). CB_SEQUENCE).
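
The four cases above reduce to one modular subtraction, because a misordered retry and a misordered new request cannot be told apart and both produce the same error. The following minimal Python sketch shows the comparison. The function name and the "new" and "retransmit" labels are hypothetical, and the handling of a retry whose original request is still in progress (Section 2.10.6.2) is not modeled.

```python
SEQUENCE_ID_MODULUS = 1 << 32  # sequence IDs are 32-bit unsigned values


def classify_request(slot_sequence_id: int, request_sequence_id: int) -> str:
    """Compare a request's sequence ID with the one recorded in the slot."""
    # Distance from the slot's recorded sequence ID, accounting for wraparound.
    delta = (request_sequence_id - slot_sequence_id) % SEQUENCE_ID_MODULUS
    if delta == 1:
        # New request: execute it and increase the slot's sequence ID by one.
        return "new"
    if delta == 0:
        # Retransmission: return the cached reply if the original completed.
        return "retransmit"
    # Any other distance is a misordered retry or a misordered new request.
    return "NFS4ERR_SEQ_MISORDERED"
```

For a slot whose recorded sequence ID is 0xFFFFFFFF, a request with sequence ID 0 is classified as new, since the distance is 1 after wraparound.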
Unlike the XID, the slot ID is always within a specific range; this Unlike the XID, the slot ID is always within a specific range; this
has two implications. The first implication is that for a given has two implications. The first implication is that for a given
session, the replier need only cache the results of a limited number session, the replier need only cache the results of a limited number
skipping to change at page 54, line 28 skipping to change at line 2553
addition, the RPC XID is not used in the reply cache, enhancing addition, the RPC XID is not used in the reply cache, enhancing
robustness of the cache in the face of any rapid reuse of XIDs by the robustness of the cache in the face of any rapid reuse of XIDs by the
requester. While the replier does not care about the XID for the requester. While the replier does not care about the XID for the
purposes of reply cache management (but the replier MUST return the purposes of reply cache management (but the replier MUST return the
same XID that was in the request), nonetheless there are same XID that was in the request), nonetheless there are
considerations for the XID in NFSv4.1 that are the same as all other considerations for the XID in NFSv4.1 that are the same as all other
previous versions of NFS. The RPC XID remains in each message and previous versions of NFS. The RPC XID remains in each message and
needs to be formulated in NFSv4.1 requests as in any other ONC RPC needs to be formulated in NFSv4.1 requests as in any other ONC RPC
request. The reasons include: request. The reasons include:
o The RPC layer retains its existing semantics and implementation. * The RPC layer retains its existing semantics and implementation.
o The requester and replier must be able to interoperate at the RPC * The requester and replier must be able to interoperate at the RPC
layer, prior to the NFSv4.1 decoding of the SEQUENCE or layer, prior to the NFSv4.1 decoding of the SEQUENCE or
CB_SEQUENCE operation. CB_SEQUENCE operation.
o If an operation is being used that does not start with SEQUENCE or * If an operation is being used that does not start with SEQUENCE or
CB_SEQUENCE (e.g., BIND_CONN_TO_SESSION), then the RPC XID is CB_SEQUENCE (e.g., BIND_CONN_TO_SESSION), then the RPC XID is
needed for correct operation to match the reply to the request. needed for correct operation to match the reply to the request.
o The SEQUENCE or CB_SEQUENCE operation may generate an error. If * The SEQUENCE or CB_SEQUENCE operation may generate an error. If
so, the embedded slot ID, sequence ID, and session ID (if present) so, the embedded slot ID, sequence ID, and session ID (if present)
in the request will not be in the reply, and the requester has in the request will not be in the reply, and the requester has
only the XID to match the reply to the request. only the XID to match the reply to the request.
Given that well-formulated XIDs continue to be required, this raises Given that well-formulated XIDs continue to be required, this raises
the question: why do SEQUENCE and CB_SEQUENCE replies have a session the question: why do SEQUENCE and CB_SEQUENCE replies have a session
ID, slot ID, and sequence ID? Having the session ID in the reply ID, slot ID, and sequence ID? Having the session ID in the reply
means that the requester does not have to use the XID to look up the means that the requester does not have to use the XID to look up the
session ID, which would be necessary if the connection were session ID, which would be necessary if the connection were
associated with multiple sessions. Having the slot ID and sequence associated with multiple sessions. Having the slot ID and sequence
ID in the reply means that the requester does not have to use the XID ID in the reply means that the requester does not have to use the XID
to look up the slot ID and sequence ID. Furthermore, since the XID to look up the slot ID and sequence ID. Furthermore, since the XID
is only 32 bits, it is too small to guarantee the re-association of a is only 32 bits, it is too small to guarantee the re-association of a
reply with its request [43]; having session ID, slot ID, and sequence reply with its request [44]; having session ID, slot ID, and sequence
ID in the reply allows the client to validate that the reply in fact ID in the reply allows the client to validate that the reply in fact
belongs to the matched request. belongs to the matched request.
The SEQUENCE (and CB_SEQUENCE) operation also carries a The SEQUENCE (and CB_SEQUENCE) operation also carries a
"highest_slotid" value, which carries additional requester slot usage "highest_slotid" value, which carries additional requester slot usage
information. The requester MUST always indicate the slot ID information. The requester MUST always indicate the slot ID
representing the outstanding request with the highest-numbered slot representing the outstanding request with the highest-numbered slot
value. The requester should in all cases provide the most value. The requester should in all cases provide the most
conservative value possible, although it can be increased somewhat conservative value possible, although it can be increased somewhat
above the actual instantaneous usage to maintain some minimum or above the actual instantaneous usage to maintain some minimum or
optimal level. This provides a way for the requester to yield unused optimal level. This provides a way for the requester to yield unused
request slots back to the replier, which in turn can use the request slots back to the replier, which in turn can use the
information to reallocate resources. information to reallocate resources.
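
As a rough illustration of the paragraph above, the requester's highest_slotid argument could be derived as follows. This is a sketch under stated assumptions: the function name and the "floor" parameter are hypothetical, and the set of outstanding slots is assumed to include the slot of the request being sent.

```python
def highest_slotid_argument(outstanding: set, floor: int = 0) -> int:
    """Return the highest_slotid value a requester might send.

    outstanding: slot IDs with requests in flight, including the slot
                 used by the request being built (so it is never empty).
    floor:       hypothetical tuning knob; lets the requester advertise a
                 value somewhat above instantaneous usage to keep a
                 minimum number of slots.
    """
    if not outstanding:
        raise ValueError("the request being sent must occupy a slot")
    # The most conservative value is the highest slot actually in use;
    # the floor may raise it, but never lowers it.
    return max(max(outstanding), floor)
```

A requester that lets this value fall as its load drops is effectively yielding the higher-numbered slots back to the replier.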
The replier responds with both a new target highest_slotid and an The replier responds with both a new target highest_slotid and an
enforced highest_slotid, described as follows: enforced highest_slotid, described as follows:
o The target highest_slotid is an indication to the requester of the * The target highest_slotid is an indication to the requester of the
highest_slotid the replier wishes the requester to be using. This highest_slotid the replier wishes the requester to be using. This
permits the replier to withdraw (or add) resources from a permits the replier to withdraw (or add) resources from a
requester that has been found to not be using them, in order to requester that has been found to not be using them, in order to
more fairly share resources among a varying level of demand from more fairly share resources among a varying level of demand from
other requesters. The requester must always comply with the other requesters. The requester must always comply with the
replier's value updates, since they indicate newly established replier's value updates, since they indicate newly established
hard limits on the requester's access to session resources. hard limits on the requester's access to session resources.
However, because of request pipelining, the requester may have However, because of request pipelining, the requester may have
active requests in flight reflecting prior values; therefore, the active requests in flight reflecting prior values; therefore, the
replier must not immediately require the requester to comply. replier must not immediately require the requester to comply.
o The enforced highest_slotid indicates the highest slot ID the * The enforced highest_slotid indicates the highest slot ID the
requester is permitted to use on a subsequent SEQUENCE or requester is permitted to use on a subsequent SEQUENCE or
CB_SEQUENCE operation. The replier's enforced highest_slotid CB_SEQUENCE operation. The replier's enforced highest_slotid
SHOULD be no less than the highest_slotid the requester indicated SHOULD be no less than the highest_slotid the requester indicated
in the SEQUENCE or CB_SEQUENCE arguments. in the SEQUENCE or CB_SEQUENCE arguments.
A requester can be intransigent with respect to lowering its A requester can be intransigent with respect to lowering its
highest_slotid argument to a Sequence operation, i.e., the highest_slotid argument to a Sequence operation, i.e., the
requester continues to ignore the target highest_slotid in the requester continues to ignore the target highest_slotid in the
response to a Sequence operation, and continues to set its response to a Sequence operation, and continues to set its
highest_slotid argument to be higher than the target highest_slotid argument to be higher than the target
skipping to change at page 56, line 28 skipping to change at line 2647
enforced highest_slotid, the requester is only allowed to send enforced highest_slotid, the requester is only allowed to send
retries on slots that exceed the replier's highest_slotid. If a retries on slots that exceed the replier's highest_slotid. If a
request is received with a slot ID that is higher than the new request is received with a slot ID that is higher than the new
enforced highest_slotid, and the sequence ID is one higher than enforced highest_slotid, and the sequence ID is one higher than
what is in the slot's reply cache, then the server can both retire what is in the slot's reply cache, then the server can both retire
the slot and return NFS4ERR_BADSLOT (however, the server MUST NOT the slot and return NFS4ERR_BADSLOT (however, the server MUST NOT
do one and not the other). The reason it is safe to retire the do one and not the other). The reason it is safe to retire the
slot is that, by using the next sequence ID, the requester is slot is that, by using the next sequence ID, the requester is
indicating it has received the previous reply for the slot. indicating it has received the previous reply for the slot.
o The requester SHOULD use the lowest available slot when sending a * The requester SHOULD use the lowest available slot when sending a
new request. This way, the replier may be able to retire slot new request. This way, the replier may be able to retire slot
entries faster. However, where the replier is actively adjusting entries faster. However, where the replier is actively adjusting
its granted highest_slotid, it will not be able to use only the its granted highest_slotid, it will not be able to use only the
receipt of the slot ID and highest_slotid in the request. Neither receipt of the slot ID and highest_slotid in the request. Neither
the slot ID nor the highest_slotid used in a request may reflect the slot ID nor the highest_slotid used in a request may reflect
the replier's current idea of the requester's session limit, the replier's current idea of the requester's session limit,
because the request may have been sent from the requester before because the request may have been sent from the requester before
the update was received. Therefore, in the downward adjustment the update was received. Therefore, in the downward adjustment
case, the replier may have to retain a number of reply cache case, the replier may have to retain a number of reply cache
entries at least as large as the old value of maximum requests entries at least as large as the old value of maximum requests
skipping to change at page 57, line 41 skipping to change at line 2706
cache entry for the slot whenever an error is returned from SEQUENCE cache entry for the slot whenever an error is returned from SEQUENCE
or CB_SEQUENCE. or CB_SEQUENCE.
2.10.6.1.3. Optional Reply Caching 2.10.6.1.3. Optional Reply Caching
On a per-request basis, the requester can choose to direct the On a per-request basis, the requester can choose to direct the
replier to cache the reply to all operations after the first replier to cache the reply to all operations after the first
operation (SEQUENCE or CB_SEQUENCE) via the sa_cachethis or operation (SEQUENCE or CB_SEQUENCE) via the sa_cachethis or
csa_cachethis fields of the arguments to SEQUENCE or CB_SEQUENCE. csa_cachethis fields of the arguments to SEQUENCE or CB_SEQUENCE.
The reason it would not direct the replier to cache the entire reply The reason it would not direct the replier to cache the entire reply
is that the request is composed of all idempotent operations [40]. is that the request is composed of all idempotent operations [41].
Caching the reply may offer little benefit. If the reply is too Caching the reply may offer little benefit. If the reply is too
large (see Section 2.10.6.4), it may not be cacheable anyway. Even large (see Section 2.10.6.4), it may not be cacheable anyway. Even
if the reply to an idempotent request is small enough to cache, if the reply to an idempotent request is small enough to cache,
unnecessarily caching the reply slows down the server and increases unnecessarily caching the reply slows down the server and increases
RPC latency. RPC latency.
Whether or not the requester requests the reply to be cached has no Whether or not the requester requests the reply to be cached has no
effect on the slot processing. If the result of SEQUENCE or effect on the slot processing. If the result of SEQUENCE or
CB_SEQUENCE is NFS4_OK, then the slot's sequence ID MUST be CB_SEQUENCE is NFS4_OK, then the slot's sequence ID MUST be
incremented by one. If a requester does not direct the replier to incremented by one. If a requester does not direct the replier to
cache the reply, the replier MUST do one of the following: cache the reply, the replier MUST do one of the following:
o The replier can cache the entire original reply. Even though * The replier can cache the entire original reply. Even though
sa_cachethis or csa_cachethis is FALSE, the replier is always free sa_cachethis or csa_cachethis is FALSE, the replier is always free
to cache. It may choose this approach in order to simplify to cache. It may choose this approach in order to simplify
implementation. implementation.
o The replier enters into its reply cache a reply consisting of the * The replier enters into its reply cache a reply consisting of the
original results to the SEQUENCE or CB_SEQUENCE operation, and original results to the SEQUENCE or CB_SEQUENCE operation, and
with the next operation in COMPOUND or CB_COMPOUND having the with the next operation in COMPOUND or CB_COMPOUND having the
error NFS4ERR_RETRY_UNCACHED_REP. Thus, if the requester later error NFS4ERR_RETRY_UNCACHED_REP. Thus, if the requester later
retries the request, it will get NFS4ERR_RETRY_UNCACHED_REP. If a retries the request, it will get NFS4ERR_RETRY_UNCACHED_REP. If a
replier receives a retried Sequence operation where the reply to replier receives a retried Sequence operation where the reply to
the COMPOUND or CB_COMPOUND was not cached, then the replier, the COMPOUND or CB_COMPOUND was not cached, then the replier,
* MAY return NFS4ERR_RETRY_UNCACHED_REP in reply to a Sequence - MAY return NFS4ERR_RETRY_UNCACHED_REP in reply to a Sequence
operation if the Sequence operation is not the first operation operation if the Sequence operation is not the first operation
(granted, a requester that does so is in violation of the (granted, a requester that does so is in violation of the
NFSv4.1 protocol). NFSv4.1 protocol).
* MUST NOT return NFS4ERR_RETRY_UNCACHED_REP in reply to a - MUST NOT return NFS4ERR_RETRY_UNCACHED_REP in reply to a
Sequence operation if the Sequence operation is the first Sequence operation if the Sequence operation is the first
operation. operation.
o If the second operation is an illegal operation, or an operation * If the second operation is an illegal operation, or an operation
that was legal in a previous minor version of NFSv4 and MUST NOT that was legal in a previous minor version of NFSv4 and MUST NOT
be supported in the current minor version (e.g., SETCLIENTID), the be supported in the current minor version (e.g., SETCLIENTID), the
replier MUST NOT ever return NFS4ERR_RETRY_UNCACHED_REP. Instead replier MUST NOT ever return NFS4ERR_RETRY_UNCACHED_REP. Instead
the replier MUST return NFS4ERR_OP_ILLEGAL or NFS4ERR_BADXDR or the replier MUST return NFS4ERR_OP_ILLEGAL or NFS4ERR_BADXDR or
NFS4ERR_NOTSUPP as appropriate. NFS4ERR_NOTSUPP as appropriate.
o If the second operation can result in another error status, the * If the second operation can result in another error status, the
replier MAY return a status other than NFS4ERR_RETRY_UNCACHED_REP, replier MAY return a status other than NFS4ERR_RETRY_UNCACHED_REP,
provided the operation is not executed in such a way that the provided the operation is not executed in such a way that the
state of the replier is changed. Examples of such an error status state of the replier is changed. Examples of such an error status
include: NFS4ERR_NOTSUPP returned for an operation that is legal include: NFS4ERR_NOTSUPP returned for an operation that is legal
but not REQUIRED in the current minor versions, and thus not but not REQUIRED in the current minor versions, and thus not
supported by the replier; NFS4ERR_SEQUENCE_POS; and supported by the replier; NFS4ERR_SEQUENCE_POS; and
NFS4ERR_REQ_TOO_BIG. NFS4ERR_REQ_TOO_BIG.
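
The first two options in the list above can be sketched as follows. This is a simplified illustration, not a complete replier: the Slot class, the record_reply function, and the representation of results as (operation, status) pairs are all assumptions made for the example, and the special cases for illegal operations and other error statuses are deliberately left out.

```python
from dataclasses import dataclass, field

NFS4_OK = "NFS4_OK"
NFS4ERR_RETRY_UNCACHED_REP = "NFS4ERR_RETRY_UNCACHED_REP"


@dataclass
class Slot:
    """Hypothetical per-slot reply cache entry."""
    seqid: int = 0
    cached_reply: list = field(default_factory=list)


def record_reply(slot: Slot, results: list, cachethis: bool) -> None:
    """Update a slot after SEQUENCE (or CB_SEQUENCE) returned NFS4_OK.

    results: list of (op_name, status) pairs, SEQUENCE first.
    """
    # The sequence ID advances whether or not the reply is cached.
    slot.seqid = (slot.seqid + 1) % 2**32
    if cachethis or len(results) < 2:
        # Cache the entire reply (always permitted, even when not asked).
        slot.cached_reply = list(results)
    else:
        # Cache only the SEQUENCE result, followed by the second
        # operation marked NFS4ERR_RETRY_UNCACHED_REP.
        second_op = results[1][0]
        slot.cached_reply = [results[0],
                             (second_op, NFS4ERR_RETRY_UNCACHED_REP)]
```

With this layout, a later retry of an uncached request is answered from the stub entry and so reports NFS4ERR_RETRY_UNCACHED_REP on the second operation.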
The discussion above assumes that the retried request matches the The discussion above assumes that the retried request matches the
original one. Section 2.10.6.1.3.1 discusses what the replier might original one. Section 2.10.6.1.3.1 discusses what the replier might
do, and MUST do, when original and retried requests do not match. do, and MUST do, when original and retried requests do not match.
Since the replier may only cache a small amount of the information Since the replier may only cache a small amount of the information
that would be required to determine whether this is a case of a false that would be required to determine whether this is a case of a false
retry, the replier may send to the client any of the following retry, the replier may send to the client any of the following
responses: responses:
o The cached reply to the original request (if the replier has * The cached reply to the original request (if the replier has
cached it in its entirety and the users of the original request cached it in its entirety and the users of the original request
and retry match). and retry match).
o A reply that consists only of the Sequence operation with the * A reply that consists only of the Sequence operation with the
error NFS4ERR_FALSE_RETRY. error NFS4ERR_SEQ_FALSE_RETRY.
o A reply consisting of the response to Sequence with the status * A reply consisting of the response to Sequence with the status
NFS4_OK, together with the second operation as it appeared in the NFS4_OK, together with the second operation as it appeared in the
retried request with an error of NFS4ERR_RETRY_UNCACHED_REP or retried request with an error of NFS4ERR_RETRY_UNCACHED_REP or
other error as described above. other error as described above.
o A reply that consists of the response to Sequence with the status * A reply that consists of the response to Sequence with the status
NFS4_OK, together with the second operation as it appeared in the NFS4_OK, together with the second operation as it appeared in the
original request with an error of NFS4ERR_RETRY_UNCACHED_REP or original request with an error of NFS4ERR_RETRY_UNCACHED_REP or
other error as described above. other error as described above.
2.10.6.1.3.1. False Retry 2.10.6.1.3.1. False Retry
If a requester sent a Sequence operation with a slot ID and sequence If a requester sent a Sequence operation with a slot ID and sequence
ID that are in the reply cache but the replier detected that the ID that are in the reply cache but the replier detected that the
retried request is not the same as the original request, including a retried request is not the same as the original request, including a
retry that has different operations or different arguments in the retry that has different operations or different arguments in the
operations from the original and a retry that uses a different operations from the original and a retry that uses a different
principal in the RPC request's credential field that translates to a principal in the RPC request's credential field that translates to a
different user, then this is a false retry. When the replier detects different user, then this is a false retry. When the replier detects
a false retry, it is permitted (but not always obligated) to return a false retry, it is permitted (but not always obligated) to return
NFS4ERR_FALSE_RETRY in response to the Sequence operation. NFS4ERR_SEQ_FALSE_RETRY in response to the Sequence operation.
Translations of particularly privileged user values to other users Translations of particularly privileged user values to other users
due to the lack of appropriately secure credentials, as configured on due to the lack of appropriately secure credentials, as configured on
the replier, should be applied before determining whether the users the replier, should be applied before determining whether the users
are the same or different. If the replier determines the users are are the same or different. If the replier determines the users are
different between the original request and a retry, then the replier different between the original request and a retry, then the replier
MUST return NFS4ERR_FALSE_RETRY. MUST return NFS4ERR_SEQ_FALSE_RETRY.
If an operation of the retry is an illegal operation, or an operation If an operation of the retry is an illegal operation, or an operation
that was legal in a previous minor version of NFSv4 and MUST NOT be that was legal in a previous minor version of NFSv4 and MUST NOT be
supported in the current minor version (e.g., SETCLIENTID), the supported in the current minor version (e.g., SETCLIENTID), the
replier MAY return NFS4ERR_FALSE_RETRY (and MUST do so if the users replier MAY return NFS4ERR_SEQ_FALSE_RETRY (and MUST do so if the
of the original request and retry differ). Otherwise, the replier users of the original request and retry differ). Otherwise, the
MAY return NFS4ERR_OP_ILLEGAL or NFS4ERR_BADXDR or NFS4ERR_NOTSUPP as replier MAY return NFS4ERR_OP_ILLEGAL or NFS4ERR_BADXDR or
appropriate. Note that the handling is in contrast to how the NFS4ERR_NOTSUPP as appropriate. Note that the handling is in
replier deals with retried requests with no cached reply. The contrast to how the replier deals with retried requests with no
difference is due to NFS4ERR_FALSE_RETRY being a valid error for only cached reply. The difference is due to NFS4ERR_SEQ_FALSE_RETRY being
Sequence operations, whereas NFS4ERR_RETRY_UNCACHED_REP is a valid a valid error for only Sequence operations, whereas
error for all operations except illegal operations and operations NFS4ERR_RETRY_UNCACHED_REP is a valid error for all operations except
that MUST NOT be supported in the current minor version of NFSv4. illegal operations and operations that MUST NOT be supported in the
current minor version of NFSv4.
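
The false-retry rules in this section might be approximated by the sketch below. The function name, the "fingerprint" parameters, and the use of None to mean "no false retry reported" are hypothetical. The sketch assumes that any replier-configured translation of privileged users has already been applied to both user values, and it uses the RFC 8881 error name.

```python
NFS4ERR_SEQ_FALSE_RETRY = "NFS4ERR_SEQ_FALSE_RETRY"


def false_retry_status(orig_user, retry_user,
                       orig_fingerprint=None, retry_fingerprint=None):
    """Decide whether to report a false retry for a matching slot/sequence ID.

    orig_fingerprint / retry_fingerprint: hypothetical digests of the
    operations and arguments; a replier is not required to keep them.
    Returns the error name, or None to proceed with normal retry handling.
    """
    # Different users between original and retry: reporting is mandatory.
    if orig_user != retry_user:
        return NFS4ERR_SEQ_FALSE_RETRY
    # Same user, but the replier kept enough state to see that the
    # request changed: reporting is permitted (this sketch chooses to).
    if orig_fingerprint is not None and retry_fingerprint != orig_fingerprint:
        return NFS4ERR_SEQ_FALSE_RETRY
    # No mismatch detected, or nothing retained to detect one with.
    return None
```

A replier that retains no fingerprint can still satisfy the mandatory case, since only the user comparison is required.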
2.10.6.2. Retry and Replay of Reply 2.10.6.2. Retry and Replay of Reply
A requester MUST NOT retry a request, unless the connection it used A requester MUST NOT retry a request, unless the connection it used
to send the request disconnects. The requester can then reconnect to send the request disconnects. The requester can then reconnect
and re-send the request, or it can re-send the request over a and re-send the request, or it can re-send the request over a
different connection that is associated with the same session. different connection that is associated with the same session.
If the requester is a server wanting to re-send a callback operation If the requester is a server wanting to re-send a callback operation
over the backchannel of a session, the requester of course cannot over the backchannel of a session, the requester of course cannot
skipping to change at page 63, line 31 skipping to change at line 2983
A client needs to take care that, when sending operations that change A client needs to take care that, when sending operations that change
the current filehandle (except for PUTFH, PUTPUBFH, PUTROOTFH, and the current filehandle (except for PUTFH, PUTPUBFH, PUTROOTFH, and
RESTOREFH), it does not exceed the maximum reply buffer before the RESTOREFH), it does not exceed the maximum reply buffer before the
GETFH operation. Otherwise, the client will have to retry the GETFH operation. Otherwise, the client will have to retry the
operation that changed the current filehandle, in order to obtain the operation that changed the current filehandle, in order to obtain the
desired filehandle. For the OPEN operation (see Section 18.16), desired filehandle. For the OPEN operation (see Section 18.16),
retry is not always available as an option. The following guidelines retry is not always available as an option. The following guidelines
for the handling of filehandle-changing operations are advised: for the handling of filehandle-changing operations are advised:
o Within the same COMPOUND procedure, a client SHOULD send GETFH * Within the same COMPOUND procedure, a client SHOULD send GETFH
immediately after a current filehandle-changing operation. A immediately after a current filehandle-changing operation. A
client MUST send GETFH after a current filehandle-changing client MUST send GETFH after a current filehandle-changing
operation that is also non-idempotent (e.g., the OPEN operation), operation that is also non-idempotent (e.g., the OPEN operation),
unless the operation is RESTOREFH. RESTOREFH is an exception, unless the operation is RESTOREFH. RESTOREFH is an exception,
because even though it is non-idempotent, the filehandle RESTOREFH because even though it is non-idempotent, the filehandle RESTOREFH
produced originated from an operation that is either idempotent produced originated from an operation that is either idempotent
(e.g., PUTFH, LOOKUP), or non-idempotent (e.g., OPEN, CREATE). If (e.g., PUTFH, LOOKUP), or non-idempotent (e.g., OPEN, CREATE). If
the origin is non-idempotent, then because the client MUST send the origin is non-idempotent, then because the client MUST send
GETFH after the origin operation, the client can recover if GETFH after the origin operation, the client can recover if
RESTOREFH returns an error. RESTOREFH returns an error.
o A server MAY return NFS4ERR_REP_TOO_BIG or * A server MAY return NFS4ERR_REP_TOO_BIG or
NFS4ERR_REP_TOO_BIG_TO_CACHE (if sa_cachethis is TRUE) on a NFS4ERR_REP_TOO_BIG_TO_CACHE (if sa_cachethis is TRUE) on a
filehandle-changing operation if the reply would be too large on filehandle-changing operation if the reply would be too large on
the next operation. the next operation.
o A server SHOULD return NFS4ERR_REP_TOO_BIG or * A server SHOULD return NFS4ERR_REP_TOO_BIG or
NFS4ERR_REP_TOO_BIG_TO_CACHE (if sa_cachethis is TRUE) on a NFS4ERR_REP_TOO_BIG_TO_CACHE (if sa_cachethis is TRUE) on a
filehandle-changing, non-idempotent operation if the reply would filehandle-changing, non-idempotent operation if the reply would
be too large on the next operation, especially if the operation is be too large on the next operation, especially if the operation is
OPEN. OPEN.
o A server MAY return NFS4ERR_UNSAFE_COMPOUND to a non-idempotent * A server MAY return NFS4ERR_UNSAFE_COMPOUND to a non-idempotent
current filehandle-changing operation, if it looks at the next current filehandle-changing operation, if it looks at the next
operation (in the same COMPOUND procedure) and finds it is not operation (in the same COMPOUND procedure) and finds it is not
GETFH. The server SHOULD do this if it is unable to determine in GETFH. The server SHOULD do this if it is unable to determine in
advance whether the total response size would exceed advance whether the total response size would exceed
ca_maxresponsesize_cached or ca_maxresponsesize. ca_maxresponsesize_cached or ca_maxresponsesize.
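
A client could check a COMPOUND against the first guideline above before sending it, and a server could apply the same test when deciding whether to return NFS4ERR_UNSAFE_COMPOUND. The sketch below is illustrative only: the function name is hypothetical, and the operation set contains just the two non-idempotent examples named in the text (OPEN and CREATE), not a complete list.

```python
# Illustrative subset only: the non-idempotent, filehandle-changing
# operations named as examples in the text. RESTOREFH is excluded
# because the guideline exempts it.
NON_IDEMPOTENT_FH_CHANGING = {"OPEN", "CREATE"}


def unsafe_positions(ops: list) -> list:
    """Return indexes of operations that should be followed by GETFH but are not.

    ops: operation names of one COMPOUND, in order.
    """
    return [
        i for i, op in enumerate(ops)
        if op in NON_IDEMPOTENT_FH_CHANGING
        and (i + 1 >= len(ops) or ops[i + 1] != "GETFH")
    ]
```

An empty result means every listed operation is immediately followed by GETFH, so the filehandle is captured before any later operation can overflow the reply buffer.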
2.10.6.5. Persistence 2.10.6.5. Persistence
Since the reply cache is bounded, it is practical for the reply cache Since the reply cache is bounded, it is practical for the reply cache
to persist across server restarts. The replier MUST persist the to persist across server restarts. The replier MUST persist the
following information if it agreed to persist the session (when the following information if it agreed to persist the session (when the
session was created; see Section 18.36): session was created; see Section 18.36):
o The session ID. * The session ID.
o The slot table including the sequence ID and cached reply for each * The slot table including the sequence ID and cached reply for each
slot. slot.
The above are sufficient for a replier to provide EOS semantics for The above are sufficient for a replier to provide EOS semantics for
any requests that were sent and executed before the server restarted. any requests that were sent and executed before the server restarted.
If the replier is a client, then there is no need for it to persist If the replier is a client, then there is no need for it to persist
any more information, unless the client will be persisting all other any more information, unless the client will be persisting all other
state across client restart, in which case, the server will never see state across client restart, in which case, the server will never see
any NFSv4.1-level protocol manifestation of a client restart. If the any NFSv4.1-level protocol manifestation of a client restart. If the
replier is a server, with just the slot table and session ID replier is a server, with just the slot table and session ID
persisting, any requests the client retries after the server restart persisting, any requests the client retries after the server restart
will return the results that are cached in the reply cache, and any will return the results that are cached in the reply cache, and any
new requests (i.e., the sequence ID is one greater than the slot's new requests (i.e., the sequence ID is one greater than the slot's
sequence ID) MUST be rejected with NFS4ERR_DEADSESSION (returned by sequence ID) MUST be rejected with NFS4ERR_DEADSESSION (returned by
SEQUENCE). Such a session is considered dead. A server MAY re- SEQUENCE). Such a session is considered dead. A server MAY re-
animate a session after a server restart so that the session will animate a session after a server restart so that the session will
accept new requests as well as retries. To re-animate a session, the accept new requests as well as retries. To re-animate a session, the
server needs to persist additional information through server server needs to persist additional information through server
restart: restart:
o The client ID. This is a prerequisite to let the client create * The client ID. This is a prerequisite to let the client create
more sessions associated with the same client ID as the re- more sessions associated with the same client ID as the re-
animated session. animated session.
o The client ID's sequence ID that is used for creating sessions * The client ID's sequence ID that is used for creating sessions
(see Sections 18.35 and 18.36). This is a prerequisite to let the (see Sections 18.35 and 18.36). This is a prerequisite to let the
client create more sessions. client create more sessions.
o The principal that created the client ID. This allows the server * The principal that created the client ID. This allows the server
to authenticate the client when it sends EXCHANGE_ID. to authenticate the client when it sends EXCHANGE_ID.
o The SSV, if SP4_SSV state protection was specified when the client * The SSV, if SP4_SSV state protection was specified when the client
ID was created (see Section 18.35). This lets the client create ID was created (see Section 18.35). This lets the client create
new sessions, and associate connections with the new and existing new sessions, and associate connections with the new and existing
sessions. sessions.
o The properties of the client ID as defined in Section 18.35. * The properties of the client ID as defined in Section 18.35.
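
The difference between a dead session and a re-animated one after a server restart can be sketched as follows. The function name, the tuple return values, and the "reanimated" flag are hypothetical; the sketch covers only retries and in-order new requests and assumes the slot table and session ID were persisted.

```python
def after_restart(slot_seqid: int, cached_reply, request_seqid: int,
                  reanimated: bool = False):
    """Handle a request on a persisted session after a server restart.

    Returns a hypothetical (action, payload) pair.
    """
    if request_seqid == slot_seqid:
        # A retry: answer from the persisted reply cache.
        return ("replay", cached_reply)
    if request_seqid == (slot_seqid + 1) % 2**32:
        if reanimated:
            # The server persisted the additional client ID state listed
            # above, so the session accepts new requests again.
            return ("execute", None)
        # Only the slot table and session ID survived: the session is dead.
        return ("error", "NFS4ERR_DEADSESSION")
    # Misordered sequence IDs are outside the scope of this sketch.
    raise ValueError("misordered sequence ID; not covered by this sketch")
```

In both cases retries are served from the cache; the extra persisted state only determines whether new requests are accepted.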
A persistent reply cache places certain demands on the server. The A persistent reply cache places certain demands on the server. The
execution of the sequence of operations (starting with SEQUENCE) and execution of the sequence of operations (starting with SEQUENCE) and
placement of its results in the persistent cache MUST be atomic. If placement of its results in the persistent cache MUST be atomic. If
a client retries a sequence of operations that was previously a client retries a sequence of operations that was previously
executed on the server, the only acceptable outcomes are either the executed on the server, the only acceptable outcomes are either the
original cached reply or an indication that the client ID or session original cached reply or an indication that the client ID or session
has been lost (indicating a catastrophic loss of the reply cache or a has been lost (indicating a catastrophic loss of the reply cache or a
session that has been deleted because the client failed to use the session that has been deleted because the client failed to use the
session for an extended period of time). session for an extended period of time).
skipping to change at page 65, line 39 skipping to change at line 3084
view the problem is as a single transaction consisting of each view the problem is as a single transaction consisting of each
operation in the COMPOUND followed by storing the result in operation in the COMPOUND followed by storing the result in
persistent storage, then finally a transaction commit. If there is a persistent storage, then finally a transaction commit. If there is a
failure before the transaction is committed, then the server rolls failure before the transaction is committed, then the server rolls
back the transaction. If the server itself fails, then when it back the transaction. If the server itself fails, then when it
restarts, its recovery logic could roll back the transaction before restarts, its recovery logic could roll back the transaction before
starting the NFSv4.1 server. starting the NFSv4.1 server.
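
The transactional view described above can be sketched as follows. This is only an illustration, not an implementation prescribed by this document: it assumes an SQLite database stands in for persistent storage, and the table layout and the names open_store and execute_compound are hypothetical. Each operation is modeled as a callable that applies its changes through the same database connection, so that the operations and the cached reply are committed or rolled back together.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open a store holding file data and the reply cache (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS files (name TEXT PRIMARY KEY, data BLOB)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reply_cache ("
        "session_id BLOB, slot_id INTEGER, seq_id INTEGER, reply BLOB, "
        "PRIMARY KEY (session_id, slot_id, seq_id))"
    )
    return conn


def execute_compound(conn, session_id, slot_id, seq_id, ops):
    """Execute ops and store the reply in a single transaction."""
    row = conn.execute(
        "SELECT reply FROM reply_cache "
        "WHERE session_id = ? AND slot_id = ? AND seq_id = ?",
        (session_id, slot_id, seq_id),
    ).fetchone()
    if row is not None:
        # A retry of a previously executed request gets the original reply.
        return row[0]
    # The connection context manager commits on success and rolls back
    # if an exception escapes, so the operations and the cached reply
    # become durable together or not at all.
    with conn:
        results = [op(conn) for op in ops]
        reply = repr(results).encode()
        conn.execute(
            "INSERT INTO reply_cache VALUES (?, ?, ?, ?)",
            (session_id, slot_id, seq_id, reply),
        )
    return reply
```

In this sketch, a failure in any operation leaves neither the file changes nor a cache entry behind, and a retry after a successful commit returns the stored reply without executing the operations again.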
While the description of the implementation for atomic execution of While the description of the implementation for atomic execution of
the request and caching of the reply is beyond the scope of this the request and caching of the reply is beyond the scope of this
document, an example implementation for NFSv2 [44] is described in document, an example implementation for NFSv2 [45] is described in
[45]. [46].
2.10.7. RDMA Considerations 2.10.7. RDMA Considerations
A complete discussion of the operation of RPC-based protocols over A complete discussion of the operation of RPC-based protocols over
RDMA transports is in [32]. A discussion of the operation of NFSv4, RDMA transports is in [32]. A discussion of the operation of NFSv4,
including NFSv4.1, over RDMA is in [33]. Where RDMA is considered, including NFSv4.1, over RDMA is in [33]. Where RDMA is considered,
this specification assumes the use of such a layering; it addresses this specification assumes the use of such a layering; it addresses
only the upper-layer issues relevant to making best use of RPC/RDMA. only the upper-layer issues relevant to making best use of RPC/RDMA.
2.10.7.1. RDMA Connection Resources 2.10.7.1. RDMA Connection Resources
skipping to change at page 69, line 7 skipping to change at line 3236
2.10.8.2. Backchannel RPC Security 2.10.8.2. Backchannel RPC Security
When the NFSv4.1 client establishes the backchannel, it informs the When the NFSv4.1 client establishes the backchannel, it informs the
server of the security flavors and principals to use when sending server of the security flavors and principals to use when sending
requests. If the security flavor is RPCSEC_GSS, the client expresses requests. If the security flavor is RPCSEC_GSS, the client expresses
the principal in the form of an established RPCSEC_GSS context. The the principal in the form of an established RPCSEC_GSS context. The
server is free to use any of the flavor/principal combinations the server is free to use any of the flavor/principal combinations the
client offers, but it MUST NOT use unoffered combinations. This way, client offers, but it MUST NOT use unoffered combinations. This way,
the client need not provide a target GSS principal for the the client need not provide a target GSS principal for the
backchannel as it did with NFSv4.0, nor does the server have to backchannel as it did with NFSv4.0, nor does the server have to
implement an RPCSEC_GSS initiator as it did with NFSv4.0 [36]. implement an RPCSEC_GSS initiator as it did with NFSv4.0 [37].
The CREATE_SESSION (Section 18.36) and BACKCHANNEL_CTL The CREATE_SESSION (Section 18.36) and BACKCHANNEL_CTL
(Section 18.33) operations allow the client to specify flavor/ (Section 18.33) operations allow the client to specify flavor/
principal combinations. principal combinations.
Also note that the SP4_SSV state protection mode (see Sections 18.35 Also note that the SP4_SSV state protection mode (see Sections 18.35
and 2.10.8.3) has the side benefit of providing SSV-derived and 2.10.8.3) has the side benefit of providing SSV-derived
RPCSEC_GSS contexts (Section 2.10.9). RPCSEC_GSS contexts (Section 2.10.9).
2.10.8.3. Protection from Unauthorized State Changes 2.10.8.3. Protection from Unauthorized State Changes
skipping to change at page 69, line 46 skipping to change at line 3275
NFSv4.1 provides three options to a client for state protection, NFSv4.1 provides three options to a client for state protection,
which are specified when a client creates a client ID via EXCHANGE_ID which are specified when a client creates a client ID via EXCHANGE_ID
(Section 18.35). (Section 18.35).
The first (SP4_NONE) is to simply waive state protection. The first (SP4_NONE) is to simply waive state protection.
The other two options (SP4_MACH_CRED and SP4_SSV) share several The other two options (SP4_MACH_CRED and SP4_SSV) share several
traits: traits:
o An RPCSEC_GSS-based credential is used to authenticate client ID * An RPCSEC_GSS-based credential is used to authenticate client ID
and session maintenance operations, including creating and and session maintenance operations, including creating and
destroying a session, associating a connection with the session, destroying a session, associating a connection with the session,
and destroying the client ID. and destroying the client ID.
o Because RPCSEC_GSS is used to authenticate client ID and session * Because RPCSEC_GSS is used to authenticate client ID and session
maintenance, the attacker cannot associate a rogue connection with maintenance, the attacker cannot associate a rogue connection with
a legitimate session, or associate a rogue session with a a legitimate session, or associate a rogue session with a
legitimate client ID in order to maliciously alter the client ID's legitimate client ID in order to maliciously alter the client ID's
lock state via CLOSE, LOCKU, DELEGRETURN, LAYOUTRETURN, etc. lock state via CLOSE, LOCKU, DELEGRETURN, LAYOUTRETURN, etc.
o In cases where the server's security policies on a portion of its * In cases where the server's security policies on a portion of its
namespace require RPCSEC_GSS authentication, a client may have to namespace require RPCSEC_GSS authentication, a client may have to
use an RPCSEC_GSS credential to remove per-file state (e.g., use an RPCSEC_GSS credential to remove per-file state (e.g.,
LOCKU, CLOSE, etc.). The server may require that the principal LOCKU, CLOSE, etc.). The server may require that the principal
that removes the state match certain criteria (e.g., the principal that removes the state match certain criteria (e.g., the principal
might have to be the same as the one that acquired the state). might have to be the same as the one that acquired the state).
However, the client might not have an RPCSEC_GSS context for such However, the client might not have an RPCSEC_GSS context for such
a principal, and might not be able to create such a context a principal, and might not be able to create such a context
(perhaps because the user has logged off). When the client (perhaps because the user has logged off). When the client
establishes SP4_MACH_CRED or SP4_SSV protection, it can specify a establishes SP4_MACH_CRED or SP4_SSV protection, it can specify a
list of operations that the server MUST allow using the machine list of operations that the server MUST allow using the machine
skipping to change at page 71, line 20 skipping to change at line 3343
situation comprised of a client that has multiple active users and a situation comprised of a client that has multiple active users and a
system administrator who wants to avoid the burden of installing a system administrator who wants to avoid the burden of installing a
permanent machine credential on each client. The SSV is established permanent machine credential on each client. The SSV is established
and updated on the server via SET_SSV (see Section 18.47). To and updated on the server via SET_SSV (see Section 18.47). To
prevent eavesdropping, a client SHOULD send SET_SSV via RPCSEC_GSS prevent eavesdropping, a client SHOULD send SET_SSV via RPCSEC_GSS
with the privacy service. Several aspects of the SSV make it with the privacy service. Several aspects of the SSV make it
intractable for an attacker to guess the SSV, and thus associate intractable for an attacker to guess the SSV, and thus associate
rogue connections with a session, and rogue sessions with a client rogue connections with a session, and rogue sessions with a client
ID: ID:
o The arguments to and results of SET_SSV include digests of the old * The arguments to and results of SET_SSV include digests of the old
and new SSV, respectively. and new SSV, respectively.
o Because the initial value of the SSV is zero, and therefore known, the * Because the initial value of the SSV is zero, and therefore known, the
client that opts for SP4_SSV protection and opts to apply SP4_SSV client that opts for SP4_SSV protection and opts to apply SP4_SSV
protection to BIND_CONN_TO_SESSION and CREATE_SESSION MUST send at protection to BIND_CONN_TO_SESSION and CREATE_SESSION MUST send at
least one SET_SSV operation before the first BIND_CONN_TO_SESSION least one SET_SSV operation before the first BIND_CONN_TO_SESSION
operation or before the second CREATE_SESSION operation on a operation or before the second CREATE_SESSION operation on a
client ID. If it does not, the SSV mechanism will not generate client ID. If it does not, the SSV mechanism will not generate
tokens (Section 2.10.9). A client SHOULD send SET_SSV as soon as tokens (Section 2.10.9). A client SHOULD send SET_SSV as soon as
a session is created. a session is created.
o A SET_SSV request does not replace the SSV with the argument to * A SET_SSV request does not replace the SSV with the argument to
SET_SSV. Instead, the current SSV on the server is logically SET_SSV. Instead, the current SSV on the server is logically
exclusive ORed (XORed) with the argument to SET_SSV. Each time a exclusive ORed (XORed) with the argument to SET_SSV. Each time a
new principal uses a client ID for the first time, the client new principal uses a client ID for the first time, the client
SHOULD send a SET_SSV with that principal's RPCSEC_GSS SHOULD send a SET_SSV with that principal's RPCSEC_GSS
credentials, with RPCSEC_GSS service set to RPC_GSS_SVC_PRIVACY. credentials, with RPCSEC_GSS service set to RPC_GSS_SVC_PRIVACY.
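
The XOR update described in the last item above can be illustrated with a short sketch. The function name apply_set_ssv is hypothetical, and the requirement that the argument has the same length as the current SSV is an assumption made to keep the example simple; the digests carried in the SET_SSV arguments and results are not modeled.

```python
def apply_set_ssv(current_ssv: bytes, ssv_argument: bytes) -> bytes:
    """Return the new SSV: the current SSV XORed with the SET_SSV argument."""
    # Assumption for this sketch: both values have the same length.
    if len(current_ssv) != len(ssv_argument):
        raise ValueError("SET_SSV argument length must match the current SSV")
    return bytes(a ^ b for a, b in zip(current_ssv, ssv_argument))


# The initial SSV is zero, so the first SET_SSV yields its own argument.
initial_ssv = bytes(4)
after_first = apply_set_ssv(initial_ssv, bytes.fromhex("0f0f0f0f"))
# A later SET_SSV from another principal is mixed in, not substituted.
after_second = apply_set_ssv(after_first, bytes.fromhex("ff00ff00"))
```

Because each argument is folded into the existing value, someone who observes only one SET_SSV argument cannot compute the resulting SSV without also knowing the value it was combined with.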
Here are the types of attacks that can be attempted by an attacker Here are the types of attacks that can be attempted by an attacker
named Eve on a victim named Bob, and how SP4_SSV protection foils named Eve on a victim named Bob, and how SP4_SSV protection foils
each attack: each attack:
o Suppose Eve is the first user to log into a legitimate client. * Suppose Eve is the first user to log into a legitimate client.
Eve's use of an NFSv4.1 file system will cause the legitimate Eve's use of an NFSv4.1 file system will cause the legitimate
client to create a client ID with SP4_SSV protection, specifying client to create a client ID with SP4_SSV protection, specifying
that the BIND_CONN_TO_SESSION operation MUST use the SSV that the BIND_CONN_TO_SESSION operation MUST use the SSV
credential. Eve's use of the file system also causes an SSV to be credential. Eve's use of the file system also causes an SSV to be
created. The SET_SSV operation that creates the SSV will be created. The SET_SSV operation that creates the SSV will be
protected by the RPCSEC_GSS context created by the legitimate protected by the RPCSEC_GSS context created by the legitimate
client, which uses Eve's GSS principal and credentials. Eve can client, which uses Eve's GSS principal and credentials. Eve can
eavesdrop on the network while her RPCSEC_GSS context is created eavesdrop on the network while her RPCSEC_GSS context is created
and the SET_SSV using her context is sent. Even if the legitimate and the SET_SSV using her context is sent. Even if the legitimate
client sends the SET_SSV with RPC_GSS_SVC_PRIVACY, because Eve client sends the SET_SSV with RPC_GSS_SVC_PRIVACY, because Eve
skipping to change at page 72, line 31 skipping to change at line 3402
the legitimate client, but she cannot disrupt Bob. Moreover, the legitimate client, but she cannot disrupt Bob. Moreover,
because the client SHOULD have modified the SSV due to Eve using because the client SHOULD have modified the SSV due to Eve using
the new session, Bob cannot get revenge on Eve by associating a the new session, Bob cannot get revenge on Eve by associating a
rogue connection with the session. rogue connection with the session.
The question is how did the legitimate client detect that Eve has The question is how did the legitimate client detect that Eve has
hijacked the old session? When the client detects that a new hijacked the old session? When the client detects that a new
principal, Bob, wants to use the session, it SHOULD have sent a principal, Bob, wants to use the session, it SHOULD have sent a
SET_SSV, which leads to the following sub-scenarios: SET_SSV, which leads to the following sub-scenarios:
* Let us suppose that from the rogue connection, Eve sent a - Let us suppose that from the rogue connection, Eve sent a
SET_SSV with the same slot ID and sequence ID that the SET_SSV with the same slot ID and sequence ID that the
legitimate client later uses. The server will assume the legitimate client later uses. The server will assume the
SET_SSV sent with Bob's credentials is a retry, and return to SET_SSV sent with Bob's credentials is a retry, and return to
the legitimate client the reply it sent Eve. However, unless the legitimate client the reply it sent Eve. However, unless
Eve can correctly guess the SSV the legitimate client will use, Eve can correctly guess the SSV the legitimate client will use,
the digest verification checks in the SET_SSV response will the digest verification checks in the SET_SSV response will
fail. That is an indication to the client that the session has fail. That is an indication to the client that the session has
apparently been hijacked. apparently been hijacked.
* Alternatively, Eve sent a SET_SSV with a different slot ID than - Alternatively, Eve sent a SET_SSV with a different slot ID than
the legitimate client uses for its SET_SSV. Then the digest the legitimate client uses for its SET_SSV. Then the digest
verification of the SET_SSV sent with Bob's credentials fails verification of the SET_SSV sent with Bob's credentials fails
on the server, and the error returned to the client makes it on the server, and the error returned to the client makes it
apparent that the session has been hijacked. apparent that the session has been hijacked.
* Alternatively, Eve sent an operation other than SET_SSV, but - Alternatively, Eve sent an operation other than SET_SSV, but
with the same slot ID and sequence that the legitimate client with the same slot ID and sequence that the legitimate client
uses for its SET_SSV. The server returns to the legitimate uses for its SET_SSV. The server returns to the legitimate
client the response it sent Eve. The client sees that the client the response it sent Eve. The client sees that the
response is not at all what it expects. The client assumes response is not at all what it expects. The client assumes
either session hijacking or a server bug, and either way either session hijacking or a server bug, and either way
destroys the old session. destroys the old session.
o Eve associates a rogue connection with the session as above, and * Eve associates a rogue connection with the session as above, and
then destroys the session. Again, Bob goes to use the server from then destroys the session. Again, Bob goes to use the server from
the legitimate client, which sends a SET_SSV using Bob's the legitimate client, which sends a SET_SSV using Bob's
credentials. The client receives an error that indicates that the credentials. The client receives an error that indicates that the
session does not exist. When the client tries to create a new session does not exist. When the client tries to create a new
session, this will fail because the SSV it has does not match that session, this will fail because the SSV it has does not match that
which the server has, and now the client knows the session was which the server has, and now the client knows the session was
hijacked. The legitimate client establishes a new client ID. hijacked. The legitimate client establishes a new client ID.
o If Eve creates a connection before the legitimate client * If Eve creates a connection before the legitimate client
establishes an SSV, because the initial value of the SSV is zero establishes an SSV, because the initial value of the SSV is zero
and therefore known, Eve can send a SET_SSV that will pass the and therefore known, Eve can send a SET_SSV that will pass the
digest verification check. However, because the new connection digest verification check. However, because the new connection
has not been associated with the session, the SET_SSV is rejected has not been associated with the session, the SET_SSV is rejected
for that reason. for that reason.
In summary, an attacker's disruption of state when SP4_SSV protection In summary, an attacker's disruption of state when SP4_SSV protection
is in use is limited to the formative period of a client ID, its is in use is limited to the formative period of a client ID, its
first session, and the establishment of the SSV. Once a non- first session, and the establishment of the SSV. Once a non-
malicious user uses the client ID, the client quickly detects any malicious user uses the client ID, the client quickly detects any
skipping to change at page 74, line 28 skipping to change at line 3484
iso.org.dod.internet.private.enterprise.Michael Eisler.nfs.ssv_mech iso.org.dod.internet.private.enterprise.Michael Eisler.nfs.ssv_mech
(1.3.6.1.4.1.28882.1.1). While the SSV mechanism does not define any (1.3.6.1.4.1.28882.1.1). While the SSV mechanism does not define any
initial context tokens, the OID can be used to let servers indicate initial context tokens, the OID can be used to let servers indicate
that the SSV mechanism is acceptable whenever the client sends a that the SSV mechanism is acceptable whenever the client sends a
SECINFO or SECINFO_NO_NAME operation (see Section 2.6). SECINFO or SECINFO_NO_NAME operation (see Section 2.6).
The SSV mechanism defines four subkeys derived from the SSV value. The SSV mechanism defines four subkeys derived from the SSV value.
Each time SET_SSV is invoked, the subkeys are recalculated by the Each time SET_SSV is invoked, the subkeys are recalculated by the
client and server. The calculation of each of the four subkeys client and server. The calculation of each of the four subkeys
depends on each of the four respective ssv_subkey4 enumerated values. depends on each of the four respective ssv_subkey4 enumerated values.
The calculation uses the HMAC [51] algorithm, using the current SSV The calculation uses the HMAC [52] algorithm, using the current SSV
as the key, the one-way hash algorithm as negotiated by EXCHANGE_ID, as the key, the one-way hash algorithm as negotiated by EXCHANGE_ID,
and the input text as represented by the XDR encoded enumeration and the input text as represented by the XDR encoded enumeration
value for that subkey of data type ssv_subkey4. If the length of the value for that subkey of data type ssv_subkey4. If the length of the
output of the HMAC algorithm exceeds the length of key of the output of the HMAC algorithm exceeds the length of key of the
encryption algorithm (which is also negotiated by EXCHANGE_ID), then encryption algorithm (which is also negotiated by EXCHANGE_ID), then
the subkey MUST be truncated from the HMAC output, i.e., if the the subkey MUST be truncated from the HMAC output, i.e., if the
subkey is of N bytes long, then the first N bytes of the HMAC output subkey is of N bytes long, then the first N bytes of the HMAC output
MUST be used for the subkey. The specification of EXCHANGE_ID states MUST be used for the subkey. The specification of EXCHANGE_ID states
that the length of the output of the HMAC algorithm MUST NOT be less that the length of the output of the HMAC algorithm MUST NOT be less
than the length of subkey needed for the encryption algorithm (see than the length of subkey needed for the encryption algorithm (see
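
The subkey calculation described above can be sketched as follows. This is a minimal illustration rather than a normative procedure: the function name derive_subkey is hypothetical, the numeric values of the ssv_subkey4 enumeration are assumed from the protocol's XDR description (they do not appear in this excerpt), and SHA-256 with a 16-byte encryption key is used only as an example of what EXCHANGE_ID might negotiate.

```python
import hashlib
import hmac
import struct

# Assumed ssv_subkey4 enumeration values (not shown in this excerpt).
SSV4_SUBKEY_MIC_I2T = 1
SSV4_SUBKEY_MIC_T2I = 2
SSV4_SUBKEY_SEAL_I2T = 3
SSV4_SUBKEY_SEAL_T2I = 4


def derive_subkey(ssv, subkey_id, hash_name="sha256", key_len=None):
    """Derive one subkey from the current SSV.

    The HMAC key is the SSV and the input text is the XDR-encoded
    enumeration value (a 4-byte big-endian integer). If key_len is
    given, the first key_len bytes of the HMAC output are used.
    """
    text = struct.pack(">i", subkey_id)
    mac = hmac.new(ssv, text, hash_name).digest()
    if key_len is None:
        return mac
    if key_len > len(mac):
        raise ValueError("HMAC output is shorter than the required subkey")
    return mac[:key_len]


# Example: recompute all four subkeys after a SET_SSV changes the SSV.
example_ssv = bytes(range(32))
subkeys = {
    subkey_id: derive_subkey(example_ssv, subkey_id, key_len=16)
    for subkey_id in (
        SSV4_SUBKEY_MIC_I2T,
        SSV4_SUBKEY_MIC_T2I,
        SSV4_SUBKEY_SEAL_I2T,
        SSV4_SUBKEY_SEAL_T2I,
    )
}
```

The sketch truncates every subkey to the same length for brevity; whether a given subkey is truncated depends on the algorithm it is used with.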
skipping to change at page 78, line 23 skipping to change at line 3660
time, and the EXCHANGE_ID operation can be used to create more SSV time, and the EXCHANGE_ID operation can be used to create more SSV
RPCSEC_GSS handles. Expiration of SSV RPCSEC_GSS handles does not RPCSEC_GSS handles. Expiration of SSV RPCSEC_GSS handles does not
imply that the SSV or its GSS context has expired. imply that the SSV or its GSS context has expired.
The client MUST establish an SSV via SET_SSV before the SSV GSS The client MUST establish an SSV via SET_SSV before the SSV GSS
context can be used to emit tokens from GSS_Wrap() and GSS_GetMIC(). context can be used to emit tokens from GSS_Wrap() and GSS_GetMIC().
If SET_SSV has not been successfully called, attempts to emit tokens If SET_SSV has not been successfully called, attempts to emit tokens
MUST fail. MUST fail.
The SSV mechanism does not support replay detection and sequencing in The SSV mechanism does not support replay detection and sequencing in
its tokens because RPCSEC_GSS does not use those features (See its tokens because RPCSEC_GSS does not use those features (see
Section 5.2.2, "Context Creation Requests", in [4]). However, "Context Creation Requests", Section 5.2.2 of [4]). However,
Section 2.10.10 discusses special considerations for the SSV Section 2.10.10 discusses special considerations for the SSV
mechanism when used with RPCSEC_GSS. mechanism when used with RPCSEC_GSS.
2.10.10. Security Considerations for RPCSEC_GSS When Using the SSV 2.10.10. Security Considerations for RPCSEC_GSS When Using the SSV
Mechanism Mechanism
When a client ID is created with SP4_SSV state protection (see When a client ID is created with SP4_SSV state protection (see
Section 18.35), the client is permitted to associate multiple Section 18.35), the client is permitted to associate multiple
RPCSEC_GSS handles with the single SSV GSS context (see RPCSEC_GSS handles with the single SSV GSS context (see
Section 2.10.9). Because of the way RPCSEC_GSS (both version 1 and Section 2.10.9). Because of the way RPCSEC_GSS (both version 1 and
skipping to change at page 78, line 49 skipping to change at line 3686
value of the seq_num field of the RPCSEC_GSS credential (data type value of the seq_num field of the RPCSEC_GSS credential (data type
rpc_gss_cred_ver_1_t) (see Section 5.3.3.2 of [4]). If multiple rpc_gss_cred_ver_1_t) (see Section 5.3.3.2 of [4]). If multiple
RPCSEC_GSS handles share the same GSS context, then if one handle is RPCSEC_GSS handles share the same GSS context, then if one handle is
used to send a request with the same seq_num value as another handle, used to send a request with the same seq_num value as another handle,
an attacker could block the reply, and replace it with the verifier an attacker could block the reply, and replace it with the verifier
used for the other handle. used for the other handle.
There are multiple ways to prevent the attack on the SSV RPCSEC_GSS There are multiple ways to prevent the attack on the SSV RPCSEC_GSS
verifier in the reply. The simplest is believed to be as follows. verifier in the reply. The simplest is believed to be as follows.
o Each time one or more new SSV RPCSEC_GSS handles are created via * Each time one or more new SSV RPCSEC_GSS handles are created via
EXCHANGE_ID, the client SHOULD send a SET_SSV operation to modify EXCHANGE_ID, the client SHOULD send a SET_SSV operation to modify
the SSV. By changing the SSV, the new handles will not result in the SSV. By changing the SSV, the new handles will not result in
the re-use of an SSV RPCSEC_GSS verifier in a reply. the re-use of an SSV RPCSEC_GSS verifier in a reply.
o When a requester decides to use N SSV RPCSEC_GSS handles, it * When a requester decides to use N SSV RPCSEC_GSS handles, it
SHOULD assign a unique and non-overlapping range of seq_nums to SHOULD assign a unique and non-overlapping range of seq_nums to
each SSV RPCSEC_GSS handle. The size of each range SHOULD be each SSV RPCSEC_GSS handle. The size of each range SHOULD be
equal to MAXSEQ / N (see Section 5 of [4] for the definition of equal to MAXSEQ / N (see Section 5 of [4] for the definition of
MAXSEQ). When an SSV RPCSEC_GSS handle reaches its maximum, it MAXSEQ). When an SSV RPCSEC_GSS handle reaches its maximum, it
SHOULD force the replier to destroy the handle by sending a NULL SHOULD force the replier to destroy the handle by sending a NULL
RPC request with seq_num set to MAXSEQ + 1 (see Section 5.3.3.3 of RPC request with seq_num set to MAXSEQ + 1 (see Section 5.3.3.3 of
[4]). [4]).
o When the requester wants to increase or decrease N, it SHOULD * When the requester wants to increase or decrease N, it SHOULD
force the replier to destroy all N handles by sending a NULL RPC force the replier to destroy all N handles by sending a NULL RPC
request on each handle with seq_num set to MAXSEQ + 1. If the request on each handle with seq_num set to MAXSEQ + 1. If the
requester is the client, it SHOULD send a SET_SSV operation before requester is the client, it SHOULD send a SET_SSV operation before
using new handles. If the requester is the server, then the using new handles. If the requester is the server, then the
client SHOULD send a SET_SSV operation when it detects that the client SHOULD send a SET_SSV operation when it detects that the
server has forced it to destroy a backchannel's SSV RPCSEC_GSS server has forced it to destroy a backchannel's SSV RPCSEC_GSS
handle. By sending a SET_SSV operation, the SSV will change, and handle. By sending a SET_SSV operation, the SSV will change, and
so the attacker will be unable to successfully replay a so the attacker will be unable to successfully replay a
previous verifier in a reply to the requester. previous verifier in a reply to the requester.
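
The assignment of non-overlapping seq_num ranges described in the list above can be sketched as follows. The function name seq_num_ranges is hypothetical, and the value used for MAXSEQ is an assumption taken from the RPCSEC_GSS specification [4]; starting the first range at zero is also only a choice made for this example.

```python
# Assumed value of MAXSEQ from the RPCSEC_GSS specification [4].
MAXSEQ = 0x80000000


def seq_num_ranges(n):
    """Split the seq_num space into n non-overlapping ranges of MAXSEQ // n."""
    if n < 1 or n > MAXSEQ:
        raise ValueError("n must be between 1 and MAXSEQ")
    size = MAXSEQ // n
    # Handle i may use seq_num values start <= seq_num < stop.
    return [range(i * size, (i + 1) * size) for i in range(n)]


# Example: three SSV RPCSEC_GSS handles sharing one SSV GSS context.
ranges = seq_num_ranges(3)
```

When a handle exhausts its range, or when N changes, the requester would discard the handle as described above rather than borrow values from a neighboring range.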
skipping to change at page 80, line 10 skipping to change at line 3737
backchannel resources that the client has created for the server backchannel resources that the client has created for the server
(RPCSEC_GSS contexts and backchannel connections). If these (RPCSEC_GSS contexts and backchannel connections). If these
resources vanish, the server takes action as specified in resources vanish, the server takes action as specified in
Section 2.10.13.2. Section 2.10.13.2.
2.10.11.2. Obligations of the Client 2.10.11.2. Obligations of the Client
The client SHOULD honor the following obligations in order to utilize The client SHOULD honor the following obligations in order to utilize
the session: the session:
o Keep a necessary session from going idle on the server. A client * Keep a necessary session from going idle on the server. A client
that requires a session but nonetheless is not sending operations that requires a session but nonetheless is not sending operations
risks having the session be destroyed by the server. This is risks having the session be destroyed by the server. This is
because sessions consume resources, and resource limitations may because sessions consume resources, and resource limitations may
force the server to cull an inactive session. A server MAY force the server to cull an inactive session. A server MAY
consider a session to be inactive if the client has not used the consider a session to be inactive if the client has not used the
session before the session inactivity timer (Section 2.10.12) has session before the session inactivity timer (Section 2.10.12) has
expired. expired.
o Destroy the session when not needed. If a client has multiple * Destroy the session when not needed. If a client has multiple
sessions, one of which has no requests waiting for replies, and sessions, one of which has no requests waiting for replies, and
has been idle for some period of time, it SHOULD destroy the has been idle for some period of time, it SHOULD destroy the
session. session.
o Maintain GSS contexts and RPCSEC_GSS handles for the backchannel. * Maintain GSS contexts and RPCSEC_GSS handles for the backchannel.
If the client requires the server to use the RPCSEC_GSS security If the client requires the server to use the RPCSEC_GSS security
flavor for callbacks, then it needs to be sure the RPCSEC_GSS flavor for callbacks, then it needs to be sure the RPCSEC_GSS
handles and/or their GSS contexts that are handed to the server handles and/or their GSS contexts that are handed to the server
via BACKCHANNEL_CTL or CREATE_SESSION are unexpired. via BACKCHANNEL_CTL or CREATE_SESSION are unexpired.
o Preserve a connection for a backchannel. The server requires a * Preserve a connection for a backchannel. The server requires a
backchannel in order to gracefully recall recallable state or backchannel in order to gracefully recall recallable state or
notify the client of certain events. Note that if the connection notify the client of certain events. Note that if the connection
is not being used for the fore channel, there is no way for the is not being used for the fore channel, there is no way for the
client to tell if the connection is still alive (e.g., the server client to tell if the connection is still alive (e.g., the server
restarted without sending a disconnect). The onus is on the restarted without sending a disconnect). The onus is on the
server, not the client, to determine if the backchannel's server, not the client, to determine if the backchannel's
connection is alive, and to indicate in the response to a SEQUENCE connection is alive, and to indicate in the response to a SEQUENCE
operation when the last connection associated with a session's operation when the last connection associated with a session's
backchannel has disconnected. backchannel has disconnected.
skipping to change at page 83, line 9 skipping to change at line 3878
means, the client will learn if some or all of the RPCSEC_GSS means, the client will learn if some or all of the RPCSEC_GSS
contexts it assigned to the backchannel have been lost. If the contexts it assigned to the backchannel have been lost. If the
client wants to retain the backchannel and/or not put recallable client wants to retain the backchannel and/or not put recallable
state subject to revocation, the client needs to use BACKCHANNEL_CTL state subject to revocation, the client needs to use BACKCHANNEL_CTL
to assign new contexts. to assign new contexts.
2.10.13.1.4. Loss of Session 2.10.13.1.4. Loss of Session
The replier might lose a record of the session. Causes include: The replier might lose a record of the session. Causes include:
o Replier failure and restart. * Replier failure and restart.
o A catastrophe that causes the reply cache to be corrupted or lost * A catastrophe that causes the reply cache to be corrupted or lost
on the media on which it was stored. This applies even if the on the media on which it was stored. This applies even if the
replier indicated in the CREATE_SESSION results that it would replier indicated in the CREATE_SESSION results that it would
persist the cache. persist the cache.
o The server purges the session of a client that has been inactive * The server purges the session of a client that has been inactive
for a very extended period of time. for a very extended period of time.
o As a result of configuration changes among a set of clustered * As a result of configuration changes among a set of clustered
servers, a network address previously connected to one server servers, a network address previously connected to one server
becomes connected to a different server that has no knowledge of becomes connected to a different server that has no knowledge of
the session in question. Such a configuration change will the session in question. Such a configuration change will
generally only happen when the original server ceases to function generally only happen when the original server ceases to function
for a time. for a time.
Loss of reply cache is equivalent to loss of session. The replier Loss of reply cache is equivalent to loss of session. The replier
indicates loss of session to the requester by returning indicates loss of session to the requester by returning
NFS4ERR_BADSESSION on the next operation that uses the session ID NFS4ERR_BADSESSION on the next operation that uses the session ID
that refers to the lost session. that refers to the lost session.
skipping to change at page 84, line 9 skipping to change at line 3927
1. If the client has other connections to other server network 1. If the client has other connections to other server network
addresses associated with the same session, attempt a COMPOUND addresses associated with the same session, attempt a COMPOUND
with a single operation, SEQUENCE, on each of the other with a single operation, SEQUENCE, on each of the other
connections. connections.
2. If the attempts succeed, the session is still alive, and this is 2. If the attempts succeed, the session is still alive, and this is
a strong indicator that the server's network address has moved. a strong indicator that the server's network address has moved.
The client might send an EXCHANGE_ID on the connection that The client might send an EXCHANGE_ID on the connection that
returned NFS4ERR_BADSESSION to see if there are opportunities for returned NFS4ERR_BADSESSION to see if there are opportunities for
client ID trunking (i.e., the same client ID and so_major value client ID trunking (i.e., the same client ID and so_major_id
are returned). The client might use DNS to see if the moved value are returned). The client might use DNS to see if the
network address was replaced with another, so that the moved network address was replaced with another, so that the
performance and availability benefits of session trunking can performance and availability benefits of session trunking can
continue. continue.
3. If the SEQUENCE requests fail with NFS4ERR_BADSESSION, then the 3. If the SEQUENCE requests fail with NFS4ERR_BADSESSION, then the
session no longer exists on any of the server network addresses session no longer exists on any of the server network addresses
for which the client has connections associated with that session for which the client has connections associated with that session
ID. It is possible the session is still alive and available on ID. It is possible the session is still alive and available on
other network addresses. The client sends an EXCHANGE_ID on all other network addresses. The client sends an EXCHANGE_ID on all
the connections to see if the server owner is still listening on the connections to see if the server owner is still listening on
those network addresses. If the same server owner is returned those network addresses. If the same server owner is returned
skipping to change at page 87, line 20 skipping to change at line 4079
EXCHGID4_FLAG_USE_PNFS_MDS, and EXCHGID4_FLAG_USE_PNFS_DS flags (not EXCHGID4_FLAG_USE_PNFS_MDS, and EXCHGID4_FLAG_USE_PNFS_DS flags (not
mutually exclusive) are passed in the EXCHANGE_ID arguments and mutually exclusive) are passed in the EXCHANGE_ID arguments and
results to allow the client to indicate how it wants to use sessions results to allow the client to indicate how it wants to use sessions
created under the client ID, and to allow the server to indicate how created under the client ID, and to allow the server to indicate how
it will allow the sessions to be used. See Section 13.1 for pNFS it will allow the sessions to be used. See Section 13.1 for pNFS
sessions considerations. sessions considerations.
3. Protocol Constants and Data Types 3. Protocol Constants and Data Types
The syntax and semantics to describe the data types of the NFSv4.1 The syntax and semantics to describe the data types of the NFSv4.1
protocol are defined in the XDR RFC 4506 [2] and RPC RFC 5531 [3] protocol are defined in the XDR (RFC 4506 [2]) and RPC (RFC 5531 [3])
documents. The next sections build upon the XDR data types to define documents. The next sections build upon the XDR data types to define
constants, types, and structures specific to this protocol. The full constants, types, and structures specific to this protocol. The full
list of XDR data types is in [10]. list of XDR data types is in [10].
3.1. Basic Constants 3.1. Basic Constants
const NFS4_FHSIZE = 128; const NFS4_FHSIZE = 128;
const NFS4_VERIFIER_SIZE = 8; const NFS4_VERIFIER_SIZE = 8;
const NFS4_OPAQUE_LIMIT = 1024; const NFS4_OPAQUE_LIMIT = 1024;
const NFS4_SESSIONID_SIZE = 16; const NFS4_SESSIONID_SIZE = 16;
skipping to change at page 87, line 42 skipping to change at line 4101
const NFS4_INT64_MAX = 0x7fffffffffffffff; const NFS4_INT64_MAX = 0x7fffffffffffffff;
const NFS4_UINT64_MAX = 0xffffffffffffffff; const NFS4_UINT64_MAX = 0xffffffffffffffff;
const NFS4_INT32_MAX = 0x7fffffff; const NFS4_INT32_MAX = 0x7fffffff;
const NFS4_UINT32_MAX = 0xffffffff; const NFS4_UINT32_MAX = 0xffffffff;
const NFS4_MAXFILELEN = 0xffffffffffffffff; const NFS4_MAXFILELEN = 0xffffffffffffffff;
const NFS4_MAXFILEOFF = 0xfffffffffffffffe; const NFS4_MAXFILEOFF = 0xfffffffffffffffe;
Except where noted, all these constants are defined in bytes. Except where noted, all these constants are defined in bytes.
o NFS4_FHSIZE is the maximum size of a filehandle. * NFS4_FHSIZE is the maximum size of a filehandle.
o NFS4_VERIFIER_SIZE is the fixed size of a verifier. * NFS4_VERIFIER_SIZE is the fixed size of a verifier.
o NFS4_OPAQUE_LIMIT is the maximum size of certain opaque * NFS4_OPAQUE_LIMIT is the maximum size of certain opaque
information. information.
o NFS4_SESSIONID_SIZE is the fixed size of a session identifier. * NFS4_SESSIONID_SIZE is the fixed size of a session identifier.
o NFS4_INT64_MAX is the maximum value of a signed 64-bit integer. * NFS4_INT64_MAX is the maximum value of a signed 64-bit integer.
o NFS4_UINT64_MAX is the maximum value of an unsigned 64-bit * NFS4_UINT64_MAX is the maximum value of an unsigned 64-bit
integer. integer.
o NFS4_INT32_MAX is the maximum value of a signed 32-bit integer. * NFS4_INT32_MAX is the maximum value of a signed 32-bit integer.
o NFS4_UINT32_MAX is the maximum value of an unsigned 32-bit * NFS4_UINT32_MAX is the maximum value of an unsigned 32-bit
integer. integer.
o NFS4_MAXFILELEN is the maximum length of a regular file. * NFS4_MAXFILELEN is the maximum length of a regular file.
o NFS4_MAXFILEOFF is the maximum offset into a regular file. * NFS4_MAXFILEOFF is the maximum offset into a regular file.
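As an illustration of how the size constants above constrain opaque values, the following is a minimal sketch and not part of the protocol definition. The helper names are hypothetical. The sketch only distinguishes a maximum size (filehandle) from a fixed size (verifier, session identifier).

```python
# Constants copied from Section 3.1; sizes are in bytes.
NFS4_FHSIZE = 128
NFS4_VERIFIER_SIZE = 8
NFS4_SESSIONID_SIZE = 16

# Hypothetical helpers (not defined by the protocol) that check lengths only.
def valid_nfs_fh4(data: bytes) -> bool:
    """A filehandle is variable length, bounded above by NFS4_FHSIZE."""
    return len(data) <= NFS4_FHSIZE

def valid_verifier4(data: bytes) -> bool:
    """A verifier has a fixed size of NFS4_VERIFIER_SIZE."""
    return len(data) == NFS4_VERIFIER_SIZE

def valid_sessionid4(data: bytes) -> bool:
    """A session identifier has a fixed size of NFS4_SESSIONID_SIZE."""
    return len(data) == NFS4_SESSIONID_SIZE
```

A receiver might apply such checks after XDR decoding and before interpreting the value.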
3.2. Basic Data Types 3.2. Basic Data Types
These are the base NFSv4.1 data types. These are the base NFSv4.1 data types.
+---------------+---------------------------------------------------+ +===============+==============================================+
| Data Type | Definition | | Data Type | Definition |
+---------------+---------------------------------------------------+ +===============+==============================================+
| int32_t | typedef int int32_t; | | int32_t | typedef int int32_t; |
| uint32_t | typedef unsigned int uint32_t; | +---------------+----------------------------------------------+
| int64_t | typedef hyper int64_t; | | uint32_t | typedef unsigned int uint32_t; |
| uint64_t | typedef unsigned hyper uint64_t; | +---------------+----------------------------------------------+
| attrlist4 | typedef opaque attrlist4<>; | | int64_t | typedef hyper int64_t; |
| | Used for file/directory attributes. | +---------------+----------------------------------------------+
| bitmap4 | typedef uint32_t bitmap4<>; | | uint64_t | typedef unsigned hyper uint64_t; |
| | Used in attribute array encoding. | +---------------+----------------------------------------------+
| changeid4 | typedef uint64_t changeid4; | | attrlist4 | typedef opaque attrlist4<>; |
| | Used in the definition of change_info4. | | | |
| clientid4 | typedef uint64_t clientid4; | | | Used for file/directory attributes. |
| | Shorthand reference to client identification. | +---------------+----------------------------------------------+
| count4 | typedef uint32_t count4; | | bitmap4 | typedef uint32_t bitmap4<>; |
| | Various count parameters (READ, WRITE, COMMIT). | | | |
| length4 | typedef uint64_t length4; | | | Used in attribute array encoding. |
| | The length of a byte-range within a file. | +---------------+----------------------------------------------+
| mode4 | typedef uint32_t mode4; | | changeid4 | typedef uint64_t changeid4; |
| | Mode attribute data type. | | | |
| nfs_cookie4 | typedef uint64_t nfs_cookie4; | | | Used in the definition of change_info4. |
| | Opaque cookie value for READDIR. | +---------------+----------------------------------------------+
| nfs_fh4 | typedef opaque nfs_fh4<NFS4_FHSIZE>; | | clientid4 | typedef uint64_t clientid4; |
| | Filehandle definition. | | | |
| nfs_ftype4 | enum nfs_ftype4; | | | Shorthand reference to client |
| | Various defined file types. | | | identification. |
| nfsstat4 | enum nfsstat4; | +---------------+----------------------------------------------+
| | Return value for operations. | | count4 | typedef uint32_t count4; |
| offset4 | typedef uint64_t offset4; | | | |
| | Various offset designations (READ, WRITE, LOCK, | | | Various count parameters (READ, WRITE, |
| | COMMIT). | | | COMMIT). |
| qop4 | typedef uint32_t qop4; | +---------------+----------------------------------------------+
| | Quality of protection designation in SECINFO. | | length4 | typedef uint64_t length4; |
| sec_oid4 | typedef opaque sec_oid4<>; | | | |
| | Security Object Identifier. The sec_oid4 data | | | The length of a byte-range within a file. |
| | type is not really opaque. Instead, it contains | +---------------+----------------------------------------------+
| | an ASN.1 OBJECT IDENTIFIER as used by GSS-API in | | mode4 | typedef uint32_t mode4; |
| | the mech_type argument to GSS_Init_sec_context. | | | |
| | See [7] for details. | | | Mode attribute data type. |
| sequenceid4 | typedef uint32_t sequenceid4; | +---------------+----------------------------------------------+
| | Sequence number used for various session | | nfs_cookie4 | typedef uint64_t nfs_cookie4; |
| | operations (EXCHANGE_ID, CREATE_SESSION, | | | |
| | SEQUENCE, CB_SEQUENCE). | | | Opaque cookie value for READDIR. |
| seqid4 | typedef uint32_t seqid4; | +---------------+----------------------------------------------+
| | Sequence identifier used for locking. | | nfs_fh4 | typedef opaque nfs_fh4<NFS4_FHSIZE>; |
| sessionid4 | typedef opaque sessionid4[NFS4_SESSIONID_SIZE]; | | | |
| | Session identifier. | | | Filehandle definition. |
| slotid4 | typedef uint32_t slotid4; | +---------------+----------------------------------------------+
| | Sequencing artifact for various session | | nfs_ftype4 | enum nfs_ftype4; |
| | operations (SEQUENCE, CB_SEQUENCE). | | | |
| utf8string | typedef opaque utf8string<>; | | | Various defined file types. |
| | UTF-8 encoding for strings. | +---------------+----------------------------------------------+
| utf8str_cis | typedef utf8string utf8str_cis; | | nfsstat4 | enum nfsstat4; |
| | Case-insensitive UTF-8 string. | | | |
| utf8str_cs | typedef utf8string utf8str_cs; | | | Return value for operations. |
| | Case-sensitive UTF-8 string. | +---------------+----------------------------------------------+
| utf8str_mixed | typedef utf8string utf8str_mixed; | | offset4 | typedef uint64_t offset4; |
| | UTF-8 strings with a case-sensitive prefix and a | | | |
| | case-insensitive suffix. | | | Various offset designations (READ, WRITE, |
| component4 | typedef utf8str_cs component4; | | | LOCK, COMMIT). |
| | Represents pathname components. | +---------------+----------------------------------------------+
| linktext4 | typedef utf8str_cs linktext4; | | qop4 | typedef uint32_t qop4; |
| | Symbolic link contents ("symbolic link" is | | | |
| | defined in an Open Group [11] standard). | | | Quality of protection designation in |
| pathname4 | typedef component4 pathname4<>; | | | SECINFO. |
| | Represents pathname for fs_locations. | +---------------+----------------------------------------------+
| verifier4 | typedef opaque verifier4[NFS4_VERIFIER_SIZE]; | | sec_oid4 | typedef opaque sec_oid4<>; |
| | Verifier used for various operations (COMMIT, | | | |
| | CREATE, EXCHANGE_ID, OPEN, READDIR, WRITE) | | | Security Object Identifier. The sec_oid4 |
| | NFS4_VERIFIER_SIZE is defined as 8. | | | data type is not really opaque. Instead, it |
+---------------+---------------------------------------------------+ | | contains an ASN.1 OBJECT IDENTIFIER as used |
| | by GSS-API in the mech_type argument to |
| | GSS_Init_sec_context. See [7] for details. |
+---------------+----------------------------------------------+
| sequenceid4 | typedef uint32_t sequenceid4; |
| | |
| | Sequence number used for various session |
| | operations (EXCHANGE_ID, CREATE_SESSION, |
| | SEQUENCE, CB_SEQUENCE). |
+---------------+----------------------------------------------+
| seqid4 | typedef uint32_t seqid4; |
| | |
| | Sequence identifier used for locking. |
+---------------+----------------------------------------------+
| sessionid4 | typedef opaque |
| | sessionid4[NFS4_SESSIONID_SIZE]; |
| | |
| | Session identifier. |
+---------------+----------------------------------------------+
| slotid4 | typedef uint32_t slotid4; |
| | |
| | Sequencing artifact for various session |
| | operations (SEQUENCE, CB_SEQUENCE). |
+---------------+----------------------------------------------+
| utf8string | typedef opaque utf8string<>; |
| | |
| | UTF-8 encoding for strings. |
+---------------+----------------------------------------------+
| utf8str_cis | typedef utf8string utf8str_cis; |
| | |
| | Case-insensitive UTF-8 string. |
+---------------+----------------------------------------------+
| utf8str_cs | typedef utf8string utf8str_cs; |
| | |
| | Case-sensitive UTF-8 string. |
+---------------+----------------------------------------------+
| utf8str_mixed | typedef utf8string utf8str_mixed; |
| | |
| | UTF-8 strings with a case-sensitive prefix |
| | and a case-insensitive suffix. |
+---------------+----------------------------------------------+
| component4 | typedef utf8str_cs component4; |
| | |
| | Represents pathname components. |
+---------------+----------------------------------------------+
| linktext4 | typedef utf8str_cs linktext4; |
| | |
| | Symbolic link contents ("symbolic link" is |
| | defined in an Open Group [11] standard). |
+---------------+----------------------------------------------+
| pathname4 | typedef component4 pathname4<>; |
| | |
| | Represents pathname for fs_locations. |
+---------------+----------------------------------------------+
| verifier4 | typedef opaque |
| | verifier4[NFS4_VERIFIER_SIZE]; |
| | |
| | Verifier used for various operations |
| | (COMMIT, CREATE, EXCHANGE_ID, OPEN, READDIR, |
| | WRITE) NFS4_VERIFIER_SIZE is defined as 8. |
+---------------+----------------------------------------------+
End of Base Data Types Table 1
Table 1 End of Base Data Types
3.3. Structured Data Types 3.3. Structured Data Types
3.3.1. nfstime4 3.3.1. nfstime4
struct nfstime4 { struct nfstime4 {
int64_t seconds; int64_t seconds;
uint32_t nseconds; uint32_t nseconds;
}; };
skipping to change at page 92, line 5 skipping to change at line 4360
}; };
The fattr4 data type is used to represent file and directory The fattr4 data type is used to represent file and directory
attributes. attributes.
The bitmap is a counted array of 32-bit integers used to contain bit The bitmap is a counted array of 32-bit integers used to contain bit
values. The position of the integer in the array that contains bit n values. The position of the integer in the array that contains bit n
can be computed from the expression (n / 32), and its bit within that can be computed from the expression (n / 32), and its bit within that
integer is (n mod 32). integer is (n mod 32).
0 1 0 1
+-----------+-----------+-----------+-- +-----------+-----------+-----------+--
| count | 31 .. 0 | 63 .. 32 | | count | 31 .. 0 | 63 .. 32 |
+-----------+-----------+-----------+-- +-----------+-----------+-----------+--
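The bit-addressing rule above can be sketched as follows. This is an illustrative sketch only. The function names are hypothetical, and the bitmap is modeled as a plain list of 32-bit words without the leading XDR count.

```python
# Bit n lives in word (n // 32) at bit position (n % 32), per the text above.
def bitmap4_set(words, n):
    """Return a copy of the word list with bit n set, growing it if needed."""
    idx, bit = divmod(n, 32)
    out = list(words) + [0] * max(0, idx + 1 - len(words))
    out[idx] |= 1 << bit
    return out

def bitmap4_test(words, n):
    """Return True if bit n is set; bits beyond the array are treated as clear."""
    idx, bit = divmod(n, 32)
    return idx < len(words) and bool(words[idx] & (1 << bit))
```

For example, setting bit 33 in an empty bitmap yields two words, with the value 2 in the second word.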
3.3.8. change_info4 3.3.8. change_info4
struct change_info4 { struct change_info4 {
bool atomic; bool atomic;
changeid4 before; changeid4 before;
changeid4 after; changeid4 after;
skipping to change at page 92, line 35 skipping to change at line 4390
struct netaddr4 { struct netaddr4 {
/* see struct rpcb in RFC 1833 */ /* see struct rpcb in RFC 1833 */
string na_r_netid<>; /* network id */ string na_r_netid<>; /* network id */
string na_r_addr<>; /* universal address */ string na_r_addr<>; /* universal address */
}; };
The netaddr4 data type is used to identify network transport The netaddr4 data type is used to identify network transport
endpoints. The na_r_netid and na_r_addr fields respectively contain endpoints. The na_r_netid and na_r_addr fields respectively contain
a netid and uaddr. The netid and uaddr concepts are defined in [12]. a netid and uaddr. The netid and uaddr concepts are defined in [12].
The netid and uaddr formats for TCP over IPv4 and TCP over IPv6 are The netid and uaddr formats for TCP over IPv4 and TCP over IPv6 are
defined in [12], specifically Tables 2 and 3 and Sections 5.2.3.3 and defined in [12], specifically Tables 2 and 3 and in Sections 5.2.3.3
5.2.3.4. and 5.2.3.4.
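As a hedged illustration, the sketch below decodes a universal address for TCP over IPv4. It assumes the "h1.h2.h3.h4.p1.p2" form given in [12], where p1 and p2 are the high-order and low-order octets of the port. The function name is hypothetical, and IPv6 is not handled.

```python
def parse_ipv4_uaddr(uaddr):
    """Split an IPv4 universal address into (host, port).

    Assumes the h1.h2.h3.h4.p1.p2 form; raises ValueError otherwise.
    """
    parts = uaddr.split(".")
    if len(parts) != 6:
        raise ValueError("expected six dot-separated decimal fields")
    octets = [int(p) for p in parts]  # int() raises ValueError on non-numeric input
    if any(o < 0 or o > 255 for o in octets):
        raise ValueError("each field must be in the range 0-255")
    host = ".".join(str(o) for o in octets[:4])
    port = octets[4] * 256 + octets[5]
    return host, port
```

With this sketch, the na_r_addr value "192.0.2.1.8.1" decodes to host 192.0.2.1 and port 2049.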
3.3.10. state_owner4 3.3.10. state_owner4
struct state_owner4 { struct state_owner4 {
clientid4 clientid; clientid4 clientid;
opaque owner<NFS4_OPAQUE_LIMIT>; opaque owner<NFS4_OPAQUE_LIMIT>;
}; };
typedef state_owner4 open_owner4; typedef state_owner4 open_owner4;
typedef state_owner4 lock_owner4; typedef state_owner4 lock_owner4;
skipping to change at page 94, line 20 skipping to change at line 4470
The layouttype4 data type is 32 bits in length. The range The layouttype4 data type is 32 bits in length. The range
represented by the layout type is split into three parts. Type 0x0 represented by the layout type is split into three parts. Type 0x0
is reserved. Types within the range 0x00000001-0x7FFFFFFF are is reserved. Types within the range 0x00000001-0x7FFFFFFF are
globally unique and are assigned according to the description in globally unique and are assigned according to the description in
Section 22.5; they are maintained by IANA. Types within the range Section 22.5; they are maintained by IANA. Types within the range
0x80000000-0xFFFFFFFF are site specific and for private use only. 0x80000000-0xFFFFFFFF are site specific and for private use only.
The LAYOUT4_NFSV4_1_FILES enumeration specifies that the NFSv4.1 file The LAYOUT4_NFSV4_1_FILES enumeration specifies that the NFSv4.1 file
layout type, as defined in Section 13, is to be used. The layout type, as defined in Section 13, is to be used. The
LAYOUT4_OSD2_OBJECTS enumeration specifies that the object layout, as LAYOUT4_OSD2_OBJECTS enumeration specifies that the object layout, as
defined in [46], is to be used. Similarly, the LAYOUT4_BLOCK_VOLUME defined in [47], is to be used. Similarly, the LAYOUT4_BLOCK_VOLUME
enumeration specifies that the block/volume layout, as defined in enumeration specifies that the block/volume layout, as defined in
[47], is to be used. [48], is to be used.
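The three-way split of the layouttype4 range described above can be sketched as follows. The function name and the returned labels are hypothetical and chosen only for illustration.

```python
def classify_layouttype4(value):
    """Classify a 32-bit layout type value by the range it falls in."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("layouttype4 is 32 bits in length")
    if value == 0x0:
        return "reserved"
    if value <= 0x7FFFFFFF:
        return "iana-assigned"   # globally unique, maintained by IANA
    return "private-use"         # site specific, 0x80000000-0xFFFFFFFF
```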
3.3.14. deviceid4 3.3.14. deviceid4
const NFS4_DEVICEID4_SIZE = 16; const NFS4_DEVICEID4_SIZE = 16;
typedef opaque deviceid4[NFS4_DEVICEID4_SIZE]; typedef opaque deviceid4[NFS4_DEVICEID4_SIZE];
Layout information includes device IDs that specify a storage device Layout information includes device IDs that specify a storage device
through a compact handle. Addressing and type information is through a compact handle. Addressing and type information is
obtained with the GETDEVICEINFO operation. Device IDs are not obtained with the GETDEVICEINFO operation. Device IDs are not
skipping to change at page 98, line 5 skipping to change at line 4624
helping the client determine when it should send I/O directly through helping the client determine when it should send I/O directly through
the metadata server versus the storage devices. The data type the metadata server versus the storage devices. The data type
consists of the layout type (thi_layout_type), a bitmap (thi_hintset) consists of the layout type (thi_layout_type), a bitmap (thi_hintset)
describing the set of hints supported by the server (they may differ describing the set of hints supported by the server (they may differ
based on the layout type), and a list of hints (thi_hintlist) whose based on the layout type), and a list of hints (thi_hintlist) whose
content is determined by the hintset bitmap. See the mdsthreshold content is determined by the hintset bitmap. See the mdsthreshold
attribute for more details. attribute for more details.
The thi_hintset field is a bitmap of the following values: The thi_hintset field is a bitmap of the following values:
+-------------------------+---+---------+---------------------------+ +=========================+===+=========+===========================+
| name | # | Data | Description | | name | # | Data | Description |
| | | Type | | | | | Type | |
+-------------------------+---+---------+---------------------------+ +=========================+===+=========+===========================+
| threshold4_read_size | 0 | length4 | If a file's length is | | threshold4_read_size | 0 | length4 | If a file's length is |
| | | | less than the value of | | | | | less than the value of |
| | | | threshold4_read_size, | | | | | threshold4_read_size, |
| | | | then it is RECOMMENDED | | | | | then it is RECOMMENDED |
| | | | that the client read from | | | | | that the client read |
| | | | the file via the MDS and | | | | | from the file via the |
| | | | not a storage device. | | | | | MDS and not a storage |
| | | | device. |
+-------------------------+---+---------+---------------------------+
| threshold4_write_size | 1 | length4 | If a file's length is | | threshold4_write_size | 1 | length4 | If a file's length is |
| | | | less than the value of | | | | | less than the value of |
| | | | threshold4_write_size, | | | | | threshold4_write_size, |
| | | | then it is RECOMMENDED | | | | | then it is RECOMMENDED |
| | | | that the client write to | | | | | that the client write |
| | | | the file via the MDS and | | | | | to the file via the |
| | | | not a storage device. | | | | | MDS and not a storage |
| threshold4_read_iosize | 2 | length4 | For read I/O sizes below | | | | | device. |
| | | | this threshold, it is | +-------------------------+---+---------+---------------------------+
| | | | RECOMMENDED to read data | | threshold4_read_iosize | 2 | length4 | For read I/O sizes |
| | | | through the MDS. | | | | | below this threshold, |
| threshold4_write_iosize | 3 | length4 | For write I/O sizes below | | | | | it is RECOMMENDED to |
| | | | this threshold, it is | | | | | read data through the |
| | | | RECOMMENDED to write data | | | | | MDS. |
| | | | through the MDS. |
+-------------------------+---+---------+---------------------------+ +-------------------------+---+---------+---------------------------+
| threshold4_write_iosize | 3 | length4 | For write I/O sizes |
| | | | below this threshold, |
| | | | it is RECOMMENDED to |
| | | | write data through the |
| | | | MDS. |
+-------------------------+---+---------+---------------------------+
Table 2
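One way a client might apply the read hints in the table above is sketched below. This is an assumption-laden illustration. The function name is hypothetical, the hints are modeled as a dictionary keyed by hint name (an absent key means the server did not supply that hint), and preferring the MDS when either hint applies is an assumption of this sketch rather than a rule stated in this section.

```python
def read_via_mds(file_length, io_size, hints):
    """Return True if the hints recommend reading through the MDS."""
    size_hint = hints.get("threshold4_read_size")
    if size_hint is not None and file_length < size_hint:
        return True   # file is shorter than the read-size threshold
    iosize_hint = hints.get("threshold4_read_iosize")
    if iosize_hint is not None and io_size < iosize_hint:
        return True   # this read is smaller than the I/O-size threshold
    return False      # no hint applies; storage devices may be used
```

The write hints (threshold4_write_size and threshold4_write_iosize) could be handled the same way.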
3.3.23. mdsthreshold4 3.3.23. mdsthreshold4
struct mdsthreshold4 { struct mdsthreshold4 {
threshold_item4 mth_hints<>; threshold_item4 mth_hints<>;
}; };
This data type holds an array of elements of data type This data type holds an array of elements of data type
threshold_item4, each of which is valid for a particular layout type. threshold_item4, each of which is valid for a particular layout type.
An array is necessary because a server can support multiple layout An array is necessary because a server can support multiple layout
skipping to change at page 99, line 10 skipping to change at line 4685
for a file system object. The contents of the filehandle are opaque for a file system object. The contents of the filehandle are opaque
to the client. Therefore, the server is responsible for translating to the client. Therefore, the server is responsible for translating
the filehandle to an internal representation of the file system the filehandle to an internal representation of the file system
object. object.
4.1. Obtaining the First Filehandle 4.1. Obtaining the First Filehandle
The operations of the NFS protocol are defined in terms of one or The operations of the NFS protocol are defined in terms of one or
more filehandles. Therefore, the client needs a filehandle to more filehandles. Therefore, the client needs a filehandle to
initiate communication with the server. With the NFSv3 protocol (RFC initiate communication with the server. With the NFSv3 protocol (RFC
1813 [37]), there exists an ancillary protocol to obtain this first 1813 [38]), there exists an ancillary protocol to obtain this first
filehandle. The MOUNT protocol, RPC program number 100005, provides filehandle. The MOUNT protocol, RPC program number 100005, provides
the mechanism of translating a string-based file system pathname to a the mechanism of translating a string-based file system pathname to a
filehandle, which can then be used by the NFS protocols. filehandle, which can then be used by the NFS protocols.
The MOUNT protocol has deficiencies in the area of security and use The MOUNT protocol has deficiencies in the area of security and use
via firewalls. This is one reason that the use of the public via firewalls. This is one reason that the use of the public
filehandle was introduced in RFC 2054 [48] and RFC 2055 [49]. With filehandle was introduced in RFC 2054 [49] and RFC 2055 [50]. With
the use of the public filehandle in combination with the LOOKUP the use of the public filehandle in combination with the LOOKUP
operation in the NFSv3 protocol, it has been demonstrated that the operation in the NFSv3 protocol, it has been demonstrated that the
MOUNT protocol is unnecessary for viable interaction between NFS MOUNT protocol is unnecessary for viable interaction between NFS
client and server. client and server.
Therefore, the NFSv4.1 protocol will not use an ancillary protocol Therefore, the NFSv4.1 protocol will not use an ancillary protocol
for translation from string-based pathnames to a filehandle. Two for translation from string-based pathnames to a filehandle. Two
special filehandles will be used as starting points for the NFS special filehandles will be used as starting points for the NFS
client. client.
skipping to change at page 103, line 6 skipping to change at line 4867
Volatile filehandles are especially suitable for implementation of Volatile filehandles are especially suitable for implementation of
the pseudo file systems used to bridge exports. See Section 7.5 for the pseudo file systems used to bridge exports. See Section 7.5 for
a discussion of this. a discussion of this.
4.3. One Method of Constructing a Volatile Filehandle 4.3. One Method of Constructing a Volatile Filehandle
A volatile filehandle, while opaque to the client, could contain: A volatile filehandle, while opaque to the client, could contain:
[volatile bit = 1 | server boot time | slot | generation number] [volatile bit = 1 | server boot time | slot | generation number]
o slot is an index in the server volatile filehandle table * slot is an index in the server volatile filehandle table
o generation number is the generation number for the table entry/ * generation number is the generation number for the table entry/
slot slot
When the client presents a volatile filehandle, the server makes the When the client presents a volatile filehandle, the server makes the
following checks, which assume that the check for the volatile bit following checks, which assume that the check for the volatile bit
has passed. If the server boot time is less than the current server has passed. If the server boot time is less than the current server
boot time, return NFS4ERR_FHEXPIRED. If slot is out of range, return boot time, return NFS4ERR_FHEXPIRED. If slot is out of range, return
NFS4ERR_BADHANDLE. If the generation number does not match, return NFS4ERR_BADHANDLE. If the generation number does not match, return
NFS4ERR_FHEXPIRED. NFS4ERR_FHEXPIRED.
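The checks above can be sketched as follows, in the order the text gives them. This is a minimal sketch under stated assumptions: the VolatileFH tuple and the function name are hypothetical, the volatile bit is assumed to have been checked already, and the server's table is modeled as a list whose entry i holds the current generation number of slot i.

```python
from collections import namedtuple

# Hypothetical decoded form of the filehandle fields shown above.
VolatileFH = namedtuple("VolatileFH", "boot_time slot generation")

def check_volatile_fh(fh, current_boot_time, table):
    """Return the status name the server would use for this filehandle."""
    if fh.boot_time < current_boot_time:
        return "NFS4ERR_FHEXPIRED"    # issued before the last restart
    if not 0 <= fh.slot < len(table):
        return "NFS4ERR_BADHANDLE"    # slot is out of range
    if table[fh.slot] != fh.generation:
        return "NFS4ERR_FHEXPIRED"    # slot has since been reused
    return "NFS4_OK"
```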
When the server restarts, the table is gone (it is volatile). When the server restarts, the table is gone (it is volatile).
skipping to change at page 104, line 48 skipping to change at line 4958
accesses a hidden directory of attributes associated with a file accesses a hidden directory of attributes associated with a file
system object. OPENATTR takes a filehandle for the object and system object. OPENATTR takes a filehandle for the object and
returns the filehandle for the attribute hierarchy. The filehandle returns the filehandle for the attribute hierarchy. The filehandle
for the named attributes is a directory object accessible by LOOKUP for the named attributes is a directory object accessible by LOOKUP
or READDIR and contains files whose names represent the named or READDIR and contains files whose names represent the named
attributes and whose data bytes are the value of the attribute. For attributes and whose data bytes are the value of the attribute. For
example: example:
+----------+-----------+---------------------------------+ +----------+-----------+---------------------------------+
| LOOKUP | "foo" | ; look up file | | LOOKUP | "foo" | ; look up file |
+----------+-----------+---------------------------------+
| GETATTR | attrbits | | | GETATTR | attrbits | |
+----------+-----------+---------------------------------+
| OPENATTR | | ; access foo's named attributes | | OPENATTR | | ; access foo's named attributes |
+----------+-----------+---------------------------------+
| LOOKUP | "x11icon" | ; look up specific attribute | | LOOKUP | "x11icon" | ; look up specific attribute |
+----------+-----------+---------------------------------+
| READ | 0,4096 | ; read stream of bytes | | READ | 0,4096 | ; read stream of bytes |
+----------+-----------+---------------------------------+ +----------+-----------+---------------------------------+
Table 3
Named attributes are intended for data needed by applications rather Named attributes are intended for data needed by applications rather
than by an NFS client implementation. NFS implementors are strongly than by an NFS client implementation. NFS implementors are strongly
encouraged to define their new attributes as RECOMMENDED attributes encouraged to define their new attributes as RECOMMENDED attributes
by bringing them to the IETF Standards Track process. by bringing them to the IETF Standards Track process.
The set of attributes that are classified as REQUIRED is deliberately The set of attributes that are classified as REQUIRED is deliberately
small since servers need to do whatever it takes to support them. A small since servers need to do whatever it takes to support them. A
server should support as many of the RECOMMENDED attributes as server should support as many of the RECOMMENDED attributes as
possible but, by their definition, the server is not required to possible but, by their definition, the server is not required to
support all of them. Attributes are deemed REQUIRED if the data is support all of them. Attributes are deemed REQUIRED if the data is
skipping to change at page 107, line 10 skipping to change at line 5072
well. well.
In NFSv4.1, the structure of named attribute directories is In NFSv4.1, the structure of named attribute directories is
restricted in a number of ways, in order to prevent the development restricted in a number of ways, in order to prevent the development
of non-interoperable implementations in which some servers support a of non-interoperable implementations in which some servers support a
fully general hierarchical directory structure for named attributes fully general hierarchical directory structure for named attributes
while others support a limited but adequate structure for named while others support a limited but adequate structure for named
attributes. In such an environment, clients or applications might attributes. In such an environment, clients or applications might
come to depend on non-portable extensions. The restrictions are: come to depend on non-portable extensions. The restrictions are:
o CREATE is not allowed in a named attribute directory. Thus, such * CREATE is not allowed in a named attribute directory. Thus, such
objects as symbolic links and special files are not allowed to be objects as symbolic links and special files are not allowed to be
named attributes. Further, directories may not be created in a named attributes. Further, directories may not be created in a
named attribute directory, so no hierarchical structure of named named attribute directory, so no hierarchical structure of named
attributes for a single object is allowed. attributes for a single object is allowed.
o If OPENATTR is done on a named attribute directory or on a named * If OPENATTR is done on a named attribute directory or on a named
attribute, the server MUST return NFS4ERR_WRONG_TYPE. attribute, the server MUST return NFS4ERR_WRONG_TYPE.
o Doing a RENAME of a named attribute to a different named attribute * Doing a RENAME of a named attribute to a different named attribute
directory or to an ordinary (i.e., non-named-attribute) directory directory or to an ordinary (i.e., non-named-attribute) directory
is not allowed. is not allowed.
o Creating hard links between named attribute directories or between * Creating hard links between named attribute directories or between
named attribute directories and ordinary directories is not named attribute directories and ordinary directories is not
allowed. allowed.
Names of attributes will not be controlled by this document or other Names of attributes will not be controlled by this document or other
IETF Standards Track documents. See Section 22.2 for further IETF Standards Track documents. See Section 22.2 for further
discussion. discussion.
5.4. Classification of Attributes 5.4. Classification of Attributes
Each of the REQUIRED and RECOMMENDED attributes can be classified in Each of the REQUIRED and RECOMMENDED attributes can be classified in
skipping to change at page 107, line 47 skipping to change at line 5109
system (i.e., the value of the attribute will be the same for some or system (i.e., the value of the attribute will be the same for some or
all file objects that share the same fsid attribute (Section 5.8.1.9) all file objects that share the same fsid attribute (Section 5.8.1.9)
and server owner), or per file system object. Note that it is and server owner), or per file system object. Note that it is
possible that some per file system attributes may vary within the possible that some per file system attributes may vary within the
file system, depending on the value of the "homogeneous" file system, depending on the value of the "homogeneous"
(Section 5.8.2.16) attribute. Note that the attributes (Section 5.8.2.16) attribute. Note that the attributes
time_access_set and time_modify_set are not listed in this section time_access_set and time_modify_set are not listed in this section
because they are write-only attributes corresponding to time_access because they are write-only attributes corresponding to time_access
and time_modify, and are used in a special instance of SETATTR. and time_modify, and are used in a special instance of SETATTR.
o The per-server attribute is: * The per-server attribute is:
lease_time lease_time
o The per-file system attributes are: * The per-file system attributes are:
supported_attrs, suppattr_exclcreat, fh_expire_type, supported_attrs, suppattr_exclcreat, fh_expire_type,
link_support, symlink_support, unique_handles, aclsupport, link_support, symlink_support, unique_handles, aclsupport,
cansettime, case_insensitive, case_preserving, cansettime, case_insensitive, case_preserving,
chown_restricted, files_avail, files_free, files_total, chown_restricted, files_avail, files_free, files_total,
fs_locations, homogeneous, maxfilesize, maxname, maxread, fs_locations, homogeneous, maxfilesize, maxname, maxread,
maxwrite, no_trunc, space_avail, space_free, space_total, maxwrite, no_trunc, space_avail, space_free, space_total,
time_delta, change_policy, fs_status, fs_layout_type, time_delta, change_policy, fs_status, fs_layout_type,
fs_locations_info, fs_charset_cap fs_locations_info, fs_charset_cap
o The per-file system object attributes are: * The per-file system object attributes are:
type, change, size, named_attr, fsid, rdattr_error, filehandle, type, change, size, named_attr, fsid, rdattr_error, filehandle,
acl, archive, fileid, hidden, maxlink, mimetype, mode, acl, archive, fileid, hidden, maxlink, mimetype, mode,
numlinks, owner, owner_group, rawdev, space_used, system, numlinks, owner, owner_group, rawdev, space_used, system,
time_access, time_backup, time_create, time_metadata, time_access, time_backup, time_create, time_metadata,
time_modify, mounted_on_fileid, dir_notif_delay, time_modify, mounted_on_fileid, dir_notif_delay,
dirent_notif_delay, dacl, sacl, layout_type, layout_hint, dirent_notif_delay, dacl, sacl, layout_type, layout_hint,
layout_blksize, layout_alignment, mdsthreshold, retention_get, layout_blksize, layout_alignment, mdsthreshold, retention_get,
retention_set, retentevt_get, retentevt_set, retention_hold, retention_set, retentevt_get, retentevt_set, retention_hold,
mode_set_masked mode_set_masked
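As a rough illustration of the classification above, the sketch below maps an attribute name to its scope. The function name attribute_scope and the scope labels are hypothetical, and only a subset of each list is reproduced; the sketch also ignores the caveat about the "homogeneous" attribute.

```python
# Hypothetical lookup built from a subset of the three lists above.
PER_SERVER = {"lease_time"}
PER_FILE_SYSTEM = {
    "supported_attrs", "suppattr_exclcreat", "fh_expire_type",
    "link_support", "symlink_support", "unique_handles",
    "fs_locations", "homogeneous", "maxread", "maxwrite",
}
PER_OBJECT = {
    "type", "change", "size", "named_attr", "fsid", "rdattr_error",
    "filehandle", "fileid", "mode", "owner", "owner_group",
}


def attribute_scope(name):
    """Return "server", "file system", or "object" for a known attribute."""
    if name in PER_SERVER:
        return "server"
    if name in PER_FILE_SYSTEM:
        return "file system"
    if name in PER_OBJECT:
        return "object"
    raise KeyError(f"attribute not in this (partial) table: {name}")
```

A client might use such a lookup to decide how widely a fetched value can be reused, for example reusing a per-file system value for other objects with the same fsid and server owner.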
skipping to change at page 108, line 40 skipping to change at line 5150
Some REQUIRED and RECOMMENDED attributes are set-only; i.e., they can Some REQUIRED and RECOMMENDED attributes are set-only; i.e., they can
be set via SETATTR but not retrieved via GETATTR. Similarly, some be set via SETATTR but not retrieved via GETATTR. Similarly, some
REQUIRED and RECOMMENDED attributes are get-only; i.e., they can be REQUIRED and RECOMMENDED attributes are get-only; i.e., they can be
retrieved via GETATTR but not set via SETATTR. If a client attempts retrieved via GETATTR but not set via SETATTR. If a client attempts
to set a get-only attribute or get a set-only attribute, the server to set a get-only attribute or get a set-only attribute, the server
MUST return NFS4ERR_INVAL. MUST return NFS4ERR_INVAL.
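The rule above can be sketched as a small access check. This is an illustrative sketch only: the function name check_attr_access, the ACC table layout, and the use of status names as strings are assumptions, and only a few attributes are listed.

```python
# Access codes for a few attributes: "R" = get-only, "W" = set-only,
# "RW" = both. The table layout is hypothetical.
ACC = {
    "type": "R",
    "size": "RW",
    "mode": "RW",
    "layout_hint": "W",
    "time_access_set": "W",
}


def check_attr_access(op, name):
    """Return the status a server would give for GETATTR/SETATTR of `name`."""
    if op not in ("GETATTR", "SETATTR"):
        raise ValueError(f"unsupported operation: {op}")
    needed = "R" if op == "GETATTR" else "W"
    # Setting a get-only attribute or getting a set-only one is rejected.
    return "NFS4_OK" if needed in ACC[name] else "NFS4ERR_INVAL"
```

For example, check_attr_access("SETATTR", "type") yields "NFS4ERR_INVAL" because type is get-only.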
5.6. REQUIRED Attributes - List and Definition References 5.6. REQUIRED Attributes - List and Definition References
The list of REQUIRED attributes appears in Table 2. The meanings of The list of REQUIRED attributes appears in Table 4. The meanings of
the columns of the table are: the columns of the table are:
o Name: The name of the attribute. Name: The name of the attribute.
o Id: The number assigned to the attribute. In the event of Id: The number assigned to the attribute. In the event of conflicts
conflicts between the assigned number and [10], the latter is between the assigned number and [10], the latter is likely
likely authoritative, but should be resolved with Errata to this authoritative, but should be resolved with Errata to this document
document and/or [10]. See [50] for the Errata process. and/or [10]. See [51] for the Errata process.
o Data Type: The XDR data type of the attribute. Data Type: The XDR data type of the attribute.
o Acc: Access allowed to the attribute. R means read-only (GETATTR Acc: Access allowed to the attribute. R means read-only (GETATTR
may retrieve, SETATTR may not set). W means write-only (SETATTR may retrieve, SETATTR may not set). W means write-only (SETATTR
may set, GETATTR may not retrieve). R W means read/write (GETATTR may set, GETATTR may not retrieve). R W means read/write (GETATTR
may retrieve, SETATTR may set). may retrieve, SETATTR may set).
o Defined in: The section of this specification that describes the Defined in: The section of this specification that describes the
attribute. attribute.
+--------------------+----+------------+-----+------------------+
| Name               | Id | Data Type  | Acc | Defined in:      |
+--------------------+----+------------+-----+------------------+
| supported_attrs    | 0  | bitmap4    | R   | Section 5.8.1.1  |
| type               | 1  | nfs_ftype4 | R   | Section 5.8.1.2  |
| fh_expire_type     | 2  | uint32_t   | R   | Section 5.8.1.3  |
| change             | 3  | uint64_t   | R   | Section 5.8.1.4  |
| size               | 4  | uint64_t   | R W | Section 5.8.1.5  |
| link_support       | 5  | bool       | R   | Section 5.8.1.6  |
| symlink_support    | 6  | bool       | R   | Section 5.8.1.7  |
| named_attr         | 7  | bool       | R   | Section 5.8.1.8  |
| fsid               | 8  | fsid4      | R   | Section 5.8.1.9  |
| unique_handles     | 9  | bool       | R   | Section 5.8.1.10 |
| lease_time         | 10 | nfs_lease4 | R   | Section 5.8.1.11 |
| rdattr_error       | 11 | enum       | R   | Section 5.8.1.12 |
| filehandle         | 19 | nfs_fh4    | R   | Section 5.8.1.13 |
| suppattr_exclcreat | 75 | bitmap4    | R   | Section 5.8.1.14 |
+--------------------+----+------------+-----+------------------+
Table 2 Table 4
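The Id column matters in practice because attribute requests are expressed as a bitmap4. The following sketch assumes the usual bitmap4 layout, which is defined elsewhere in the specification rather than in this excerpt: attribute number n is bit (n mod 32) of 32-bit word (n div 32). The helper names encode_bitmap4 and decode_bitmap4 are hypothetical.

```python
# Ids of the REQUIRED attributes, taken from the table above.
REQUIRED_ATTR_IDS = {
    "supported_attrs": 0, "type": 1, "fh_expire_type": 2, "change": 3,
    "size": 4, "link_support": 5, "symlink_support": 6, "named_attr": 7,
    "fsid": 8, "unique_handles": 9, "lease_time": 10, "rdattr_error": 11,
    "filehandle": 19, "suppattr_exclcreat": 75,
}


def encode_bitmap4(attr_ids):
    """Pack attribute numbers into a list of 32-bit words (assumed layout)."""
    words = []
    for n in attr_ids:
        word, bit = divmod(n, 32)
        while len(words) <= word:
            words.append(0)  # grow the array up to the word we need
        words[word] |= 1 << bit
    return words


def decode_bitmap4(words):
    """Return the sorted attribute numbers whose bits are set."""
    return [
        index * 32 + bit
        for index, word in enumerate(words)
        for bit in range(32)
        if word & (1 << bit)
    ]
```

Under this assumed layout, requesting every REQUIRED attribute needs three words, because suppattr_exclcreat (75) falls in the third word while the second word stays empty.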
5.7. RECOMMENDED Attributes - List and Definition References 5.7. RECOMMENDED Attributes - List and Definition References
The RECOMMENDED attributes are defined in Table 3. The meanings of The RECOMMENDED attributes are defined in Table 5. The meanings of
the column headers are the same as Table 2; see Section 5.6 for the the column headers are the same as Table 4; see Section 5.6 for the
meanings. meanings.
+--------------------+----+--------------------+-----+------------------+
| Name               | Id | Data Type          | Acc | Defined in:      |
+--------------------+----+--------------------+-----+------------------+
| acl                | 12 | nfsace4<>          | R W | Section 6.2.1    |
| aclsupport         | 13 | uint32_t           | R   | Section 6.2.1.2  |
| archive            | 14 | bool               | R W | Section 5.8.2.1  |
| cansettime         | 15 | bool               | R   | Section 5.8.2.2  |
| case_insensitive   | 16 | bool               | R   | Section 5.8.2.3  |
| case_preserving    | 17 | bool               | R   | Section 5.8.2.4  |
| change_policy      | 60 | chg_policy4        | R   | Section 5.8.2.5  |
| chown_restricted   | 18 | bool               | R   | Section 5.8.2.6  |
| dacl               | 58 | nfsacl41           | R W | Section 6.2.2    |
| dir_notif_delay    | 56 | nfstime4           | R   | Section 5.11.1   |
| dirent_notif_delay | 57 | nfstime4           | R   | Section 5.11.2   |
| fileid             | 20 | uint64_t           | R   | Section 5.8.2.7  |
| files_avail        | 21 | uint64_t           | R   | Section 5.8.2.8  |
| files_free         | 22 | uint64_t           | R   | Section 5.8.2.9  |
| files_total        | 23 | uint64_t           | R   | Section 5.8.2.10 |
| fs_charset_cap     | 76 | uint32_t           | R   | Section 5.8.2.11 |
| fs_layout_type     | 62 | layouttype4<>      | R   | Section 5.12.1   |
| fs_locations       | 24 | fs_locations       | R   | Section 5.8.2.12 |
| fs_locations_info  | 67 | fs_locations_info4 | R   | Section 5.8.2.13 |
| fs_status          | 61 | fs4_status         | R   | Section 5.8.2.14 |
| hidden             | 25 | bool               | R W | Section 5.8.2.15 |
| homogeneous        | 26 | bool               | R   | Section 5.8.2.16 |
| layout_alignment   | 66 | uint32_t           | R   | Section 5.12.2   |
| layout_blksize     | 65 | uint32_t           | R   | Section 5.12.3   |
| layout_hint        | 63 | layouthint4        | W   | Section 5.12.4   |
| layout_type        | 64 | layouttype4<>      | R   | Section 5.12.5   |
| maxfilesize        | 27 | uint64_t           | R   | Section 5.8.2.17 |
| maxlink            | 28 | uint32_t           | R   | Section 5.8.2.18 |
| maxname            | 29 | uint32_t           | R   | Section 5.8.2.19 |
| maxread            | 30 | uint64_t           | R   | Section 5.8.2.20 |
| maxwrite           | 31 | uint64_t           | R   | Section 5.8.2.21 |
| mdsthreshold       | 68 | mdsthreshold4      | R   | Section 5.12.6   |
| mimetype           | 32 | utf8str_cs         | R W | Section 5.8.2.22 |
| mode               | 33 | mode4              | R W | Section 6.2.4    |
| mode_set_masked    | 74 | mode_masked4       | W   | Section 6.2.5    |
| mounted_on_fileid  | 55 | uint64_t           | R   | Section 5.8.2.23 |
| no_trunc           | 34 | bool               | R   | Section 5.8.2.24 |
| numlinks           | 35 | uint32_t           | R   | Section 5.8.2.25 |
| owner              | 36 | utf8str_mixed      | R W | Section 5.8.2.26 |
| owner_group        | 37 | utf8str_mixed      | R W | Section 5.8.2.27 |
| quota_avail_hard   | 38 | uint64_t           | R   | Section 5.8.2.28 |
| quota_avail_soft   | 39 | uint64_t           | R   | Section 5.8.2.29 |
| quota_used         | 40 | uint64_t           | R   | Section 5.8.2.30 |
| rawdev             | 41 | specdata4          | R   | Section 5.8.2.31 |
| retentevt_get      | 71 | retention_get4     | R   | Section 5.13.3   |
| retentevt_set      | 72 | retention_set4     | W   | Section 5.13.4   |
| retention_get      | 69 | retention_get4     | R   | Section 5.13.1   |
| retention_hold     | 73 | uint64_t           | R W | Section 5.13.5   |
| retention_set      | 70 | retention_set4     | W   | Section 5.13.2   |
| sacl               | 59 | nfsacl41           | R W | Section 6.2.3    |
| space_avail        | 42 | uint64_t           | R   | Section 5.8.2.32 |
| space_free         | 43 | uint64_t           | R   | Section 5.8.2.33 |
| space_total        | 44 | uint64_t           | R   | Section 5.8.2.34 |
| space_used         | 45 | uint64_t           | R   | Section 5.8.2.35 |
| system             | 46 | bool               | R W | Section 5.8.2.36 |
| time_access        | 47 | nfstime4           | R   | Section 5.8.2.37 |
| time_access_set    | 48 | settime4           | W   | Section 5.8.2.38 |
| time_backup        | 49 | nfstime4           | R W | Section 5.8.2.39 |
| time_create        | 50 | nfstime4           | R W | Section 5.8.2.40 |
| time_delta         | 51 | nfstime4           | R   | Section 5.8.2.41 |
| time_metadata      | 52 | nfstime4           | R   | Section 5.8.2.42 |
| time_modify        | 53 | nfstime4           | R   | Section 5.8.2.43 |
| time_modify_set    | 54 | settime4           | W   | Section 5.8.2.44 |
+--------------------+----+--------------------+-----+------------------+
Table 3 Table 5
5.8. Attribute Definitions 5.8. Attribute Definitions
5.8.1. Definitions of REQUIRED Attributes 5.8.1. Definitions of REQUIRED Attributes
5.8.1.1. Attribute 0: supported_attrs 5.8.1.1. Attribute 0: supported_attrs
The bit vector that would retrieve all REQUIRED and RECOMMENDED The bit vector that would retrieve all REQUIRED and RECOMMENDED
attributes that are supported for this object. The scope of this attributes that are supported for this object. The scope of this
attribute applies to all objects with a matching fsid. attribute applies to all objects with a matching fsid.
5.8.1.2. Attribute 1: type 5.8.1.2. Attribute 1: type
Designates the type of an object in terms of one of a number of Designates the type of an object in terms of one of a number of
special constants: special constants:
o NF4REG designates a regular file. * NF4REG designates a regular file.
o NF4DIR designates a directory. * NF4DIR designates a directory.
o NF4BLK designates a block device special file. * NF4BLK designates a block device special file.
o NF4CHR designates a character device special file. * NF4CHR designates a character device special file.
o NF4LNK designates a symbolic link. * NF4LNK designates a symbolic link.
o NF4SOCK designates a named socket special file. * NF4SOCK designates a named socket special file.
o NF4FIFO designates a fifo special file. * NF4FIFO designates a fifo special file.
o NF4ATTRDIR designates a named attribute directory. * NF4ATTRDIR designates a named attribute directory.
o NF4NAMEDATTR designates a named attribute. * NF4NAMEDATTR designates a named attribute.
Within the explanatory text and operation descriptions, the following Within the explanatory text and operation descriptions, the following
phrases will be used with the meanings given below: phrases will be used with the meanings given below:
o The phrase "is a directory" means that the object's type attribute * The phrase "is a directory" means that the object's type attribute
is NF4DIR or NF4ATTRDIR. is NF4DIR or NF4ATTRDIR.
o The phrase "is a special file" means that the object's type * The phrase "is a special file" means that the object's type
attribute is NF4BLK, NF4CHR, NF4SOCK, or NF4FIFO. attribute is NF4BLK, NF4CHR, NF4SOCK, or NF4FIFO.
o The phrases "is an ordinary file" and "is a regular file" mean * The phrases "is an ordinary file" and "is a regular file" mean
that the object's type attribute is NF4REG or NF4NAMEDATTR. that the object's type attribute is NF4REG or NF4NAMEDATTR.
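The three phrases defined above translate directly into predicates on the type attribute. In this sketch the numeric values follow the nfs_ftype4 enumeration in the protocol's XDR description, which is not part of this excerpt, so treat them as an assumption; the predicate names are hypothetical.

```python
from enum import IntEnum


class NfsFtype4(IntEnum):
    # Values assumed from the protocol's XDR description (not shown here).
    NF4REG = 1
    NF4DIR = 2
    NF4BLK = 3
    NF4CHR = 4
    NF4LNK = 5
    NF4SOCK = 6
    NF4FIFO = 7
    NF4ATTRDIR = 8
    NF4NAMEDATTR = 9


def is_directory(ftype):
    """True for ordinary directories and named attribute directories."""
    return ftype in (NfsFtype4.NF4DIR, NfsFtype4.NF4ATTRDIR)


def is_special_file(ftype):
    """True for block, character, socket, and fifo special files."""
    return ftype in (NfsFtype4.NF4BLK, NfsFtype4.NF4CHR,
                     NfsFtype4.NF4SOCK, NfsFtype4.NF4FIFO)


def is_regular_file(ftype):
    """True for regular files and named attributes."""
    return ftype in (NfsFtype4.NF4REG, NfsFtype4.NF4NAMEDATTR)
```

Note that a symbolic link (NF4LNK) satisfies none of the three predicates.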
5.8.1.3. Attribute 2: fh_expire_type 5.8.1.3. Attribute 2: fh_expire_type
The server uses this to specify filehandle expiration behavior to the The server uses this to specify filehandle expiration behavior to the
client. See Section 4 for additional description. client. See Section 4 for additional description.
5.8.1.4. Attribute 3: change 5.8.1.4. Attribute 3: change
A value created by the server that the client can use to determine if A value created by the server that the client can use to determine if
skipping to change at page 120, line 27 skipping to change at line 5832
5.8.2.44. Attribute 54: time_modify_set 5.8.2.44. Attribute 54: time_modify_set
Sets the time of last modification to the object. SETATTR use only. Sets the time of last modification to the object. SETATTR use only.
5.9. Interpreting owner and owner_group 5.9. Interpreting owner and owner_group
The RECOMMENDED attributes "owner" and "owner_group" (and also users The RECOMMENDED attributes "owner" and "owner_group" (and also users
and groups within the "acl" attribute) are represented in terms of a and groups within the "acl" attribute) are represented in terms of a
UTF-8 string. To avoid a representation that is tied to a particular UTF-8 string. To avoid a representation that is tied to a particular
underlying implementation at the client or server, the use of the underlying implementation at the client or server, the use of the
UTF-8 string has been chosen. Note that Section 6.1 of RFC 2624 [52] UTF-8 string has been chosen. Note that Section 6.1 of RFC 2624 [53]
provides additional rationale. It is expected that the client and provides additional rationale. It is expected that the client and
server will have their own local representation of owner and server will have their own local representation of owner and
owner_group that is used for local storage or presentation to the end owner_group that is used for local storage or presentation to the end
user. Therefore, it is expected that when these attributes are user. Therefore, it is expected that when these attributes are
transferred between the client and server, the local representation transferred between the client and server, the local representation
is translated to a syntax of the form "user@dns_domain". This will is translated to a syntax of the form "user@dns_domain". This will
allow for a client and server that do not use the same local allow for a client and server that do not use the same local
representation the ability to translate to a common syntax that can representation the ability to translate to a common syntax that can
be interpreted by both. be interpreted by both.
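The translation described above might look like the following sketch. The user table, the domain name, and the function names are all hypothetical, and the handling of names that cannot be translated is covered elsewhere in the specification, so the sketch simply raises an exception in that case.

```python
# Hypothetical local representation: numeric uid -> local user name.
LOCAL_USERS = {1000: "alice", 1001: "bob"}


def uid_to_owner(uid, dns_domain, users=LOCAL_USERS):
    """Translate a local uid to the on-the-wire "user@dns_domain" form."""
    return f"{users[uid]}@{dns_domain}"


def owner_to_uid(owner, dns_domain, users=LOCAL_USERS):
    """Translate a "user@dns_domain" string back to a local uid."""
    user, sep, domain = owner.rpartition("@")
    if not sep or domain != dns_domain:
        raise ValueError(f"cannot translate owner string: {owner!r}")
    for uid, name in users.items():
        if name == user:
            return uid
    raise KeyError(f"no local user named {user!r}")
```

Because each side translates to and from the common string form, the client and server in this sketch could use entirely different local tables and still agree on who owns a file.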
skipping to change at page 126, line 17 skipping to change at line 6107
If retention is enabled, with no duration specified in either this If retention is enabled, with no duration specified in either this
SETATTR or a previous SETATTR, the duration defaults to zero seconds. SETATTR or a previous SETATTR, the duration defaults to zero seconds.
The server MAY restrict the enabling of retention or the duration of The server MAY restrict the enabling of retention or the duration of
retention on the basis of the ACE4_WRITE_RETENTION ACL permission. retention on the basis of the ACE4_WRITE_RETENTION ACL permission.
The enabling of retention MUST NOT prevent the enabling of event- The enabling of retention MUST NOT prevent the enabling of event-
based retention or the modification of the retention_hold attribute. based retention or the modification of the retention_hold attribute.
The following rules apply to both the retention_set and retentevt_set The following rules apply to both the retention_set and retentevt_set
attributes. attributes.
o As long as retention is not enabled, the client is permitted to * As long as retention is not enabled, the client is permitted to
decrease the duration. decrease the duration.
o The duration can always be set to an equal or higher value, even * The duration can always be set to an equal or higher value, even
if retention is enabled. Note that once retention is enabled, the if retention is enabled. Note that once retention is enabled, the
actual duration (as returned by the retention_get or retentevt_get actual duration (as returned by the retention_get or retentevt_get
attributes; see Section 5.13.1 or Section 5.13.3) is constantly attributes; see Section 5.13.1 or Section 5.13.3) is constantly
counting down to zero (one unit per second), unless the duration counting down to zero (one unit per second), unless the duration
was set to RET4_DURATION_INFINITE. Thus, it will not be possible was set to RET4_DURATION_INFINITE. Thus, it will not be possible
for the client to precisely extend the duration on a file that has for the client to precisely extend the duration on a file that has
retention enabled. retention enabled.
o While retention is enabled, attempts to disable retention or * While retention is enabled, attempts to disable retention or
decrease the retention's duration MUST fail with the error decrease the retention's duration MUST fail with the error
NFS4ERR_INVAL. NFS4ERR_INVAL.
o If the principal attempting to change retention_set or * If the principal attempting to change retention_set or
retentevt_set does not have ACE4_WRITE_RETENTION permissions, the retentevt_set does not have ACE4_WRITE_RETENTION permissions, the
attempt MUST fail with NFS4ERR_ACCESS. attempt MUST fail with NFS4ERR_ACCESS.
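The rules above can be read as a single server-side check. The sketch below is one possible reading, not a definitive implementation: the function name and parameters are hypothetical, status names are returned as strings, and the order in which the permission and validity checks run is an assumption, since the rules do not state one.

```python
def check_retention_change(enabled, remaining, new_enabled, new_duration,
                           has_write_retention):
    """Decide the status for a retention_set or retentevt_set request.

    enabled / remaining: current state (remaining is the duration still
    left, which counts down once retention is enabled).
    new_enabled / new_duration: the state requested by the client.
    """
    # Assumed ordering: permission is checked before the value rules.
    if not has_write_retention:
        return "NFS4ERR_ACCESS"
    if enabled:
        # Once enabled, retention cannot be disabled or shortened.
        if not new_enabled or new_duration < remaining:
            return "NFS4ERR_INVAL"
    # Not enabled: any duration, including a smaller one, is permitted.
    # Enabled: an equal or higher duration is always permitted.
    return "NFS4_OK"
```

Since the remaining duration keeps counting down while retention is enabled, a client that computes a new value from an earlier retention_get result is comparing against a moving target, which is why a precise extension is not possible.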
5.13.3. Attribute 71: retentevt_get 5.13.3. Attribute 71: retentevt_get
Gets the event-based retention duration, and if enabled, the event- Gets the event-based retention duration, and if enabled, the event-
based retention begin time of the file object. This attribute is based retention begin time of the file object. This attribute is
like retention_get, but refers to event-based retention. The event like retention_get, but refers to event-based retention. The event
that triggers event-based retention is not defined by the NFSv4.1 that triggers event-based retention is not defined by the NFSv4.1
specification. specification.
skipping to change at page 127, line 46 skipping to change at line 6184
"sacl", "aclsupport", "mode", and "mode_set_masked" file attributes "sacl", "aclsupport", "mode", and "mode_set_masked" file attributes
and their interactions. Note that file attributes may apply to any and their interactions. Note that file attributes may apply to any
file system object. file system object.
6.1. Goals 6.1. Goals
ACLs and modes represent two well-established models for specifying ACLs and modes represent two well-established models for specifying
permissions. This section specifies requirements that attempt to permissions. This section specifies requirements that attempt to
meet the following goals: meet the following goals:
o If a server supports the mode attribute, it should provide * If a server supports the mode attribute, it should provide
reasonable semantics to clients that only set and retrieve the reasonable semantics to clients that only set and retrieve the
mode attribute. mode attribute.
o If a server supports ACL attributes, it should provide reasonable * If a server supports ACL attributes, it should provide reasonable
semantics to clients that only set and retrieve those attributes. semantics to clients that only set and retrieve those attributes.
o On servers that support the mode attribute, if ACL attributes have * On servers that support the mode attribute, if ACL attributes have
never been set on an object, via inheritance or explicitly, the never been set on an object, via inheritance or explicitly, the
behavior should be traditional UNIX-like behavior. behavior should be traditional UNIX-like behavior.
o On servers that support the mode attribute, if the ACL attributes * On servers that support the mode attribute, if the ACL attributes
have been previously set on an object, either explicitly or via have been previously set on an object, either explicitly or via
inheritance: inheritance:
* Setting only the mode attribute should effectively control the - Setting only the mode attribute should effectively control the
traditional UNIX-like permissions of read, write, and execute traditional UNIX-like permissions of read, write, and execute
on owner, owner_group, and other. on owner, owner_group, and other.
* Setting only the mode attribute should provide reasonable - Setting only the mode attribute should provide reasonable
security. For example, setting a mode of 000 should be enough security. For example, setting a mode of 000 should be enough
to ensure that future OPEN operations for to ensure that future OPEN operations for
OPEN4_SHARE_ACCESS_READ or OPEN4_SHARE_ACCESS_WRITE by any OPEN4_SHARE_ACCESS_READ or OPEN4_SHARE_ACCESS_WRITE by any
principal fail, regardless of a previously existing or principal fail, regardless of a previously existing or
inherited ACL. inherited ACL.
o NFSv4.1 may introduce different semantics relating to the mode and * NFSv4.1 may introduce different semantics relating to the mode and
ACL attributes, but it does not render invalid any previously ACL attributes, but it does not render invalid any previously
existing implementations. Additionally, this section provides existing implementations. Additionally, this section provides
clarifications based on previous implementations and discussions clarifications based on previous implementations and discussions
around them. around them.
o On servers that support both the mode and the acl or dacl * On servers that support both the mode and the acl or dacl
attributes, the server must keep the two consistent with each attributes, the server must keep the two consistent with each
other. The value of the mode attribute (with the exception of the other. The value of the mode attribute (with the exception of the
three high-order bits described in Section 6.2.4) must be three high-order bits described in Section 6.2.4) must be
determined entirely by the value of the ACL, so that use of the determined entirely by the value of the ACL, so that use of the
mode is never required for anything other than setting the three mode is never required for anything other than setting the three
high-order bits. See Section 6.4.1 for exact requirements. high-order bits. See Section 6.4.1 for exact requirements.
o When a mode attribute is set on an object, the ACL attributes may * When a mode attribute is set on an object, the ACL attributes may
need to be modified in order to not conflict with the new mode. need to be modified in order to not conflict with the new mode.
In such cases, it is desirable that the ACL keep as much In such cases, it is desirable that the ACL keep as much
information as possible. This includes information about information as possible. This includes information about
inheritance, AUDIT and ALARM ACEs, and permissions granted and inheritance, AUDIT and ALARM ACEs, and permissions granted and
denied that do not conflict with the new mode. denied that do not conflict with the new mode.
6.2. File Attributes Discussion 6.2. File Attributes Discussion
6.2.1. Attribute 12: acl 6.2.1. Attribute 12: acl
skipping to change at page 131, line 5 skipping to change at line 6320
const ACE4_ACCESS_ALLOWED_ACE_TYPE = 0x00000000; const ACE4_ACCESS_ALLOWED_ACE_TYPE = 0x00000000;
const ACE4_ACCESS_DENIED_ACE_TYPE = 0x00000001; const ACE4_ACCESS_DENIED_ACE_TYPE = 0x00000001;
const ACE4_SYSTEM_AUDIT_ACE_TYPE = 0x00000002; const ACE4_SYSTEM_AUDIT_ACE_TYPE = 0x00000002;
const ACE4_SYSTEM_ALARM_ACE_TYPE = 0x00000003; const ACE4_SYSTEM_ALARM_ACE_TYPE = 0x00000003;
Only the ALLOWED and DENIED bits may be used in the dacl attribute, Only the ALLOWED and DENIED bits may be used in the dacl attribute,
and only the AUDIT and ALARM bits may be used in the sacl attribute. and only the AUDIT and ALARM bits may be used in the sacl attribute.
All four are permitted in the acl attribute. All four are permitted in the acl attribute.
+==============================+==============+=====================+
| Value                        | Abbreviation | Description         |
+==============================+==============+=====================+
| ACE4_ACCESS_ALLOWED_ACE_TYPE | ALLOW        | Explicitly grants   |
|                              |              | the access defined  |
|                              |              | in acemask4 to the  |
|                              |              | file or directory.  |
+------------------------------+--------------+---------------------+
| ACE4_ACCESS_DENIED_ACE_TYPE  | DENY         | Explicitly denies   |
|                              |              | the access defined  |
|                              |              | in acemask4 to the  |
|                              |              | file or directory.  |
+------------------------------+--------------+---------------------+
| ACE4_SYSTEM_AUDIT_ACE_TYPE   | AUDIT        | Log (in a system-   |
|                              |              | dependent way) any  |
|                              |              | access attempt to a |
|                              |              | file or directory   |
|                              |              | that uses any of    |
|                              |              | the access methods  |
|                              |              | specified in        |
|                              |              | acemask4.           |
+------------------------------+--------------+---------------------+
| ACE4_SYSTEM_ALARM_ACE_TYPE   | ALARM        | Generate an alarm   |
|                              |              | (in a system-       |
|                              |              | dependent way) when |
|                              |              | any access attempt  |
|                              |              | is made to a file   |
|                              |              | or directory for    |
|                              |              | the access methods  |
|                              |              | specified in        |
|                              |              | acemask4.           |
+------------------------------+--------------+---------------------+
Table 6
The "Abbreviation" column denotes how the types will be referred to The "Abbreviation" column denotes how the types will be referred to
throughout the rest of this section. throughout the rest of this section.
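
The restriction on which ACE types may appear in the acl, dacl, and sacl attributes can be expressed as a small lookup. This is an illustrative sketch: the constant values are those given above, while the table name PERMITTED_TYPES and the helper ace_type_permitted are hypothetical names introduced for the example.

```python
# ACE type constants as defined above.
ACE4_ACCESS_ALLOWED_ACE_TYPE = 0x00000000
ACE4_ACCESS_DENIED_ACE_TYPE = 0x00000001
ACE4_SYSTEM_AUDIT_ACE_TYPE = 0x00000002
ACE4_SYSTEM_ALARM_ACE_TYPE = 0x00000003

# Hypothetical lookup table: which ACE types each attribute may carry.
PERMITTED_TYPES = {
    "dacl": {ACE4_ACCESS_ALLOWED_ACE_TYPE, ACE4_ACCESS_DENIED_ACE_TYPE},
    "sacl": {ACE4_SYSTEM_AUDIT_ACE_TYPE, ACE4_SYSTEM_ALARM_ACE_TYPE},
    "acl": {
        ACE4_ACCESS_ALLOWED_ACE_TYPE,
        ACE4_ACCESS_DENIED_ACE_TYPE,
        ACE4_SYSTEM_AUDIT_ACE_TYPE,
        ACE4_SYSTEM_ALARM_ACE_TYPE,
    },
}


def ace_type_permitted(attribute, ace_type):
    """Return True if `ace_type` may appear in the named attribute."""
    return ace_type in PERMITTED_TYPES[attribute]
```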
6.2.1.2. Attribute 13: aclsupport 6.2.1.2. Attribute 13: aclsupport
A server need not support all of the above ACE types. This attribute A server need not support all of the above ACE types. This attribute
indicates which ACE types are supported for the current file system. indicates which ACE types are supported for the current file system.
The bitmask constants used to represent the above definitions within The bitmask constants used to represent the above definitions within
the aclsupport attribute are as follows: the aclsupport attribute are as follows:
skipping to change at page 134, line 39 skipping to change at line 6500
The ability to modify a file's data, but only starting at EOF. The ability to modify a file's data, but only starting at EOF.
This allows for the notion of append-only files, by allowing This allows for the notion of append-only files, by allowing
ACE4_APPEND_DATA and denying ACE4_WRITE_DATA to the same user ACE4_APPEND_DATA and denying ACE4_WRITE_DATA to the same user
or group. If a file has an ACL such as the one described above or group. If a file has an ACL such as the one described above
and a WRITE request is made for somewhere other than EOF, the and a WRITE request is made for somewhere other than EOF, the
server SHOULD return NFS4ERR_ACCESS. server SHOULD return NFS4ERR_ACCESS.
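
The append-only behavior can be sketched as a check on the offset of a WRITE. This is a minimal illustration under stated assumptions: granted_mask stands for the mask bits that ACL evaluation has already allowed the requester, the function name check_write is hypothetical, and the sketch assumes that ACE4_WRITE_DATA alone permits a write at any offset. Results are shown as strings rather than protocol values.

```python
# Access mask bits for file data modification.
ACE4_WRITE_DATA = 0x00000002
ACE4_APPEND_DATA = 0x00000004


def check_write(granted_mask, offset, file_size):
    """Decide whether a WRITE at `offset` is permitted.

    `granted_mask` is assumed to be the result of ACL evaluation for
    the requesting principal; `file_size` is the current EOF.
    """
    # Assumption: ACE4_WRITE_DATA allows modification anywhere in the file.
    if granted_mask & ACE4_WRITE_DATA:
        return "NFS4_OK"
    # ACE4_APPEND_DATA alone allows modification only starting at EOF.
    if (granted_mask & ACE4_APPEND_DATA) and offset == file_size:
        return "NFS4_OK"
    return "NFS4ERR_ACCESS"
```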
ACE4_ADD_SUBDIRECTORY ACE4_ADD_SUBDIRECTORY
Operation(s) affected: Operation(s) affected:
CREATE CREATE
RENAME RENAME
Discussion: Discussion:
Permission to create a subdirectory in a directory. The CREATE Permission to create a subdirectory in a directory. The CREATE
operation is affected when nfs_ftype4 is NF4DIR. The RENAME operation is affected when nfs_ftype4 is NF4DIR. The RENAME
operation is always affected. operation is always affected.
ACE4_READ_NAMED_ATTRS ACE4_READ_NAMED_ATTRS
Operation(s) affected: Operation(s) affected:
OPENATTR OPENATTR
Discussion: Discussion:
Permission to read the named attributes of a file or to look up Permission to read the named attributes of a file or to look up
the named attribute directory. OPENATTR is affected when it is the named attribute directory. OPENATTR is affected when it is
not used to create a named attribute directory. This is when not used to create a named attribute directory. This is when
1) createdir is TRUE, but a named attribute directory already 1) createdir is TRUE, but a named attribute directory already
exists, or 2) createdir is FALSE. exists, or 2) createdir is FALSE.
ACE4_WRITE_NAMED_ATTRS ACE4_WRITE_NAMED_ATTRS
Operation(s) affected: Operation(s) affected:
OPENATTR OPENATTR
Discussion: Discussion:
Permission to write the named attributes of a file or to create Permission to write the named attributes of a file or to create
a named attribute directory. OPENATTR is affected when it is a named attribute directory. OPENATTR is affected when it is
used to create a named attribute directory. This is when used to create a named attribute directory. This is when
createdir is TRUE and no named attribute directory exists. The createdir is TRUE and no named attribute directory exists. The
ability to check whether or not a named attribute directory ability to check whether or not a named attribute directory
exists depends on the ability to look it up; therefore, users exists depends on the ability to look it up; therefore, users
also need the ACE4_READ_NAMED_ATTRS permission in order to also need the ACE4_READ_NAMED_ATTRS permission in order to
create a named attribute directory. create a named attribute directory.
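
The two OPENATTR cases described for ACE4_READ_NAMED_ATTRS and ACE4_WRITE_NAMED_ATTRS can be summarized in a small helper. This is an illustrative sketch: the helper name openattr_required_mask and its arguments are hypothetical, and it only computes which mask bits the requester needs, not whether they are granted.

```python
# Access mask bits for named attributes.
ACE4_READ_NAMED_ATTRS = 0x00000008
ACE4_WRITE_NAMED_ATTRS = 0x00000010


def openattr_required_mask(createdir, attrdir_exists):
    """Return the mask bits needed for an OPENATTR request.

    `createdir` is the OPENATTR argument; `attrdir_exists` says whether
    the object already has a named attribute directory.
    """
    if createdir and not attrdir_exists:
        # Creating the directory needs write permission, plus read
        # permission because existence is checked by looking it up.
        return ACE4_WRITE_NAMED_ATTRS | ACE4_READ_NAMED_ATTRS
    # createdir is FALSE, or the directory already exists: a lookup only.
    return ACE4_READ_NAMED_ATTRS
```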
ACE4_EXECUTE ACE4_EXECUTE
skipping to change at page 136, line 21 skipping to change at line 6568
and ACE4_READ_DATA bits identically when deciding to permit a and ACE4_READ_DATA bits identically when deciding to permit a
READ operation, it SHOULD still allow the two bits to be set READ operation, it SHOULD still allow the two bits to be set
independently in ACLs, and MUST distinguish between them when independently in ACLs, and MUST distinguish between them when
replying to ACCESS operations. In particular, servers SHOULD replying to ACCESS operations. In particular, servers SHOULD
NOT silently turn on one of the two bits when the other is set, NOT silently turn on one of the two bits when the other is set,
as that would make it impossible for the client to correctly as that would make it impossible for the client to correctly
enforce the distinction between read and execute permissions. enforce the distinction between read and execute permissions.
As an example, following a SETATTR of the following ACL: As an example, following a SETATTR of the following ACL:
nfsuser:ACE4_EXECUTE:ALLOW nfsuser:ACE4_EXECUTE:ALLOW
A subsequent GETATTR of ACL for that file SHOULD return: A subsequent GETATTR of ACL for that file SHOULD return:
nfsuser:ACE4_EXECUTE:ALLOW nfsuser:ACE4_EXECUTE:ALLOW
Rather than: Rather than:
nfsuser:ACE4_EXECUTE/ACE4_READ_DATA:ALLOW nfsuser:ACE4_EXECUTE/ACE4_READ_DATA:ALLOW
ACE4_EXECUTE ACE4_EXECUTE
Operation(s) affected: Operation(s) affected:
LOOKUP LOOKUP
Discussion: Discussion:
Permission to traverse/search a directory. Permission to traverse/search a directory.
ACE4_DELETE_CHILD ACE4_DELETE_CHILD
Operation(s) affected: Operation(s) affected:
REMOVE REMOVE
RENAME RENAME
Discussion: Discussion:
Permission to delete a file or directory within a directory. Permission to delete a file or directory within a directory.
See Section 6.2.1.3.2 for information on how ACE4_DELETE and See Section 6.2.1.3.2 for information on how ACE4_DELETE and
ACE4_DELETE_CHILD interact. ACE4_DELETE_CHILD interact.
ACE4_READ_ATTRIBUTES ACE4_READ_ATTRIBUTES
Operation(s) affected: Operation(s) affected:
GETATTR of file system object attributes GETATTR of file system object attributes
VERIFY VERIFY
NVERIFY NVERIFY
READDIR READDIR
Discussion: Discussion:
The ability to read basic attributes (non-ACLs) of a file. On The ability to read basic attributes (non-ACLs) of a file. On
a UNIX system, basic attributes can be thought of as the stat- a UNIX system, basic attributes can be thought of as the stat-
level attributes. Allowing this access mask bit would mean level attributes. Allowing this access mask bit would mean
that the entity can execute "ls -l" and stat. If a READDIR that the entity can execute "ls -l" and stat. If a READDIR
operation requests attributes, this mask must be allowed for operation requests attributes, this mask must be allowed for
the READDIR to succeed. the READDIR to succeed.
ACE4_WRITE_ATTRIBUTES ACE4_WRITE_ATTRIBUTES
Operation(s) affected: Operation(s) affected:
skipping to change at page 142, line 12 skipping to change at line 6825
trigger log or alarm events. Such ACEs only take effect once they trigger log or alarm events. Such ACEs only take effect once they
are applied (with this bit cleared) to newly created files and are applied (with this bit cleared) to newly created files and
directories as specified by the ACE4_FILE_INHERIT_ACE and directories as specified by the ACE4_FILE_INHERIT_ACE and
ACE4_DIRECTORY_INHERIT_ACE flags. ACE4_DIRECTORY_INHERIT_ACE flags.
If this flag is present on an ACE, but neither If this flag is present on an ACE, but neither
ACE4_DIRECTORY_INHERIT_ACE nor ACE4_FILE_INHERIT_ACE is present, ACE4_DIRECTORY_INHERIT_ACE nor ACE4_FILE_INHERIT_ACE is present,
then an operation attempting to set such an attribute SHOULD fail then an operation attempting to set such an attribute SHOULD fail
with NFS4ERR_ATTRNOTSUPP. with NFS4ERR_ATTRNOTSUPP.
ACE4_SUCCESSFUL_ACCESS_ACE_FLAG ACE4_SUCCESSFUL_ACCESS_ACE_FLAG and ACE4_FAILED_ACCESS_ACE_FLAG
ACE4_FAILED_ACCESS_ACE_FLAG
The ACE4_SUCCESSFUL_ACCESS_ACE_FLAG (SUCCESS) and The ACE4_SUCCESSFUL_ACCESS_ACE_FLAG (SUCCESS) and
ACE4_FAILED_ACCESS_ACE_FLAG (FAILED) flag bits may be set only on ACE4_FAILED_ACCESS_ACE_FLAG (FAILED) flag bits may be set only on
ACE4_SYSTEM_AUDIT_ACE_TYPE (AUDIT) and ACE4_SYSTEM_ALARM_ACE_TYPE ACE4_SYSTEM_AUDIT_ACE_TYPE (AUDIT) and ACE4_SYSTEM_ALARM_ACE_TYPE
(ALARM) ACE types. If during the processing of the file's ACL, (ALARM) ACE types. If during the processing of the file's ACL,
the server encounters an AUDIT or ALARM ACE that matches the the server encounters an AUDIT or ALARM ACE that matches the
principal attempting the OPEN, the server notes that fact, and the principal attempting the OPEN, the server notes that fact, and the
presence, if any, of the SUCCESS and FAILED flags encountered in presence, if any, of the SUCCESS and FAILED flags encountered in
the AUDIT or ALARM ACE. Once the server completes the ACL the AUDIT or ALARM ACE. Once the server completes the ACL
processing, it then notes if the operation succeeded or failed. processing, it then notes if the operation succeeded or failed.
If the operation succeeded, and if the SUCCESS flag was set for a If the operation succeeded, and if the SUCCESS flag was set for a
skipping to change at page 143, line 20 skipping to change at line 6878
which. which.
There are several special identifiers that need to be understood There are several special identifiers that need to be understood
universally, rather than in the context of a particular DNS domain. universally, rather than in the context of a particular DNS domain.
Some of these identifiers cannot be understood when an NFS client Some of these identifiers cannot be understood when an NFS client
accesses the server, but have meaning when a local process accesses accesses the server, but have meaning when a local process accesses
the file. The ability to display and modify these permissions is the file. The ability to display and modify these permissions is
permitted over NFS, even if none of the access methods on the server permitted over NFS, even if none of the access methods on the server
understands the identifiers. understands the identifiers.
+===============+==================================================+
| Who           | Description                                      |
+===============+==================================================+
| OWNER         | The owner of the file.                           |
+---------------+--------------------------------------------------+
| GROUP         | The group associated with the file.              |
+---------------+--------------------------------------------------+
| EVERYONE      | The world, including the owner and owning group. |
+---------------+--------------------------------------------------+
| INTERACTIVE   | Accessed from an interactive terminal.           |
+---------------+--------------------------------------------------+
| NETWORK       | Accessed via the network.                        |
+---------------+--------------------------------------------------+
| DIALUP        | Accessed as a dialup user to the server.         |
+---------------+--------------------------------------------------+
| BATCH         | Accessed from a batch job.                       |
+---------------+--------------------------------------------------+
| ANONYMOUS     | Accessed without any authentication.             |
+---------------+--------------------------------------------------+
| AUTHENTICATED | Any authenticated user (opposite of ANONYMOUS).  |
+---------------+--------------------------------------------------+
| SERVICE       | Access from a system service.                    |
+---------------+--------------------------------------------------+
Table 4 Table 7
To avoid conflict, these special identifiers are distinguished by an To avoid conflict, these special identifiers are distinguished by an
appended "@" and should appear in the form "xxxx@" (with no domain appended "@" and should appear in the form "xxxx@" (with no domain
name after the "@"), for example, ANONYMOUS@. name after the "@"), for example, ANONYMOUS@.
The ACE4_IDENTIFIER_GROUP flag MUST be ignored on entries with these The ACE4_IDENTIFIER_GROUP flag MUST be ignored on entries with these
special identifiers. When encoding entries with these special special identifiers. When encoding entries with these special
identifiers, the ACE4_IDENTIFIER_GROUP flag SHOULD be set to zero. identifiers, the ACE4_IDENTIFIER_GROUP flag SHOULD be set to zero.
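
The handling of special identifiers can be sketched as follows. This is a minimal illustration: the set SPECIAL_WHO and the helpers is_special_who and encode_flags are hypothetical names introduced for the example, and the sketch covers only the recommendation to clear ACE4_IDENTIFIER_GROUP when encoding an entry.

```python
# ACE flag bit marking the "who" field as a group.
ACE4_IDENTIFIER_GROUP = 0x00000040

# The special identifiers from the table above, in their "xxxx@" form.
SPECIAL_WHO = {
    "OWNER@", "GROUP@", "EVERYONE@", "INTERACTIVE@", "NETWORK@",
    "DIALUP@", "BATCH@", "ANONYMOUS@", "AUTHENTICATED@", "SERVICE@",
}


def is_special_who(who):
    """Return True if `who` is one of the special identifiers."""
    return who in SPECIAL_WHO


def encode_flags(who, flags):
    """Return the ACE flags to encode for an entry naming `who`.

    For special identifiers the group flag is cleared; for ordinary
    user@domain or group@domain entries the flags pass through unchanged.
    """
    if is_special_who(who):
        return flags & ~ACE4_IDENTIFIER_GROUP
    return flags
```

A receiver would apply the same membership test and then disregard the group flag on any matching entry, whatever value it carries.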
6.2.1.5.1. Discussion of EVERYONE@ 6.2.1.5.1. Discussion of EVERYONE@
skipping to change at page 145, line 49 skipping to change at line 7007
sections, especially Section 6.4. sections, especially Section 6.4.
6.3.1. Interpreting an ACL 6.3.1. Interpreting an ACL
6.3.1.1. Server Considerations 6.3.1.1. Server Considerations
The server uses the algorithm described in Section 6.2.1 to determine The server uses the algorithm described in Section 6.2.1 to determine
whether an ACL allows access to an object. However, the ACL might whether an ACL allows access to an object. However, the ACL might
not be the sole determiner of access. For example: not be the sole determiner of access. For example:
o In the case of a file system exported as read-only, the server may * In the case of a file system exported as read-only, the server may
deny write access even though an object's ACL grants it. deny write access even though an object's ACL grants it.
o Server implementations MAY grant ACE4_WRITE_ACL and ACE4_READ_ACL * Server implementations MAY grant ACE4_WRITE_ACL and ACE4_READ_ACL
permissions to prevent a situation from arising in which there is permissions to prevent a situation from arising in which there is
no valid way to ever modify the ACL. no valid way to ever modify the ACL.
o All servers will allow a user the ability to read the data of the * All servers will allow a user the ability to read the data of the
file when only the execute permission is granted (i.e., if the ACL file when only the execute permission is granted (i.e., if the ACL
denies the user the ACE4_READ_DATA access and allows the user denies the user the ACE4_READ_DATA access and allows the user
ACE4_EXECUTE, the server will allow the user to read the data of ACE4_EXECUTE, the server will allow the user to read the data of
the file). the file).
o Many servers have the notion of owner-override in which the owner * Many servers have the notion of owner-override in which the owner
of the object is allowed to override accesses that are denied by of the object is allowed to override accesses that are denied by
the ACL. This may be helpful, for example, to allow users the ACL. This may be helpful, for example, to allow users
continued access to open files on which the permissions have continued access to open files on which the permissions have
changed. changed.
o Many servers have the notion of a "superuser" that has privileges * Many servers have the notion of a "superuser" that has privileges
beyond an ordinary user. The superuser may be able to read or beyond an ordinary user. The superuser may be able to read or
write data or metadata in ways that would not be permitted by the write data or metadata in ways that would not be permitted by the
ACL. ACL.
o A retention attribute might also block access otherwise allowed by * A retention attribute might also block access otherwise allowed by
ACLs (see Section 5.13). ACLs (see Section 5.13).
6.3.1.2. Client Considerations 6.3.1.2. Client Considerations
Clients SHOULD NOT do their own access checks based on their Clients SHOULD NOT do their own access checks based on their
interpretation of the ACL, but rather use the OPEN and ACCESS interpretation of the ACL, but rather use the OPEN and ACCESS
operations to do access checks. This allows the client to act on the operations to do access checks. This allows the client to act on the
results of having the server determine whether or not access should results of having the server determine whether or not access should
be granted based on its interpretation of the ACL. be granted based on its interpretation of the ACL.
skipping to change at page 158, line 20 skipping to change at line 7597
clients should use strong security mechanisms to access the pseudo clients should use strong security mechanisms to access the pseudo
file system in order to prevent man-in-the-middle attacks. file system in order to prevent man-in-the-middle attacks.
8. State Management 8. State Management
Integrating locking into the NFS protocol necessarily causes it to be Integrating locking into the NFS protocol necessarily causes it to be
stateful. With the inclusion of such features as share reservations, stateful. With the inclusion of such features as share reservations,
file and directory delegations, recallable layouts, and support for file and directory delegations, recallable layouts, and support for
mandatory byte-range locking, the protocol becomes substantially more mandatory byte-range locking, the protocol becomes substantially more
dependent on proper management of state than the traditional dependent on proper management of state than the traditional
combination of NFS and NLM (Network Lock Manager) [53]. These combination of NFS and NLM (Network Lock Manager) [54]. These
features include expanded locking facilities, which provide some features include expanded locking facilities, which provide some
measure of inter-client exclusion, but the state also offers features measure of inter-client exclusion, but the state also offers features
not readily providable using a stateless model. There are three not readily providable using a stateless model. There are three
components to making this state manageable: components to making this state manageable:
o clear division between client and server * clear division between client and server
o ability to reliably detect inconsistency in state between client * ability to reliably detect inconsistency in state between client
and server and server
o simple and robust recovery mechanisms * simple and robust recovery mechanisms
In this model, the server owns the state information. The client In this model, the server owns the state information. The client
requests changes in locks and the server responds with the changes requests changes in locks and the server responds with the changes
made. Non-client-initiated changes in locking state are infrequent. made. Non-client-initiated changes in locking state are infrequent.
The client receives prompt notification of such changes and can The client receives prompt notification of such changes and can
adjust its view of the locking state to reflect the server's changes. adjust its view of the locking state to reflect the server's changes.
Individual pieces of state created by the server and passed to the Individual pieces of state created by the server and passed to the
client at its request are represented by 128-bit stateids. These client at its request are represented by 128-bit stateids. These
stateids may represent a particular open file, a set of byte-range stateids may represent a particular open file, a set of byte-range
skipping to change at page 160, line 14 skipping to change at line 7684
8.2.1. Stateid Types 8.2.1. Stateid Types
With the exception of special stateids (see Section 8.2.3), each With the exception of special stateids (see Section 8.2.3), each
stateid represents locking objects of one of a set of types defined stateid represents locking objects of one of a set of types defined
by the NFSv4.1 protocol. Note that in all these cases, where we by the NFSv4.1 protocol. Note that in all these cases, where we
speak of guarantee, it is understood there are situations such as a speak of guarantee, it is understood there are situations such as a
client restart, or lock revocation, that allow the guarantee to be client restart, or lock revocation, that allow the guarantee to be
voided. voided.
o Stateids may represent opens of files. * Stateids may represent opens of files.
Each stateid in this case represents the OPEN state for a given Each stateid in this case represents the OPEN state for a given
client ID/open-owner/filehandle triple. Such stateids are subject client ID/open-owner/filehandle triple. Such stateids are subject
to change (with consequent incrementing of the stateid's seqid) in to change (with consequent incrementing of the stateid's seqid) in
response to OPENs that result in upgrade and OPEN_DOWNGRADE response to OPENs that result in upgrade and OPEN_DOWNGRADE
operations. operations.
o Stateids may represent sets of byte-range locks. * Stateids may represent sets of byte-range locks.
All locks held on a particular file by a particular owner and All locks held on a particular file by a particular owner and
gotten under the aegis of a particular open file are associated gotten under the aegis of a particular open file are associated
with a single stateid with the seqid being incremented whenever with a single stateid with the seqid being incremented whenever
LOCK and LOCKU operations affect that set of locks. LOCK and LOCKU operations affect that set of locks.
o Stateids may represent file delegations, which are recallable * Stateids may represent file delegations, which are recallable
guarantees by the server to the client that other clients will not guarantees by the server to the client that other clients will not
reference or modify a particular file, until the delegation is reference or modify a particular file, until the delegation is
returned. In NFSv4.1, file delegations may be obtained on both returned. In NFSv4.1, file delegations may be obtained on both
regular and non-regular files. regular and non-regular files.
A stateid represents a single delegation held by a client for a A stateid represents a single delegation held by a client for a
particular filehandle. particular filehandle.
o Stateids may represent directory delegations, which are recallable * Stateids may represent directory delegations, which are recallable
guarantees by the server to the client that other clients will not guarantees by the server to the client that other clients will not
modify the directory, until the delegation is returned. modify the directory, until the delegation is returned.
A stateid represents a single delegation held by a client for a A stateid represents a single delegation held by a client for a
particular directory filehandle. particular directory filehandle.
o Stateids may represent layouts, which are recallable guarantees by * Stateids may represent layouts, which are recallable guarantees by
the server to the client that particular files may be accessed via the server to the client that particular files may be accessed via
an alternate data access protocol at specific locations. Such an alternate data access protocol at specific locations. Such
access is limited to particular sets of byte-ranges and may access is limited to particular sets of byte-ranges and may
proceed until those byte-ranges are reduced or the layout is proceed until those byte-ranges are reduced or the layout is
returned. returned.
A stateid represents the set of all layouts held by a particular A stateid represents the set of all layouts held by a particular
client for a particular filehandle with a given layout type. The client for a particular filehandle with a given layout type. The
seqid is updated as the layouts of that set of byte-ranges change, seqid is updated as the layouts of that set of byte-ranges change,
via layout stateid changing operations such as LAYOUTGET and via layout stateid changing operations such as LAYOUTGET and
skipping to change at page 162, line 43 skipping to change at line 7809
Stateid values whose "other" field is either all zeros or all ones Stateid values whose "other" field is either all zeros or all ones
are reserved. They may not be assigned by the server but have are reserved. They may not be assigned by the server but have
special meanings defined by the protocol. The particular meaning special meanings defined by the protocol. The particular meaning
depends on whether the "other" field is all zeros or all ones and the depends on whether the "other" field is all zeros or all ones and the
specific value of the "seqid" field. specific value of the "seqid" field.
The following combinations of "other" and "seqid" are defined in The following combinations of "other" and "seqid" are defined in
NFSv4.1: NFSv4.1:
o When "other" and "seqid" are both zero, the stateid is treated as * When "other" and "seqid" are both zero, the stateid is treated as
a special anonymous stateid, which can be used in READ, WRITE, and a special anonymous stateid, which can be used in READ, WRITE, and
SETATTR requests to indicate the absence of any OPEN state SETATTR requests to indicate the absence of any OPEN state
associated with the request. When an anonymous stateid value is associated with the request. When an anonymous stateid value is
used and an existing open denies the form of access requested, used and an existing open denies the form of access requested,
then access will be denied to the request. This stateid MUST NOT then access will be denied to the request. This stateid MUST NOT
be used on operations to data servers (Section 13.6). be used on operations to data servers (Section 13.6).
o When "other" and "seqid" are both all ones, the stateid is a * When "other" and "seqid" are both all ones, the stateid is a
special READ bypass stateid. When this value is used in WRITE or special READ bypass stateid. When this value is used in WRITE or
SETATTR, it is treated like the anonymous value. When used in SETATTR, it is treated like the anonymous value. When used in
READ, the server MAY grant access, even if access would normally READ, the server MAY grant access, even if access would normally
be denied to READ operations. This stateid MUST NOT be used on be denied to READ operations. This stateid MUST NOT be used on
operations to data servers. operations to data servers.
o When "other" is zero and "seqid" is one, the stateid represents * When "other" is zero and "seqid" is one, the stateid represents
the current stateid, which is whatever value is the last stateid the current stateid, which is whatever value is the last stateid
returned by an operation within the COMPOUND. In the case of an returned by an operation within the COMPOUND. In the case of an
OPEN, the stateid returned for the open file and not the OPEN, the stateid returned for the open file and not the
delegation is used. The stateid passed to the operation in place delegation is used. The stateid passed to the operation in place
of the special value has its "seqid" value set to zero, except of the special value has its "seqid" value set to zero, except
when the current stateid is used by the operation CLOSE or when the current stateid is used by the operation CLOSE or
OPEN_DOWNGRADE. If there is no operation in the COMPOUND that has OPEN_DOWNGRADE. If there is no operation in the COMPOUND that has
returned a stateid value, the server MUST return the error returned a stateid value, the server MUST return the error
NFS4ERR_BAD_STATEID. As illustrated in Figure 6, if the value of NFS4ERR_BAD_STATEID. As illustrated in Figure 6, if the value of
a current stateid is a special stateid and the stateid of an a current stateid is a special stateid and the stateid of an
operation's arguments has "other" set to zero and "seqid" set to operation's arguments has "other" set to zero and "seqid" set to
one, then the server MUST return the error NFS4ERR_BAD_STATEID. one, then the server MUST return the error NFS4ERR_BAD_STATEID.
o When "other" is zero and "seqid" is NFS4_UINT32_MAX, the stateid * When "other" is zero and "seqid" is NFS4_UINT32_MAX, the stateid
represents a reserved stateid value defined to be invalid. When represents a reserved stateid value defined to be invalid. When
this stateid is used, the server MUST return the error this stateid is used, the server MUST return the error
NFS4ERR_BAD_STATEID. NFS4ERR_BAD_STATEID.
If a stateid value is used that has all zeros or all ones in the If a stateid value is used that has all zeros or all ones in the
"other" field but does not match one of the cases above, the server "other" field but does not match one of the cases above, the server
MUST return the error NFS4ERR_BAD_STATEID. MUST return the error NFS4ERR_BAD_STATEID.
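The combinations above can be sketched as a small classifier. This is only an illustration, not part of the protocol: the names StateidKind and classify_stateid are hypothetical, and the sketch assumes the 12-byte "other" field of the NFSv4.1 stateid4 XDR type.

```python
from enum import Enum

NFS4_UINT32_MAX = 0xFFFFFFFF
OTHER_LEN = 12                      # stateid4.other is opaque[12]
ALL_ZEROS = bytes(OTHER_LEN)
ALL_ONES = b"\xff" * OTHER_LEN


class StateidKind(Enum):            # hypothetical names, for illustration
    ANONYMOUS = "anonymous"
    READ_BYPASS = "read-bypass"
    CURRENT = "current"
    INVALID = "invalid"                      # reserved invalid value
    UNDEFINED_SPECIAL = "undefined-special"  # reserved "other", unknown seqid
    ORDINARY = "ordinary"                    # server-assigned stateid


def classify_stateid(seqid: int, other: bytes) -> StateidKind:
    """Map a (seqid, other) pair onto the cases listed in the text."""
    if other == ALL_ZEROS:
        if seqid == 0:
            return StateidKind.ANONYMOUS
        if seqid == 1:
            return StateidKind.CURRENT
        if seqid == NFS4_UINT32_MAX:
            return StateidKind.INVALID
        return StateidKind.UNDEFINED_SPECIAL
    if other == ALL_ONES:
        # "seqid" all ones means every bit of the 32-bit value is set.
        if seqid == NFS4_UINT32_MAX:
            return StateidKind.READ_BYPASS
        return StateidKind.UNDEFINED_SPECIAL
    return StateidKind.ORDINARY
```

In this sketch, a server would answer INVALID and UNDEFINED_SPECIAL with NFS4ERR_BAD_STATEID and pass ORDINARY values on to its normal stateid lookup.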
Special stateids, unlike other stateids, are not associated with Special stateids, unlike other stateids, are not associated with
individual client IDs or filehandles and can be used with all valid individual client IDs or filehandles and can be used with all valid
skipping to change at page 164, line 26 skipping to change at line 7887
An "other" value must never be reused for a different purpose (i.e., An "other" value must never be reused for a different purpose (i.e.,
different filehandle, owner, or type of locks) within the context of different filehandle, owner, or type of locks) within the context of
a single client ID. A server may retain the "other" value for the a single client ID. A server may retain the "other" value for the
same purpose beyond the point where it may otherwise be freed, but if same purpose beyond the point where it may otherwise be freed, but if
it does so, it must maintain "seqid" continuity with previous values. it does so, it must maintain "seqid" continuity with previous values.
One mechanism that may be used to satisfy the requirement that the One mechanism that may be used to satisfy the requirement that the
server recognize invalid and out-of-date stateids is for the server server recognize invalid and out-of-date stateids is for the server
to divide the "other" field of the stateid into two fields. to divide the "other" field of the stateid into two fields.
o an index into a table of locking-state structures. * an index into a table of locking-state structures.
o a generation number that is incremented on each allocation of a * a generation number that is incremented on each allocation of a
table entry for a particular use. table entry for a particular use.
And then store in each table entry, And then store in each table entry,
o the client ID with which the stateid is associated. * the client ID with which the stateid is associated.
o the current generation number for the (at most one) valid stateid * the current generation number for the (at most one) valid stateid
sharing this index value. sharing this index value.
o the filehandle of the file on which the locks are taken. * the filehandle of the file on which the locks are taken.
o an indication of the type of stateid (open, byte-range lock, file * an indication of the type of stateid (open, byte-range lock, file
delegation, directory delegation, layout). delegation, directory delegation, layout).
o the last "seqid" value returned corresponding to the current * the last "seqid" value returned corresponding to the current
"other" value. "other" value.
o an indication of the current status of the locks associated with * an indication of the current status of the locks associated with
this stateid, in particular, whether these have been revoked and this stateid, in particular, whether these have been revoked and
if so, for what reason. if so, for what reason.
With this information, an incoming stateid can be validated and the With this information, an incoming stateid can be validated and the
appropriate error returned when necessary. Special and non-special appropriate error returned when necessary. Special and non-special
stateids are handled separately. (See Section 8.2.3 for a discussion stateids are handled separately. (See Section 8.2.3 for a discussion
of special stateids.) of special stateids.)
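One possible shape for this mechanism is sketched below. The split of the 12-byte "other" field into a 4-byte index and an 8-byte generation is an assumption made for illustration (the text does not fix the widths), and StateEntry, pack_other, and unpack_other are hypothetical names.

```python
import struct
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumed layout: 4-byte table index + 8-byte generation = 12 bytes.
_OTHER_FMT = ">IQ"
_RESERVED = (bytes(12), b"\xff" * 12)   # all zeros / all ones are reserved


def pack_other(index: int, generation: int) -> bytes:
    """Build an "other" value; refuse the two reserved bit patterns."""
    other = struct.pack(_OTHER_FMT, index, generation)
    if other in _RESERVED:
        raise ValueError("reserved 'other' value; pick another generation")
    return other


def unpack_other(other: bytes) -> Tuple[int, int]:
    """Recover (index, generation) from an incoming "other" value."""
    index, generation = struct.unpack(_OTHER_FMT, other)
    return index, generation


@dataclass
class StateEntry:
    """Per-entry data named in the list above (field names are illustrative)."""
    client_id: int
    generation: int
    filehandle: bytes
    kind: str                       # "open", "lock", "deleg", "dir-deleg", "layout"
    last_seqid: int
    revoked_reason: Optional[str] = None   # None while the locks are intact
```

Starting generations at 1 rather than 0 is one simple way to keep index 0 from ever producing the reserved all-zeros value.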
Note that stateids are implicitly qualified by the current client ID, Note that stateids are implicitly qualified by the current client ID,
as derived from the client ID associated with the current session. as derived from the client ID associated with the current session.
skipping to change at page 165, line 28 skipping to change at line 7937
session and all leased state has been lost, then the session in session and all leased state has been lost, then the session in
question will, although valid, be marked as dead, and any operation question will, although valid, be marked as dead, and any operation
not satisfied by means of the reply cache will receive the error not satisfied by means of the reply cache will receive the error
NFS4ERR_DEADSESSION, and thus not be processed as indicated below. NFS4ERR_DEADSESSION, and thus not be processed as indicated below.
When a stateid is being tested and the "other" field is all zeros or When a stateid is being tested and the "other" field is all zeros or
all ones, a check that the "other" and "seqid" fields match a defined all ones, a check that the "other" and "seqid" fields match a defined
combination for a special stateid is done and the results determined combination for a special stateid is done and the results determined
as follows: as follows:
o If the "other" and "seqid" fields do not match a defined * If the "other" and "seqid" fields do not match a defined
combination associated with a special stateid, the error combination associated with a special stateid, the error
NFS4ERR_BAD_STATEID is returned. NFS4ERR_BAD_STATEID is returned.
o If the special stateid is one designating the current stateid and * If the special stateid is one designating the current stateid and
there is a current stateid, then the current stateid is there is a current stateid, then the current stateid is
substituted for the special stateid and the checks appropriate to substituted for the special stateid and the checks appropriate to
non-special stateids are performed. non-special stateids are performed.
o If the combination is valid in general but is not appropriate to * If the combination is valid in general but is not appropriate to
the context in which the stateid is used (e.g., an all-zero the context in which the stateid is used (e.g., an all-zero
stateid is used when an OPEN stateid is required in a LOCK stateid is used when an OPEN stateid is required in a LOCK
operation), the error NFS4ERR_BAD_STATEID is also returned. operation), the error NFS4ERR_BAD_STATEID is also returned.
o Otherwise, the check is completed and the special stateid is * Otherwise, the check is completed and the special stateid is
accepted as valid. accepted as valid.
When a stateid is being tested, and the "other" field is neither all When a stateid is being tested, and the "other" field is neither all
zeros nor all ones, the following procedure could be used to validate zeros nor all ones, the following procedure could be used to validate
an incoming stateid and return an appropriate error, when necessary, an incoming stateid and return an appropriate error, when necessary,
assuming that the "other" field would be divided into a table index assuming that the "other" field would be divided into a table index
and an entry generation. and an entry generation.
o If the table index field is outside the range of the associated * If the table index field is outside the range of the associated
table, return NFS4ERR_BAD_STATEID. table, return NFS4ERR_BAD_STATEID.
o If the selected table entry is of a different generation than that * If the selected table entry is of a different generation than that
specified in the incoming stateid, return NFS4ERR_BAD_STATEID. specified in the incoming stateid, return NFS4ERR_BAD_STATEID.
o If the selected table entry does not match the current filehandle, * If the selected table entry does not match the current filehandle,
return NFS4ERR_BAD_STATEID. return NFS4ERR_BAD_STATEID.
o If the client ID in the table entry does not match the client ID * If the client ID in the table entry does not match the client ID
associated with the current session, return NFS4ERR_BAD_STATEID. associated with the current session, return NFS4ERR_BAD_STATEID.
o If the stateid represents revoked state, then return * If the stateid represents revoked state, then return
NFS4ERR_EXPIRED, NFS4ERR_ADMIN_REVOKED, or NFS4ERR_DELEG_REVOKED, NFS4ERR_EXPIRED, NFS4ERR_ADMIN_REVOKED, or NFS4ERR_DELEG_REVOKED,
as appropriate. as appropriate.
o If the stateid type is not valid for the context in which the * If the stateid type is not valid for the context in which the
stateid appears, return NFS4ERR_BAD_STATEID. Note that a stateid stateid appears, return NFS4ERR_BAD_STATEID. Note that a stateid
may be valid in general, as would be reported by the TEST_STATEID may be valid in general, as would be reported by the TEST_STATEID
operation, but be invalid for a particular operation, as, for operation, but be invalid for a particular operation, as, for
example, when a stateid that doesn't represent byte-range locks is example, when a stateid that doesn't represent byte-range locks is
passed to the non-from_open case of LOCK or to LOCKU, or when a passed to the non-from_open case of LOCK or to LOCKU, or when a
stateid that does not represent an open is passed to CLOSE or stateid that does not represent an open is passed to CLOSE or
OPEN_DOWNGRADE. In such cases, the server MUST return OPEN_DOWNGRADE. In such cases, the server MUST return
NFS4ERR_BAD_STATEID. NFS4ERR_BAD_STATEID.
o If the "seqid" field is not zero and it is greater than the * If the "seqid" field is not zero and it is greater than the
current sequence value corresponding to the current "other" field, current sequence value corresponding to the current "other" field,
return NFS4ERR_BAD_STATEID. return NFS4ERR_BAD_STATEID.
o If the "seqid" field is not zero and it is less than the current * If the "seqid" field is not zero and it is less than the current
sequence value corresponding to the current "other" field, return sequence value corresponding to the current "other" field, return
NFS4ERR_OLD_STATEID. NFS4ERR_OLD_STATEID.
o Otherwise, the stateid is valid and the table entry should contain * Otherwise, the stateid is valid and the table entry should contain
any additional information about the type of stateid and any additional information about the type of stateid and
information associated with that particular type of stateid, such information associated with that particular type of stateid, such
as the associated set of locks, e.g., open-owner and lock-owner as the associated set of locks, e.g., open-owner and lock-owner
information, as well as information on the specific locks, e.g., information, as well as information on the specific locks, e.g.,
open modes and byte-ranges. open modes and byte-ranges.
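The procedure for non-special stateids can be sketched as follows, with the checks in the same order as the list above. This is an illustration under stated assumptions, not a definitive implementation: Entry, validate_stateid, and the string-valued stateid kinds are hypothetical, unallocated table slots are modeled as None, and "seqid" wraparound is ignored.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Set

BAD = "NFS4ERR_BAD_STATEID"
OLD = "NFS4ERR_OLD_STATEID"
OK = "NFS4_OK"


@dataclass
class Entry:
    generation: int
    filehandle: bytes
    client_id: int
    kind: str                            # e.g. "open", "lock", "deleg"
    last_seqid: int
    revoked_error: Optional[str] = None  # e.g. "NFS4ERR_ADMIN_REVOKED"


def validate_stateid(table: Sequence[Optional[Entry]], index: int,
                     generation: int, seqid: int, current_fh: bytes,
                     session_client_id: int, allowed_kinds: Set[str]) -> str:
    if not 0 <= index < len(table):
        return BAD                       # index outside the table
    entry = table[index]
    if entry is None or entry.generation != generation:
        return BAD                       # stale or never-allocated slot
    if entry.filehandle != current_fh:
        return BAD
    if entry.client_id != session_client_id:
        return BAD
    if entry.revoked_error is not None:
        return entry.revoked_error       # EXPIRED / ADMIN_REVOKED / DELEG_REVOKED
    if entry.kind not in allowed_kinds:
        return BAD                       # wrong stateid type for this operation
    if seqid != 0:                       # zero means "most recent seqid"
        if seqid > entry.last_seqid:
            return BAD
        if seqid < entry.last_seqid:
            return OLD
    return OK
```

For example, a CLOSE handler in this sketch would call validate_stateid with allowed_kinds={"open"}, so a byte-range lock stateid would be rejected with NFS4ERR_BAD_STATEID even though TEST_STATEID would report it as valid.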
8.2.5. Stateid Use for I/O Operations 8.2.5. Stateid Use for I/O Operations
Clients performing I/O operations need to select an appropriate Clients performing I/O operations need to select an appropriate
stateid based on the locks (including opens and delegations) held by stateid based on the locks (including opens and delegations) held by
skipping to change at page 167, line 12 skipping to change at line 8016
requests. SETATTR operations that change the file size are treated requests. SETATTR operations that change the file size are treated
like I/O operations in this regard. like I/O operations in this regard.
The following rules, applied in order of decreasing priority, govern The following rules, applied in order of decreasing priority, govern
the selection of the appropriate stateid. In following these rules, the selection of the appropriate stateid. In following these rules,
the client will only consider locks of which it has actually received the client will only consider locks of which it has actually received
notification by an appropriate operation response or callback. Note notification by an appropriate operation response or callback. Note
that the rules are slightly different in the case of I/O to data that the rules are slightly different in the case of I/O to data
servers when file layouts are being used (see Section 13.9.1). servers when file layouts are being used (see Section 13.9.1).
o If the client holds a delegation for the file in question, the * If the client holds a delegation for the file in question, the
delegation stateid SHOULD be used. delegation stateid SHOULD be used.
o Otherwise, if the entity corresponding to the lock-owner (e.g., a * Otherwise, if the entity corresponding to the lock-owner (e.g., a
process) sending the I/O has a byte-range lock stateid for the process) sending the I/O has a byte-range lock stateid for the
associated open file, then the byte-range lock stateid for that associated open file, then the byte-range lock stateid for that
lock-owner and open file SHOULD be used. lock-owner and open file SHOULD be used.
o If there is no byte-range lock stateid, then the OPEN stateid for * If there is no byte-range lock stateid, then the OPEN stateid for
the open file in question SHOULD be used. the open file in question SHOULD be used.
o Finally, if none of the above apply, then a special stateid SHOULD * Finally, if none of the above apply, then a special stateid SHOULD
be used. be used.
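A client-side sketch of this priority order is shown below. The function name, the tuple representation of a stateid, and the choice of the anonymous stateid as the fallback special stateid are assumptions made for illustration.

```python
from typing import Optional, Tuple

Stateid = Tuple[int, bytes]              # (seqid, other), illustrative only
ANONYMOUS_STATEID: Stateid = (0, bytes(12))


def select_io_stateid(delegation: Optional[Stateid],
                      byte_range_lock: Optional[Stateid],
                      open_stateid: Optional[Stateid]) -> Stateid:
    """Return the first stateid available, in decreasing priority.

    Each argument is None when the client holds no such state for the
    file (and, for the lock stateid, for the lock-owner doing the I/O).
    """
    for candidate in (delegation, byte_range_lock, open_stateid):
        if candidate is not None:
            return candidate
    return ANONYMOUS_STATEID             # no state held: use a special stateid
```

Only state the client has actually been notified of should be passed in, in line with the text above. I/O to data servers under file layouts follows slightly different rules and is not covered by this sketch.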
Ignoring these rules may result in situations in which the server Ignoring these rules may result in situations in which the server
does not have information necessary to properly process the request. does not have information necessary to properly process the request.
For example, when mandatory byte-range locks are in effect, if the For example, when mandatory byte-range locks are in effect, if the
stateid does not indicate the proper lock-owner, via a lock stateid, stateid does not indicate the proper lock-owner, via a lock stateid,
a request might be avoidably rejected. a request might be avoidably rejected.
The server, however, should not try to enforce these ordering rules and The server, however, should not try to enforce these ordering rules and
should use whatever information is available to properly process I/O should use whatever information is available to properly process I/O
skipping to change at page 168, line 43 skipping to change at line 8095
operation, the server MAY renew the lease; this depends on whether operation, the server MAY renew the lease; this depends on whether
any state was revoked as a result of the client's failure to renew any state was revoked as a result of the client's failure to renew
the lease before expiration. the lease before expiration.
Absent other activity that would renew the lease, a COMPOUND Absent other activity that would renew the lease, a COMPOUND
consisting of a single SEQUENCE operation will suffice. The client consisting of a single SEQUENCE operation will suffice. The client
should also take communication-related delays into account and take should also take communication-related delays into account and take
steps to ensure that the renewal messages actually reach the server steps to ensure that the renewal messages actually reach the server
in good time. For example: in good time. For example:
o When trunking is in effect, the client should consider sending * When trunking is in effect, the client should consider sending
multiple requests on different connections, in order to ensure multiple requests on different connections, in order to ensure
that renewal occurs, even in the event of blockage in the path that renewal occurs, even in the event of blockage in the path
used for one of those connections. used for one of those connections.
o Transport retransmission delays might become so large as to * Transport retransmission delays might become so large as to
approach or exceed the length of the lease period. This may be approach or exceed the length of the lease period. This may be
particularly likely when the server is unresponsive due to a particularly likely when the server is unresponsive due to a
restart; see Section 8.4.2.1. If the client implementation is not restart; see Section 8.4.2.1. If the client implementation is not
careful, transport retransmission delays can result in the client careful, transport retransmission delays can result in the client
failing to detect a server restart before the grace period ends. failing to detect a server restart before the grace period ends.
The scenario is that the client is using a transport with The scenario is that the client is using a transport with
exponential backoff, such that the maximum retransmission timeout exponential backoff, such that the maximum retransmission timeout
exceeds both the grace period and the lease_time attribute. A exceeds both the grace period and the lease_time attribute. A
network partition causes the client's connection's retransmission network partition causes the client's connection's retransmission
interval to back off, and even after the partition heals, the next interval to back off, and even after the partition heals, the next
skipping to change at page 169, line 46 skipping to change at line 8146
no active COMPOUND operations on any such sessions. no active COMPOUND operations on any such sessions.
Because the SEQUENCE operation is the basic mechanism to renew a Because the SEQUENCE operation is the basic mechanism to renew a
lease, and because it must be done at least once for each lease lease, and because it must be done at least once for each lease
period, it is the natural mechanism whereby the server will inform period, it is the natural mechanism whereby the server will inform
the client of changes in the lease status that the client needs to be the client of changes in the lease status that the client needs to be
informed of. The client should inspect the status flags informed of. The client should inspect the status flags
(sr_status_flags) returned by SEQUENCE and take the appropriate (sr_status_flags) returned by SEQUENCE and take the appropriate
action (see Section 18.46.3 for details). action (see Section 18.46.3 for details).
o The status bits SEQ4_STATUS_CB_PATH_DOWN and * The status bits SEQ4_STATUS_CB_PATH_DOWN and
SEQ4_STATUS_CB_PATH_DOWN_SESSION indicate problems with the SEQ4_STATUS_CB_PATH_DOWN_SESSION indicate problems with the
backchannel that the client may need to address in order to backchannel that the client may need to address in order to
receive callback requests. receive callback requests.
o The status bits SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING and * The status bits SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING and
SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED indicate problems with GSS SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED indicate problems with GSS
contexts or RPCSEC_GSS handles for the backchannel that the client contexts or RPCSEC_GSS handles for the backchannel that the client
might have to address in order to allow callback requests to be might have to address in order to allow callback requests to be
sent. sent.
o The status bits SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED, * The status bits SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED,
SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED, SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED,
SEQ4_STATUS_ADMIN_STATE_REVOKED, and SEQ4_STATUS_ADMIN_STATE_REVOKED, and
SEQ4_STATUS_RECALLABLE_STATE_REVOKED notify the client of lock SEQ4_STATUS_RECALLABLE_STATE_REVOKED notify the client of lock
revocation events. When these bits are set, the client should use revocation events. When these bits are set, the client should use
TEST_STATEID to find what stateids have been revoked and use TEST_STATEID to find what stateids have been revoked and use
FREE_STATEID to acknowledge loss of the associated state. FREE_STATEID to acknowledge loss of the associated state.
o The status bit SEQ4_STATUS_LEASE_MOVED indicates that * The status bit SEQ4_STATUS_LEASE_MOVED indicates that
responsibility for lease renewal has been transferred to one or responsibility for lease renewal has been transferred to one or
more new servers. more new servers.
o The status bit SEQ4_STATUS_RESTART_RECLAIM_NEEDED indicates that * The status bit SEQ4_STATUS_RESTART_RECLAIM_NEEDED indicates that
due to server restart the client must reclaim locking state. due to server restart the client must reclaim locking state.
o The status bit SEQ4_STATUS_BACKCHANNEL_FAULT indicates that the * The status bit SEQ4_STATUS_BACKCHANNEL_FAULT indicates that the
server has encountered an unrecoverable fault with the backchannel server has encountered an unrecoverable fault with the backchannel
(e.g., it has lost track of a sequence ID for a slot in the (e.g., it has lost track of a sequence ID for a slot in the
backchannel). backchannel).
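A client might turn the status bits into follow-up work along the lines of the sketch below. The bit values are those of the SEQ4_STATUS_* constants in the protocol's XDR definition and should be checked against Section 18.46. The function name and the action strings are hypothetical.

```python
from typing import List

SEQ4_STATUS_CB_PATH_DOWN = 0x00000001
SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING = 0x00000002
SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED = 0x00000004
SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED = 0x00000008
SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED = 0x00000010
SEQ4_STATUS_ADMIN_STATE_REVOKED = 0x00000020
SEQ4_STATUS_RECALLABLE_STATE_REVOKED = 0x00000040
SEQ4_STATUS_LEASE_MOVED = 0x00000080
SEQ4_STATUS_RESTART_RECLAIM_NEEDED = 0x00000100
SEQ4_STATUS_CB_PATH_DOWN_SESSION = 0x00000200
SEQ4_STATUS_BACKCHANNEL_FAULT = 0x00000400

_REVOKED = (SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED
            | SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED
            | SEQ4_STATUS_ADMIN_STATE_REVOKED
            | SEQ4_STATUS_RECALLABLE_STATE_REVOKED)


def actions_for(sr_status_flags: int) -> List[str]:
    """List the follow-up actions suggested by the text, in a fixed order."""
    actions = []
    if sr_status_flags & (SEQ4_STATUS_CB_PATH_DOWN
                          | SEQ4_STATUS_CB_PATH_DOWN_SESSION):
        actions.append("repair backchannel path")
    if sr_status_flags & (SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING
                          | SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED):
        actions.append("refresh backchannel GSS contexts")
    if sr_status_flags & _REVOKED:
        actions.append("TEST_STATEID then FREE_STATEID for revoked state")
    if sr_status_flags & SEQ4_STATUS_LEASE_MOVED:
        actions.append("locate servers now responsible for the lease")
    if sr_status_flags & SEQ4_STATUS_RESTART_RECLAIM_NEEDED:
        actions.append("reclaim locking state")
    if sr_status_flags & SEQ4_STATUS_BACKCHANNEL_FAULT:
        actions.append("recover from backchannel fault")
    return actions
```

Several bits can be set in a single SEQUENCE reply, so the sketch returns a list rather than a single action.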
8.4. Crash Recovery 8.4. Crash Recovery
A critical requirement in crash recovery is that both the client and A critical requirement in crash recovery is that both the client and
the server know when the other has failed. Additionally, it is the server know when the other has failed. Additionally, it is
required that a client sees a consistent view of data across server required that a client sees a consistent view of data across server
restarts. All READ and WRITE operations that may have been queued restarts. All READ and WRITE operations that may have been queued
within the client or network buffers must wait until the client has within the client or network buffers must wait until the client has
successfully recovered the locks protecting the READ and WRITE successfully recovered the locks protecting the READ and WRITE
operations. Any that reach the server before the server can safely operations. Any that reach the server before the server can safely
determine that the client has recovered enough locking state to be determine that the client has recovered enough locking state to be
sure that such operations can be safely processed must be rejected. sure that such operations can be safely processed must be rejected.
This will happen because either: This will happen because either:
o The state presented is no longer valid since it is associated with * The state presented is no longer valid since it is associated with
a now invalid client ID. In this case, the client will receive a now invalid client ID. In this case, the client will receive
either an NFS4ERR_BADSESSION or NFS4ERR_DEADSESSION error, and any either an NFS4ERR_BADSESSION or NFS4ERR_DEADSESSION error, and any
attempt to attach a new session to that invalid client ID will attempt to attach a new session to that invalid client ID will
result in an NFS4ERR_STALE_CLIENTID error. result in an NFS4ERR_STALE_CLIENTID error.
o Subsequent recovery of locks may make execution of the operation * Subsequent recovery of locks may make execution of the operation
inappropriate (NFS4ERR_GRACE). inappropriate (NFS4ERR_GRACE).
8.4.1. Client Failure and Recovery 8.4.1. Client Failure and Recovery
In the event that a client fails, the server may release the client's In the event that a client fails, the server may release the client's
locks when the associated lease has expired. Conflicting locks from locks when the associated lease has expired. Conflicting locks from
another client may only be granted after this lease expiration. As another client may only be granted after this lease expiration. As
discussed in Section 8.3, when a client has not failed and re- discussed in Section 8.3, when a client has not failed and re-
establishes its lease before expiration occurs, requests for establishes its lease before expiration occurs, requests for
conflicting locks will not be granted. conflicting locks will not be granted.
skipping to change at page 175, line 10 skipping to change at line 8394
requests to be processed during the grace period, it MUST determine requests to be processed during the grace period, it MUST determine
that no lock subsequently reclaimed will be rejected and that no lock that no lock subsequently reclaimed will be rejected and that no lock
subsequently reclaimed would have prevented any I/O operation subsequently reclaimed would have prevented any I/O operation
processed during the grace period. processed during the grace period.
Clients should be prepared for the return of NFS4ERR_GRACE errors for Clients should be prepared for the return of NFS4ERR_GRACE errors for
non-reclaim lock and I/O requests. In this case, the client should non-reclaim lock and I/O requests. In this case, the client should
employ a retry mechanism for the request. A delay (on the order of employ a retry mechanism for the request. A delay (on the order of
several seconds) between retries should be used to avoid overwhelming several seconds) between retries should be used to avoid overwhelming
the server. Further discussion of the general issue is included in the server. Further discussion of the general issue is included in
[54]. The client must account for the server that can perform I/O [55]. The client must account for the server that can perform I/O
and non-reclaim locking requests within the grace period as well as and non-reclaim locking requests within the grace period as well as
those that cannot do so. those that cannot do so.
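The retry behavior described above might look like the following sketch. The helper name, the 5-second default delay, and the attempt limit are assumptions chosen for illustration. The text asks only for a delay on the order of several seconds.

```python
import time
from typing import Callable

NFS4ERR_GRACE = "NFS4ERR_GRACE"


def retry_during_grace(send: Callable[[], str],
                       delay_seconds: float = 5.0,
                       max_attempts: int = 30,
                       sleep: Callable[[float], None] = time.sleep) -> str:
    """Resend a non-reclaim request while the server answers NFS4ERR_GRACE.

    `send` issues the request once and returns its status string.
    The last status seen is returned, whatever it is.
    """
    status = send()
    attempts = 1
    while status == NFS4ERR_GRACE and attempts < max_attempts:
        sleep(delay_seconds)             # pause so the server is not flooded
        status = send()
        attempts += 1
    return status
```

Because a server may be able to process some I/O and non-reclaim locking requests within the grace period, the first attempt is sent immediately and the delay applies only after an NFS4ERR_GRACE reply.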
A reclaim-type locking request outside the server's grace period can A reclaim-type locking request outside the server's grace period can
only succeed if the server can guarantee that no conflicting lock or only succeed if the server can guarantee that no conflicting lock or
I/O request has been granted since restart. I/O request has been granted since restart.
A server may, upon restart, establish a new value for the lease A server may, upon restart, establish a new value for the lease
period. Therefore, clients should, once a new client ID is period. Therefore, clients should, once a new client ID is
established, refetch the lease_time attribute and use it as the basis established, refetch the lease_time attribute and use it as the basis
skipping to change at page 175, line 35 skipping to change at line 8419
previous server instance to be reliably re-established. previous server instance to be reliably re-established.
The possibility exists that, because of server configuration events, The possibility exists that, because of server configuration events,
the client will be communicating with a server different than the one the client will be communicating with a server different than the one
on which the locks were obtained, as shown by the combination of on which the locks were obtained, as shown by the combination of
eir_server_scope and eir_server_owner. This raises the question of eir_server_scope and eir_server_owner. This raises the question of
whether and when the client should attempt to reclaim locks previously whether and when the client should attempt to reclaim locks previously
obtained on what is being reported as a different server. The rules obtained on what is being reported as a different server. The rules
to resolve this question are as follows: to resolve this question are as follows:
o If the server scope is different, the client should not attempt to * If the server scope is different, the client should not attempt to
reclaim locks. In this situation, no lock reclaim is possible. reclaim locks. In this situation, no lock reclaim is possible.
Any attempt to re-obtain the locks with non-reclaim operations is Any attempt to re-obtain the locks with non-reclaim operations is
problematic since there is no guarantee that the existing problematic since there is no guarantee that the existing
filehandles will be recognized by the new server, or that if filehandles will be recognized by the new server, or that if
recognized, they denote the same objects. It is best to treat the recognized, they denote the same objects. It is best to treat the
locks as having been revoked by the reconfiguration event. locks as having been revoked by the reconfiguration event.
o If the server scope is the same, the client should attempt to * If the server scope is the same, the client should attempt to
reclaim locks, even if the eir_server_owner value is different. reclaim locks, even if the eir_server_owner value is different.
In this situation, it is the responsibility of the server to In this situation, it is the responsibility of the server to
return NFS4ERR_NO_GRACE if it cannot provide correct support for return NFS4ERR_NO_GRACE if it cannot provide correct support for
lock reclaim operations, including the prevention of edge lock reclaim operations, including the prevention of edge
conditions. conditions.
The eir_server_owner field is not used in making this determination. The eir_server_owner field is not used in making this determination.
Its function is to specify trunking possibilities for the client (see Its function is to specify trunking possibilities for the client (see
Section 2.10.5) and not to control lock reclaim. Section 2.10.5) and not to control lock reclaim.
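The reclaim rules above can be sketched as a small client-side decision function. This is an illustration only, not part of the protocol: the ServerIdentity type and the should_attempt_reclaim function are hypothetical names, and eir_server_owner is simplified to an opaque byte string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerIdentity:
    """Hypothetical holder for values a client saved from EXCHANGE_ID."""
    server_scope: bytes  # mirrors eir_server_scope
    server_owner: bytes  # mirrors eir_server_owner (simplified to bytes)


def should_attempt_reclaim(before: ServerIdentity, after: ServerIdentity) -> bool:
    """Return True if the client should try to reclaim its locks.

    Only the server scope is compared. The server owner is deliberately
    ignored: it describes trunking possibilities, not lock reclaim.
    """
    return before.server_scope == after.server_scope
```

When this returns False, the client would treat its locks as revoked by the reconfiguration event. When it returns True, the client still has to be prepared for the server to answer a reclaim with NFS4ERR_NO_GRACE.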
skipping to change at page 179, line 47 skipping to change at line 8624
inverse proportion to how harsh the server intends to be whenever inverse proportion to how harsh the server intends to be whenever
edge conditions arise. The server that is completely tolerant of all edge conditions arise. The server that is completely tolerant of all
edge conditions will record in stable storage every lock that is edge conditions will record in stable storage every lock that is
acquired, removing the lock record from stable storage only when the acquired, removing the lock record from stable storage only when the
lock is released. For the two edge conditions discussed above, the lock is released. For the two edge conditions discussed above, the
harshest a server can be, and still support a grace period for harshest a server can be, and still support a grace period for
reclaims, requires that the server record in stable storage some reclaims, requires that the server record in stable storage some
minimal information. For example, a server implementation could, for minimal information. For example, a server implementation could, for
each client, save in stable storage a record containing: each client, save in stable storage a record containing:
o the co_ownerid field from the client_owner4 presented in the * the co_ownerid field from the client_owner4 presented in the
EXCHANGE_ID operation. EXCHANGE_ID operation.
o a boolean that indicates if the client's lease expired or if there * a boolean that indicates if the client's lease expired or if there
was administrative intervention (see Section 8.5) to revoke a was administrative intervention (see Section 8.5) to revoke a
byte-range lock, share reservation, or delegation and there has byte-range lock, share reservation, or delegation and there has
been no acknowledgment, via FREE_STATEID, of such revocation. been no acknowledgment, via FREE_STATEID, of such revocation.
o a boolean that indicates whether the client may have locks that it * a boolean that indicates whether the client may have locks that it
believes to be reclaimable in situations in which the grace period believes to be reclaimable in situations in which the grace period
was terminated, making the server's view of lock reclaimability was terminated, making the server's view of lock reclaimability
suspect. The server will set this for any client record in stable suspect. The server will set this for any client record in stable
storage where the client has not done a suitable RECLAIM_COMPLETE storage where the client has not done a suitable RECLAIM_COMPLETE
(global or file system-specific depending on the target of the (global or file system-specific depending on the target of the
lock request) before it grants any new (i.e., not reclaimed) lock lock request) before it grants any new (i.e., not reclaimed) lock
to any client. to any client.
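The per-client record listed above might look like the following sketch. All names here (ClientRecoveryRecord, flag_suspect_records) are hypothetical, an in-memory dict stands in for stable storage, and the distinction between global and file system-specific RECLAIM_COMPLETE is left out for brevity.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class ClientRecoveryRecord:
    """Hypothetical minimal record a server keeps per client."""
    co_ownerid: bytes                       # from client_owner4 in EXCHANGE_ID
    revocation_unacknowledged: bool = False  # lease expired or admin revoke, no FREE_STATEID yet
    reclaim_view_suspect: bool = False       # client may wrongly believe locks are reclaimable


def flag_suspect_records(records: Dict[bytes, ClientRecoveryRecord],
                         reclaim_complete: Set[bytes]) -> None:
    """Run before granting any new (non-reclaim) lock to any client.

    Every client that has not yet done RECLAIM_COMPLETE is flagged,
    since its view of which locks are reclaimable is now suspect.
    """
    for ownerid, record in records.items():
        if ownerid not in reclaim_complete:
            record.reclaim_view_suspect = True
```

A real server would write each updated record to stable storage before replying to the request that is being granted the new lock.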
Assuming the above record keeping, for the first edge condition, Assuming the above record keeping, for the first edge condition,
after the server restarts, the record that client A's lease expired after the server restarts, the record that client A's lease expired
skipping to change at page 182, line 32 skipping to change at line 8753
within the lease period, it is up to the client to determine which within the lease period, it is up to the client to determine which
locks have been revoked and which have not. It does this by using locks have been revoked and which have not. It does this by using
the TEST_STATEID operation on the appropriate set of stateids. Once the TEST_STATEID operation on the appropriate set of stateids. Once
the set of revoked locks has been determined, the applications can be the set of revoked locks has been determined, the applications can be
notified, and the invalidated stateids can be freed and lock notified, and the invalidated stateids can be freed and lock
revocation acknowledged by using FREE_STATEID. revocation acknowledged by using FREE_STATEID.
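The client-side sequence just described (test each stateid, notify applications, then free and acknowledge) could be sketched as follows. The three callables are hypothetical wrappers that a client implementation would supply around TEST_STATEID, its own notification mechanism, and FREE_STATEID. None of them is a real library API.

```python
from typing import Callable, Iterable, List


def acknowledge_revoked(stateids: Iterable[str],
                        is_revoked: Callable[[str], bool],
                        notify_application: Callable[[str], None],
                        free_stateid: Callable[[str], None]) -> List[str]:
    """Find revoked stateids, notify their users, then acknowledge revocation.

    is_revoked stands in for interpreting a TEST_STATEID result;
    free_stateid stands in for sending FREE_STATEID.
    """
    # First determine the full set of revoked locks.
    revoked = [sid for sid in stateids if is_revoked(sid)]
    # Then notify and free each one.
    for sid in revoked:
        notify_application(sid)
        free_stateid(sid)
    return revoked
```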
8.6. Short and Long Leases 8.6. Short and Long Leases
When determining the time period for the server lease, the usual When determining the time period for the server lease, the usual
lease tradeoffs apply. A short lease is good for fast server lease trade-offs apply. A short lease is good for fast server
recovery at a cost of increased operations to effect lease renewal recovery at a cost of increased operations to effect lease renewal
(when there are no other operations during the period to effect lease (when there are no other operations during the period to effect lease
renewal as a side effect). A long lease is certainly kinder and renewal as a side effect). A long lease is certainly kinder and
gentler to servers trying to handle very large numbers of clients. gentler to servers trying to handle very large numbers of clients.
The number of extra requests to effect lock renewal drops in inverse The number of extra requests to effect lock renewal drops in inverse
proportion to the lease time. The disadvantages of a long lease proportion to the lease time. The disadvantages of a long lease
include the possibility of slower recovery after certain failures. include the possibility of slower recovery after certain failures.
After server failure, a longer grace period may be required when some After server failure, a longer grace period may be required when some
clients do not promptly reclaim their locks and do a global clients do not promptly reclaim their locks and do a global
RECLAIM_COMPLETE. In the event of client failure, the longer period RECLAIM_COMPLETE. In the event of client failure, the longer period
skipping to change at page 183, line 47 skipping to change at line 8813
There are a number of operations and fields within existing There are a number of operations and fields within existing
operations that no longer have a function in NFSv4.1. In one way or operations that no longer have a function in NFSv4.1. In one way or
another, these changes are all due to the implementation of sessions another, these changes are all due to the implementation of sessions
that provide client context and exactly once semantics as a base that provide client context and exactly once semantics as a base
feature of the protocol, separate from locking itself. feature of the protocol, separate from locking itself.
The following NFSv4.0 operations MUST NOT be implemented in NFSv4.1. The following NFSv4.0 operations MUST NOT be implemented in NFSv4.1.
The server MUST return NFS4ERR_NOTSUPP if these operations are found The server MUST return NFS4ERR_NOTSUPP if these operations are found
in an NFSv4.1 COMPOUND. in an NFSv4.1 COMPOUND.
o SETCLIENTID since its function has been replaced by EXCHANGE_ID. * SETCLIENTID since its function has been replaced by EXCHANGE_ID.
o SETCLIENTID_CONFIRM since client ID confirmation now happens by * SETCLIENTID_CONFIRM since client ID confirmation now happens by
means of CREATE_SESSION. means of CREATE_SESSION.
o OPEN_CONFIRM because state-owner-based seqids have been replaced * OPEN_CONFIRM because state-owner-based seqids have been replaced
by the sequence ID in the SEQUENCE operation. by the sequence ID in the SEQUENCE operation.
o RELEASE_LOCKOWNER because lock-owners with no associated locks do * RELEASE_LOCKOWNER because lock-owners with no associated locks do
not have any sequence-related state and so can be deleted by the not have any sequence-related state and so can be deleted by the
server at will. server at will.
o RENEW because every SEQUENCE operation for a session causes lease * RENEW because every SEQUENCE operation for a session causes lease
renewal, making a separate operation superfluous. renewal, making a separate operation superfluous.
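A server-side pre-check for the operations listed above might be sketched as follows. The function name and the use of strings for operation and status names are assumptions made for readability. A real server dispatches on numeric opcodes.

```python
from typing import Optional

# NFSv4.0 operations that MUST NOT be implemented in NFSv4.1.
REMOVED_IN_V41 = frozenset({
    "SETCLIENTID",
    "SETCLIENTID_CONFIRM",
    "OPEN_CONFIRM",
    "RELEASE_LOCKOWNER",
    "RENEW",
})


def precheck_v41_op(op_name: str) -> Optional[str]:
    """Return the error to send for a removed operation, else None.

    None means this check has no objection; normal processing of the
    operation would then continue.
    """
    if op_name in REMOVED_IN_V41:
        return "NFS4ERR_NOTSUPP"
    return None
```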
Also, there are a number of fields, present in existing operations, Also, there are a number of fields, present in existing operations,
related to locking that have no use in minor version 1. They were related to locking that have no use in minor version 1. They were
used in minor version 0 to perform functions now provided in a used in minor version 0 to perform functions now provided in a
different fashion. different fashion.
o Sequence ids used to sequence requests for a given state-owner and * Sequence ids used to sequence requests for a given state-owner and
to provide retry protection, now provided via sessions. to provide retry protection, now provided via sessions.
o Client IDs used to identify the client associated with a given * Client IDs used to identify the client associated with a given
request. Client identification is now available using the client request. Client identification is now available using the client
ID associated with the current session, without needing an ID associated with the current session, without needing an
explicit client ID field. explicit client ID field.
Such vestigial fields in existing operations have no function in Such vestigial fields in existing operations have no function in
NFSv4.1 and are ignored by the server. Note that client IDs in NFSv4.1 and are ignored by the server. Note that client IDs in
operations new to NFSv4.1 (such as CREATE_SESSION and operations new to NFSv4.1 (such as CREATE_SESSION and
DESTROY_CLIENTID) are not ignored. DESTROY_CLIENTID) are not ignored.
9. File Locking and Share Reservations 9. File Locking and Share Reservations
skipping to change at page 187, line 7 skipping to change at line 8964
Thus, the LOCK operation does not need to distinguish between Thus, the LOCK operation does not need to distinguish between
advisory and mandatory byte-range locks. It is the server's advisory and mandatory byte-range locks. It is the server's
processing of the READ and WRITE operations that introduces the processing of the READ and WRITE operations that introduces the
distinction. distinction.
Every stateid that is validly passed to READ, WRITE, or SETATTR, with Every stateid that is validly passed to READ, WRITE, or SETATTR, with
the exception of special stateid values, defines an access mode for the exception of special stateid values, defines an access mode for
the file (i.e., OPEN4_SHARE_ACCESS_READ, OPEN4_SHARE_ACCESS_WRITE, or the file (i.e., OPEN4_SHARE_ACCESS_READ, OPEN4_SHARE_ACCESS_WRITE, or
OPEN4_SHARE_ACCESS_BOTH). OPEN4_SHARE_ACCESS_BOTH).
o For stateids associated with opens, this is the mode defined by * For stateids associated with opens, this is the mode defined by
the original OPEN that caused the allocation of the OPEN stateid the original OPEN that caused the allocation of the OPEN stateid
and as modified by subsequent OPENs and OPEN_DOWNGRADEs for the and as modified by subsequent OPENs and OPEN_DOWNGRADEs for the
same open-owner/file pair. same open-owner/file pair.
o For stateids returned by byte-range LOCK operations, the * For stateids returned by byte-range LOCK operations, the
appropriate mode is the access mode for the OPEN stateid appropriate mode is the access mode for the OPEN stateid
associated with the lock set represented by the stateid. associated with the lock set represented by the stateid.
o For delegation stateids, the access mode is based on the type of * For delegation stateids, the access mode is based on the type of
delegation. delegation.
When a READ, WRITE, or SETATTR (that specifies the size attribute) When a READ, WRITE, or SETATTR (that specifies the size attribute)
operation is done, the operation is subject to checking against the operation is done, the operation is subject to checking against the
access mode to verify that the operation is appropriate given the access mode to verify that the operation is appropriate given the
stateid with which the operation is associated. stateid with which the operation is associated.
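As a rough illustration of this check for WRITE-type operations, the access mode can be treated as a bit mask. The constant values below are those of the OPEN4_SHARE_ACCESS_* definitions; the function name is hypothetical, and how READ is checked is not covered by this sketch.

```python
from typing import Optional

OPEN4_SHARE_ACCESS_READ = 0x00000001
OPEN4_SHARE_ACCESS_WRITE = 0x00000002
OPEN4_SHARE_ACCESS_BOTH = 0x00000003


def check_write_access(access_mode: int) -> Optional[str]:
    """Check a WRITE or size-changing SETATTR against a stateid's access mode.

    Returns the error name to send when writing is not allowed,
    or None when the access mode permits the operation.
    """
    if access_mode & OPEN4_SHARE_ACCESS_WRITE:
        return None
    return "NFS4ERR_OPENMODE"
```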
In the case of WRITE-type operations (i.e., WRITEs and SETATTRs that In the case of WRITE-type operations (i.e., WRITEs and SETATTRs that
set size), the server MUST verify that the access mode allows writing set size), the server MUST verify that the access mode allows writing
and MUST return an NFS4ERR_OPENMODE error if it does not. In the and MUST return an NFS4ERR_OPENMODE error if it does not. In the
skipping to change at page 188, line 5 skipping to change at line 9010
this special stateid is used. However, WRITE operations with this this special stateid is used. However, WRITE operations with this
special stateid value MUST NOT bypass locking checks and are treated special stateid value MUST NOT bypass locking checks and are treated
exactly the same as if a special stateid for anonymous state were exactly the same as if a special stateid for anonymous state were
used. used.
A lock may not be granted while a READ or WRITE operation using one A lock may not be granted while a READ or WRITE operation using one
of the special stateids is being performed and the scope of the lock of the special stateids is being performed and the scope of the lock
to be granted would conflict with the READ or WRITE operation. This to be granted would conflict with the READ or WRITE operation. This
can occur when: can occur when:
o A mandatory byte-range lock is requested with a byte-range that * A mandatory byte-range lock is requested with a byte-range that
conflicts with the byte-range of the READ or WRITE operation. For conflicts with the byte-range of the READ or WRITE operation. For
the purposes of this paragraph, a conflict occurs when a shared the purposes of this paragraph, a conflict occurs when a shared
lock is requested and a WRITE operation is being performed, or an lock is requested and a WRITE operation is being performed, or an
exclusive lock is requested and either a READ or a WRITE operation exclusive lock is requested and either a READ or a WRITE operation
is being performed. is being performed.
o A share reservation is requested that denies reading and/or * A share reservation is requested that denies reading and/or
writing and the corresponding operation is being performed. writing and the corresponding operation is being performed.
o A delegation is to be granted and the delegation type would * A delegation is to be granted and the delegation type would
prevent the I/O operation, i.e., READ and WRITE conflict with an prevent the I/O operation, i.e., READ and WRITE conflict with an
OPEN_DELEGATE_WRITE delegation and WRITE conflicts with an OPEN_DELEGATE_WRITE delegation and WRITE conflicts with an
OPEN_DELEGATE_READ delegation. OPEN_DELEGATE_READ delegation.
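The byte-range case in the first bullet above can be sketched as a predicate. The function names are hypothetical, and the sketch assumes finite, non-zero lengths. It does not model a lock length that means "to end of file".

```python
def ranges_overlap(off_a: int, len_a: int, off_b: int, len_b: int) -> bool:
    """True if [off_a, off_a+len_a) and [off_b, off_b+len_b) intersect."""
    return off_a < off_b + len_b and off_b < off_a + len_a


def lock_conflicts_with_io(lock_exclusive: bool, lock_off: int, lock_len: int,
                           io_is_write: bool, io_off: int, io_len: int) -> bool:
    """Would granting this mandatory lock conflict with in-progress I/O?

    A shared lock conflicts only with a WRITE; an exclusive lock
    conflicts with either a READ or a WRITE.
    """
    if not ranges_overlap(lock_off, lock_len, io_off, io_len):
        return False
    return lock_exclusive or io_is_write
```

A shared lock against an overlapping READ is the only overlapping combination that does not conflict.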
When a client holds a delegation, it needs to ensure that the stateid When a client holds a delegation, it needs to ensure that the stateid
sent conveys the association of operation with the delegation, to sent conveys the association of operation with the delegation, to
avoid the delegation from being avoidably recalled. When the avoid the delegation from being avoidably recalled. When the
delegation stateid, a stateid open associated with that delegation, delegation stateid, a stateid open associated with that delegation,
or a stateid representing byte-range locks derived from such an open or a stateid representing byte-range locks derived from such an open
is used, the server knows that the READ, WRITE, or SETATTR does not is used, the server knows that the READ, WRITE, or SETATTR does not
skipping to change at page 194, line 44 skipping to change at line 9329
OPEN_DOWNGRADEs should generally be sent with a non-zero seqid in the OPEN_DOWNGRADEs should generally be sent with a non-zero seqid in the
stateid, to avoid the possibility that the status change associated stateid, to avoid the possibility that the status change associated
with an open upgrade is inadvertently lost. with an open upgrade is inadvertently lost.
9.11. Reclaim of Open and Byte-Range Locks 9.11. Reclaim of Open and Byte-Range Locks
Special forms of the LOCK and OPEN operations are provided when it is Special forms of the LOCK and OPEN operations are provided when it is
necessary to re-establish byte-range locks or opens after a server necessary to re-establish byte-range locks or opens after a server
failure. failure.
o To reclaim existing opens, an OPEN operation is performed using a * To reclaim existing opens, an OPEN operation is performed using a
CLAIM_PREVIOUS. Because the client, in this type of situation, CLAIM_PREVIOUS. Because the client, in this type of situation,
will have already opened the file and have the filehandle of the will have already opened the file and have the filehandle of the
target file, this operation requires that the current filehandle target file, this operation requires that the current filehandle
be the target file, rather than a directory, and no file name is be the target file, rather than a directory, and no file name is
specified. specified.
o To reclaim byte-range locks, a LOCK operation with the reclaim * To reclaim byte-range locks, a LOCK operation with the reclaim
parameter set to true is used. parameter set to true is used.
Reclaims of opens associated with delegations are discussed in Reclaims of opens associated with delegations are discussed in
Section 10.2.1. Section 10.2.1.
10. Client-Side Caching 10. Client-Side Caching
Client-side caching of data, of file attributes, and of file names is Client-side caching of data, of file attributes, and of file names is
essential to providing good performance with the NFS protocol. essential to providing good performance with the NFS protocol.
Providing distributed cache coherence is a difficult problem, and Providing distributed cache coherence is a difficult problem, and
skipping to change at page 196, line 19 skipping to change at line 9399
Sending LOCK and LOCKU operations as well as the READ and WRITE Sending LOCK and LOCKU operations as well as the READ and WRITE
operations necessary to make data caching consistent with the locking operations necessary to make data caching consistent with the locking
semantics (see Section 10.3.2) can severely limit performance. When semantics (see Section 10.3.2) can severely limit performance. When
locking is used to provide protection against infrequent conflicts, a locking is used to provide protection against infrequent conflicts, a
large penalty is incurred. This penalty may discourage the use of large penalty is incurred. This penalty may discourage the use of
byte-range locking by applications. byte-range locking by applications.
The NFSv4.1 protocol provides more aggressive caching strategies with The NFSv4.1 protocol provides more aggressive caching strategies with
the following design goals: the following design goals:
o Compatibility with a large range of server semantics. * Compatibility with a large range of server semantics.
o Providing the same caching benefits as previous versions of the * Providing the same caching benefits as previous versions of the
NFS protocol when unable to support the more aggressive model. NFS protocol when unable to support the more aggressive model.
o Requirements for aggressive caching are organized so that a large * Requirements for aggressive caching are organized so that a large
portion of the benefit can be obtained even when not all of the portion of the benefit can be obtained even when not all of the
requirements can be met. requirements can be met.
The appropriate requirements for the server are discussed in later The appropriate requirements for the server are discussed in later
sections in which specific forms of caching are covered (see sections in which specific forms of caching are covered (see
Section 10.4). Section 10.4).
10.2. Delegation and Callbacks 10.2. Delegation and Callbacks
Recallable delegation of server responsibilities for a file to a Recallable delegation of server responsibilities for a file to a
skipping to change at page 197, line 25 skipping to change at line 9452
they MUST always be prepared for OPENs, WANT_DELEGATIONs, and they MUST always be prepared for OPENs, WANT_DELEGATIONs, and
GET_DIR_DELEGATIONs to be processed without any delegations being GET_DIR_DELEGATIONs to be processed without any delegations being
granted. granted.
Unlike locks, an operation by a second client to a delegated file Unlike locks, an operation by a second client to a delegated file
will cause the server to recall a delegation through a callback. For will cause the server to recall a delegation through a callback. For
individual operations, we will describe, under IMPLEMENTATION, when individual operations, we will describe, under IMPLEMENTATION, when
such operations are required to effect a recall. A number of points such operations are required to effect a recall. A number of points
should be noted, however. should be noted, however.
o The server is free to recall a delegation whenever it feels it is * The server is free to recall a delegation whenever it feels it is
desirable and may do so even if no operations requiring recall are desirable and may do so even if no operations requiring recall are
being done. being done.
o Operations done outside the NFSv4.1 protocol, due to, for example, * Operations done outside the NFSv4.1 protocol, due to, for example,
access by other protocols, or by local access, also need to result access by other protocols, or by local access, also need to result
in delegation recall when they make analogous changes to file in delegation recall when they make analogous changes to file
system data. What is crucial is if the change would invalidate system data. What is crucial is if the change would invalidate
the guarantees provided by the delegation. When this is possible, the guarantees provided by the delegation. When this is possible,
the delegation needs to be recalled and MUST be returned or the delegation needs to be recalled and MUST be returned or
revoked before allowing the operation to proceed. revoked before allowing the operation to proceed.
o The semantics of the file system are crucial in defining when * The semantics of the file system are crucial in defining when
delegation recall is required. If a particular change within a delegation recall is required. If a particular change within a
specific implementation causes change to a file attribute, then specific implementation causes change to a file attribute, then
delegation recall is required, whether or not that operation has been delegation recall is required, whether or not that operation has been
specifically listed as requiring delegation recall. Again, what specifically listed as requiring delegation recall. Again, what
is critical is whether the guarantees provided by the delegation is critical is whether the guarantees provided by the delegation
are being invalidated. are being invalidated.
Despite those caveats, the implementation sections for a number of Despite those caveats, the implementation sections for a number of
operations describe situations in which delegation recall would be operations describe situations in which delegation recall would be
required under some common circumstances: required under some common circumstances:
o For GETATTR, see Section 18.7.4. * For GETATTR, see Section 18.7.4.
o For OPEN, see Section 18.16.4. * For OPEN, see Section 18.16.4.
o For READ, see Section 18.22.4. * For READ, see Section 18.22.4.
o For REMOVE, see Section 18.25.4. * For REMOVE, see Section 18.25.4.
o For RENAME, see Section 18.26.4. * For RENAME, see Section 18.26.4.
o For SETATTR, see Section 18.30.4. * For SETATTR, see Section 18.30.4.
o For WRITE, see Section 18.32.4. * For WRITE, see Section 18.32.4.
On recall, the client holding the delegation needs to flush modified On recall, the client holding the delegation needs to flush modified
state (such as modified data) to the server and return the state (such as modified data) to the server and return the
delegation. The conflicting request will not be acted on until the delegation. The conflicting request will not be acted on until the
recall is complete. The recall is considered complete when the recall is complete. The recall is considered complete when the
client returns the delegation or the server times out its wait for the client returns the delegation or the server times out its wait for the
delegation to be returned and revokes the delegation as a result of delegation to be returned and revokes the delegation as a result of
the timeout. In the interim, the server will either delay responding the timeout. In the interim, the server will either delay responding
to conflicting requests or respond to them with NFS4ERR_DELAY. to conflicting requests or respond to them with NFS4ERR_DELAY.
Following the resolution of the recall, the server has the Following the resolution of the recall, the server has the
skipping to change at page 198, line 52 skipping to change at line 9527
A client failure or a network partition can result in failure to A client failure or a network partition can result in failure to
respond to a recall callback. In this case, the server will revoke respond to a recall callback. In this case, the server will revoke
the delegation, which in turn will render useless any modified state the delegation, which in turn will render useless any modified state
still on the client. still on the client.
10.2.1. Delegation Recovery 10.2.1. Delegation Recovery
There are three situations that delegation recovery needs to deal There are three situations that delegation recovery needs to deal
with: with:
o client restart * client restart
o server restart * server restart
o network partition (full or backchannel-only) * network partition (full or backchannel-only)
In the event the client restarts, the failure to renew the lease will In the event the client restarts, the failure to renew the lease will
result in the revocation of byte-range locks and share reservations. result in the revocation of byte-range locks and share reservations.
Delegations, however, may be treated a bit differently. Delegations, however, may be treated a bit differently.
There will be situations in which delegations will need to be re- There will be situations in which delegations will need to be re-
established after a client restarts. The reason for this is that the established after a client restarts. The reason for this is that the
client may have file data stored locally and this data was associated client may have file data stored locally and this data was associated
with the previously held delegations. The client will need to re- with the previously held delegations. The client will need to re-
establish the appropriate file state on the server. establish the appropriate file state on the server.
skipping to change at page 200, line 7 skipping to change at line 9580
difference. In the normal case, if the server decides that a difference. In the normal case, if the server decides that a
delegation should not be granted, it performs the requested action delegation should not be granted, it performs the requested action
(e.g., OPEN) without granting any delegation. For reclaim, the (e.g., OPEN) without granting any delegation. For reclaim, the
server grants the delegation but a special designation is applied so server grants the delegation but a special designation is applied so
that the client treats the delegation as having been granted but that the client treats the delegation as having been granted but
recalled by the server. Because of this, the client has the duty to recalled by the server. Because of this, the client has the duty to
write all modified state to the server and then return the write all modified state to the server and then return the
delegation. This process of handling delegation reclaim reconciles delegation. This process of handling delegation reclaim reconciles
three principles of the NFSv4.1 protocol: three principles of the NFSv4.1 protocol:
o Upon reclaim, a client reporting resources assigned to it by an * Upon reclaim, a client reporting resources assigned to it by an
earlier server instance must be granted those resources. earlier server instance must be granted those resources.
o The server has unquestionable authority to determine whether * The server has unquestionable authority to determine whether
delegations are to be granted and, once granted, whether they are delegations are to be granted and, once granted, whether they are
to be continued. to be continued.
o The use of callbacks should not be depended upon until the client * The use of callbacks should not be depended upon until the client
has proven its ability to receive them. has proven its ability to receive them.
When a client needs to reclaim a delegation and there is no When a client needs to reclaim a delegation and there is no
associated open, the client may use the CLAIM_PREVIOUS variant of the associated open, the client may use the CLAIM_PREVIOUS variant of the
WANT_DELEGATION operation. However, since the server is not required WANT_DELEGATION operation. However, since the server is not required
to support this operation, an alternative is to reclaim via a dummy to support this operation, an alternative is to reclaim via a dummy
OPEN together with the delegation using an OPEN of type OPEN together with the delegation using an OPEN of type
CLAIM_PREVIOUS. The dummy open file can be released using a CLOSE to CLAIM_PREVIOUS. The dummy open file can be released using a CLOSE to
re-establish the original state to be reclaimed, a delegation without re-establish the original state to be reclaimed, a delegation without
an associated open. an associated open.
skipping to change at page 201, line 36 skipping to change at line 9657
In order to avoid invalidating the sharing assumptions on which In order to avoid invalidating the sharing assumptions on which
applications rely, NFSv4.1 clients should not provide cached data to applications rely, NFSv4.1 clients should not provide cached data to
applications or modify it on behalf of an application when it would applications or modify it on behalf of an application when it would
not be valid to obtain or modify that same data via a READ or WRITE not be valid to obtain or modify that same data via a READ or WRITE
operation. operation.
Furthermore, in the absence of an OPEN delegation (see Section 10.4), Furthermore, in the absence of an OPEN delegation (see Section 10.4),
two additional rules apply. Note that these rules are obeyed in two additional rules apply. Note that these rules are obeyed in
practice by many NFSv3 clients. practice by many NFSv3 clients.
o First, cached data present on a client must be revalidated after * First, cached data present on a client must be revalidated after
doing an OPEN. Revalidating means that the client fetches the doing an OPEN. Revalidating means that the client fetches the
change attribute from the server, compares it with the cached change attribute from the server, compares it with the cached
change attribute, and if different, declares the cached data (as change attribute, and if different, declares the cached data (as
well as the cached attributes) as invalid. This is to ensure that well as the cached attributes) as invalid. This is to ensure that
the data for the OPENed file is still correctly reflected in the the data for the OPENed file is still correctly reflected in the
client's cache. This validation must be done at least when the client's cache. This validation must be done at least when the
client's OPEN operation includes a deny of OPEN4_SHARE_DENY_WRITE client's OPEN operation includes a deny of OPEN4_SHARE_DENY_WRITE
or OPEN4_SHARE_DENY_BOTH, thus terminating a period in which other or OPEN4_SHARE_DENY_BOTH, thus terminating a period in which other
clients may have had the opportunity to open the file with clients may have had the opportunity to open the file with
OPEN4_SHARE_ACCESS_WRITE/OPEN4_SHARE_ACCESS_BOTH access. Clients OPEN4_SHARE_ACCESS_WRITE/OPEN4_SHARE_ACCESS_BOTH access. Clients
skipping to change at page 202, line 18 skipping to change at line 9686
cached data, so that metadata changes do not spuriously invalidate cached data, so that metadata changes do not spuriously invalidate
clean data. The implementor is cautioned in this approach. The clean data. The implementor is cautioned in this approach. The
change attribute is guaranteed to change for each update to the change attribute is guaranteed to change for each update to the
file, whereas time_modify is guaranteed to change only at the file, whereas time_modify is guaranteed to change only at the
granularity of the time_delta attribute. Use by the client's data granularity of the time_delta attribute. Use by the client's data
cache validation logic of time_modify and not change runs the risk cache validation logic of time_modify and not change runs the risk
of the client incorrectly marking stale data as valid. Thus, any of the client incorrectly marking stale data as valid. Thus, any
cache validation approach by the client MUST include the use of cache validation approach by the client MUST include the use of
the change attribute. the change attribute.
o Second, modified data must be flushed to the server before closing * Second, modified data must be flushed to the server before closing
a file OPENed for OPEN4_SHARE_ACCESS_WRITE. This is complementary a file OPENed for OPEN4_SHARE_ACCESS_WRITE. This is complementary
to the first rule. If the data is not flushed at CLOSE, the to the first rule. If the data is not flushed at CLOSE, the
revalidation done after the client OPENs a file is unable to revalidation done after the client OPENs a file is unable to
achieve its purpose. The other aspect to flushing the data before achieve its purpose. The other aspect to flushing the data before
close is that the data must be committed to stable storage, at the close is that the data must be committed to stable storage, at the
server, before the CLOSE operation is requested by the client. In server, before the CLOSE operation is requested by the client. In
the case of a server restart and a CLOSEd file, it may not be the case of a server restart and a CLOSEd file, it may not be
possible to retransmit the data to be written to the file, hence, possible to retransmit the data to be written to the file, hence,
this requirement. this requirement.
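The two rules above (revalidate after OPEN, flush before CLOSE) can be sketched as follows. This is a minimal illustration, not part of either document version: the FileCache record, the function names, and the write_stable callback are hypothetical, and the sketch assumes the change attribute has already been fetched from the server.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class FileCache:
    """Hypothetical per-file client cache record (not defined by the protocol)."""
    change: Optional[int] = None   # change attribute seen when data was cached
    data_valid: bool = False       # whether cached data may be served
    dirty: List[bytes] = field(default_factory=list)  # modified, unflushed data


def revalidate_after_open(cache: FileCache, server_change: int) -> bool:
    """Rule 1: compare the server's change attribute with the cached one.

    Returns True if the cached data is still usable.
    """
    if cache.change != server_change:
        cache.data_valid = False   # cached data and attributes are stale
        cache.change = server_change
        return False
    return cache.data_valid


def flush_before_close(cache: FileCache,
                       write_stable: Callable[[bytes], None]) -> int:
    """Rule 2: push every modified buffer to stable storage before CLOSE."""
    flushed = 0
    while cache.dirty:
        write_stable(cache.dirty.pop(0))
        flushed += 1
    return flushed
```

The comparison uses only the change attribute, in line with the MUST in the first rule; time_modify is deliberately left out of the sketch.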
skipping to change at page 202, line 51 skipping to change at line 9719
the file would represent the right to perform READ and WRITE the file would represent the right to perform READ and WRITE
operations on the first byte-range. A WRITE_LT lock on byte one of operations on the first byte-range. A WRITE_LT lock on byte one of
the file would represent the right to perform READ and WRITE the file would represent the right to perform READ and WRITE
operations on the second byte-range. As long as all applications operations on the second byte-range. As long as all applications
manipulating the file obey this convention, they will work on a local manipulating the file obey this convention, they will work on a local
file system. However, they may not work with the NFSv4.1 protocol file system. However, they may not work with the NFSv4.1 protocol
unless clients refrain from data caching. unless clients refrain from data caching.
The rules for data caching in the byte-range locking environment are: The rules for data caching in the byte-range locking environment are:
o First, when a client obtains a byte-range lock for a particular * First, when a client obtains a byte-range lock for a particular
byte-range, the data cache corresponding to that byte-range (if byte-range, the data cache corresponding to that byte-range (if
any cache data exists) must be revalidated. If the change any cache data exists) must be revalidated. If the change
attribute indicates that the file may have been updated since the attribute indicates that the file may have been updated since the
cached data was obtained, the client must flush or invalidate the cached data was obtained, the client must flush or invalidate the
cached data for the newly locked byte-range. A client might cached data for the newly locked byte-range. A client might
choose to invalidate all of the non-modified cached data that it choose to invalidate all of the non-modified cached data that it
has for the file, but the only requirement for correct operation has for the file, but the only requirement for correct operation
is to invalidate all of the data in the newly locked byte-range. is to invalidate all of the data in the newly locked byte-range.
o Second, before releasing a WRITE_LT lock for a byte-range, all * Second, before releasing a WRITE_LT lock for a byte-range, all
modified data for that byte-range must be flushed to the server. modified data for that byte-range must be flushed to the server.
The modified data must also be written to stable storage. The modified data must also be written to stable storage.
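As a hedged illustration of the second rule, the sketch below computes which modified bytes to flush when a WRITE_LT lock is released. It clips each dirty extent to the exact locked range instead of rounding to cache block boundaries. The (offset, length) representation and the function name clip_to_lock are assumptions made for this example only.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (offset, length) in bytes; hypothetical representation


def clip_to_lock(dirty: List[Range], lock: Range) -> List[Range]:
    """Return the parts of each dirty extent that lie inside the locked range."""
    lock_start, lock_len = lock
    lock_end = lock_start + lock_len
    clipped = []
    for offset, length in dirty:
        start = max(offset, lock_start)
        end = min(offset + length, lock_end)
        if start < end:            # keep only non-empty intersections
            clipped.append((start, end - start))
    return clipped
```

For instance, a fully dirty 4096-byte block with a lock covering only its first 2048 bytes yields a single 2048-byte flush, so bytes outside the lock are not written.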
Note that flushing data to the server and the invalidation of cached Note that flushing data to the server and the invalidation of cached
data must reflect the actual byte-ranges locked or unlocked. data must reflect the actual byte-ranges locked or unlocked.
Rounding these up or down to reflect client cache block boundaries Rounding these up or down to reflect client cache block boundaries
will cause problems if not carefully done. For example, writing a will cause problems if not carefully done. For example, writing a
modified block when only half of that block is within an area being modified block when only half of that block is within an area being
unlocked may cause invalid modification to the byte-range outside the unlocked may cause invalid modification to the byte-range outside the
unlocked area. This, in turn, may be part of a byte-range locked by unlocked area. This, in turn, may be part of a byte-range locked by
skipping to change at page 205, line 12 skipping to change at line 9825
with the NFSv3 protocol. Without this method, caching with the NFSv3 protocol. Without this method, caching
inconsistencies within the same client could occur, and this has not inconsistencies within the same client could occur, and this has not
been present in previous versions of the NFS protocol. Note that it been present in previous versions of the NFS protocol. Note that it
is possible to have such inconsistencies with applications executing is possible to have such inconsistencies with applications executing
on multiple clients, but that is not the issue being addressed here. on multiple clients, but that is not the issue being addressed here.
For the purposes of data caching, the following steps allow an For the purposes of data caching, the following steps allow an
NFSv4.1 client to determine whether two distinct filehandles denote NFSv4.1 client to determine whether two distinct filehandles denote
the same server-side object: the same server-side object:
o If GETATTR directed to two filehandles returns different values of * If GETATTR directed to two filehandles returns different values of
the fsid attribute, then the filehandles represent distinct the fsid attribute, then the filehandles represent distinct
objects. objects.
o If GETATTR for any file with an fsid that matches the fsid of the * If GETATTR for any file with an fsid that matches the fsid of the
two filehandles in question returns a unique_handles attribute two filehandles in question returns a unique_handles attribute
with a value of TRUE, then the two objects are distinct. with a value of TRUE, then the two objects are distinct.
o If GETATTR directed to the two filehandles does not return the * If GETATTR directed to the two filehandles does not return the
fileid attribute for both of the handles, then it cannot be fileid attribute for both of the handles, then it cannot be
determined whether the two objects are the same. Therefore, determined whether the two objects are the same. Therefore,
operations that depend on that knowledge (e.g., client-side data operations that depend on that knowledge (e.g., client-side data
caching) cannot be done reliably. Note that if GETATTR does not caching) cannot be done reliably. Note that if GETATTR does not
return the fileid attribute for both filehandles, it will return return the fileid attribute for both filehandles, it will return
it for neither of the filehandles, since the fsid for both it for neither of the filehandles, since the fsid for both
filehandles is the same. filehandles is the same.
o If GETATTR directed to the two filehandles returns different * If GETATTR directed to the two filehandles returns different
values for the fileid attribute, then they are distinct objects. values for the fileid attribute, then they are distinct objects.
o Otherwise, they are the same object. * Otherwise, they are the same object.
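The decision steps above can be sketched as a single function. This is an illustrative sketch under stated assumptions: the Attrs record and the Verdict enum are hypothetical, the caller is assumed to have already issued the GETATTR operations, and unique_handles is assumed to be the value returned for a file in the file system whose fsid matches both filehandles.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Verdict(Enum):
    DISTINCT = "distinct"
    SAME = "same"
    UNKNOWN = "unknown"


@dataclass
class Attrs:
    """Hypothetical holder for GETATTR results on one filehandle."""
    fsid: Tuple[int, int]          # (major, minor)
    fileid: Optional[int] = None   # None if the server did not return it


def compare_filehandles(a: Attrs, b: Attrs, unique_handles: bool) -> Verdict:
    """Apply the steps in order for two distinct filehandles."""
    if a.fsid != b.fsid:
        return Verdict.DISTINCT    # different file systems
    if unique_handles:
        return Verdict.DISTINCT    # one filehandle per object on this fsid
    if a.fileid is None or b.fileid is None:
        return Verdict.UNKNOWN     # caching cannot be done reliably
    if a.fileid != b.fileid:
        return Verdict.DISTINCT
    return Verdict.SAME
```

A client that receives UNKNOWN would, following the text, avoid operations that depend on knowing whether the objects are the same, such as client-side data caching.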
10.4. Open Delegation 10.4. Open Delegation
When a file is being OPENed, the server may delegate further handling When a file is being OPENed, the server may delegate further handling
of opens and closes for that file to the opening client. Any such of opens and closes for that file to the opening client. Any such
delegation is recallable since the circumstances that allowed for the delegation is recallable since the circumstances that allowed for the
delegation are subject to change. In particular, if the server delegation are subject to change. In particular, if the server
receives a conflicting OPEN from another client, the server must receives a conflicting OPEN from another client, the server must
recall the delegation before deciding whether the OPEN from the other recall the delegation before deciding whether the OPEN from the other
client may be granted. Making a delegation is up to the server, and client may be granted. Making a delegation is up to the server, and
clients should not assume that any particular OPEN either will or clients should not assume that any particular OPEN either will or
will not result in an OPEN delegation. The following is a typical will not result in an OPEN delegation. The following is a typical
set of conditions that servers might use in deciding whether an OPEN set of conditions that servers might use in deciding whether an OPEN
should be delegated: should be delegated:
o The client must be able to respond to the server's callback * The client must be able to respond to the server's callback
requests. If a backchannel has been established, the server will requests. If a backchannel has been established, the server will
send a CB_COMPOUND request, containing a single operation, send a CB_COMPOUND request, containing a single operation,
CB_SEQUENCE, for a test of backchannel availability. CB_SEQUENCE, for a test of backchannel availability.
o The client must have responded properly to previous recalls. * The client must have responded properly to previous recalls.
o There must be no current OPEN conflicting with the requested * There must be no current OPEN conflicting with the requested
delegation. delegation.
o There should be no current delegation that conflicts with the * There should be no current delegation that conflicts with the
delegation being requested. delegation being requested.
o The probability of future conflicting open requests should be low * The probability of future conflicting open requests should be low
based on the recent history of the file. based on the recent history of the file.
o The existence of any server-specific semantics of OPEN/CLOSE that * The existence of any server-specific semantics of OPEN/CLOSE that
would make the required handling incompatible with the prescribed would make the required handling incompatible with the prescribed
handling that the delegated client would apply (see below). handling that the delegated client would apply (see below).
There are two types of OPEN delegations: OPEN_DELEGATE_READ and There are two types of OPEN delegations: OPEN_DELEGATE_READ and
OPEN_DELEGATE_WRITE. An OPEN_DELEGATE_READ delegation allows a OPEN_DELEGATE_WRITE. An OPEN_DELEGATE_READ delegation allows a
client to handle, on its own, requests to open a file for reading client to handle, on its own, requests to open a file for reading
that do not deny OPEN4_SHARE_ACCESS_READ access to others. Multiple that do not deny OPEN4_SHARE_ACCESS_READ access to others. Multiple
OPEN_DELEGATE_READ delegations may be outstanding simultaneously and OPEN_DELEGATE_READ delegations may be outstanding simultaneously and
do not conflict. An OPEN_DELEGATE_WRITE delegation allows the client do not conflict. An OPEN_DELEGATE_WRITE delegation allows the client
to handle, on its own, all opens. Only OPEN_DELEGATE_WRITE to handle, on its own, all opens. Only one OPEN_DELEGATE_WRITE
delegation may exist for a given file at a given time, and it is delegation may exist for a given file at a given time, and it is
inconsistent with any OPEN_DELEGATE_READ delegations. inconsistent with any OPEN_DELEGATE_READ delegations.
When a client has an OPEN_DELEGATE_READ delegation, it is assured When a client has an OPEN_DELEGATE_READ delegation, it is assured
that neither the contents, the attributes (with the exception of that neither the contents, the attributes (with the exception of
time_access), nor the names of any links to the file will change time_access), nor the names of any links to the file will change
without its knowledge, so long as the delegation is held. When a without its knowledge, so long as the delegation is held. When a
client has an OPEN_DELEGATE_WRITE delegation, it may modify the file client has an OPEN_DELEGATE_WRITE delegation, it may modify the file
data locally since no other client will be accessing the file's data. data locally since no other client will be accessing the file's data.
The client holding an OPEN_DELEGATE_WRITE delegation may only locally The client holding an OPEN_DELEGATE_WRITE delegation may only locally
skipping to change at page 206, line 51 skipping to change at line 9912
When a client has an OPEN delegation, it does not need to send OPENs When a client has an OPEN delegation, it does not need to send OPENs
or CLOSEs to the server. Instead, the client may update the or CLOSEs to the server. Instead, the client may update the
appropriate status internally. For an OPEN_DELEGATE_READ delegation, appropriate status internally. For an OPEN_DELEGATE_READ delegation,
opens that cannot be handled locally (opens that are for opens that cannot be handled locally (opens that are for
OPEN4_SHARE_ACCESS_WRITE/OPEN4_SHARE_ACCESS_BOTH or that deny OPEN4_SHARE_ACCESS_WRITE/OPEN4_SHARE_ACCESS_BOTH or that deny
OPEN4_SHARE_ACCESS_READ access) must be sent to the server. OPEN4_SHARE_ACCESS_READ access) must be sent to the server.
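A minimal sketch of the decision described above, namely whether an open can be handled at the client or must be sent to the server. The function name and the string values for the delegation type are hypothetical. The numeric constants follow the protocol's share-access and share-deny bit definitions. The sketch covers only the delegation-type test, not the access/deny or permission checks that also apply.

```python
OPEN4_SHARE_ACCESS_READ = 0x1
OPEN4_SHARE_ACCESS_WRITE = 0x2
OPEN4_SHARE_ACCESS_BOTH = 0x3
OPEN4_SHARE_DENY_NONE = 0x0
OPEN4_SHARE_DENY_READ = 0x1
OPEN4_SHARE_DENY_WRITE = 0x2


def can_open_locally(delegation: str, access: int, deny: int) -> bool:
    """delegation is "read", "write", or "none" (hypothetical labels)."""
    if delegation == "write":
        return True                # a write delegation covers all opens
    if delegation == "read":
        wants_write = bool(access & OPEN4_SHARE_ACCESS_WRITE)
        denies_read = bool(deny & OPEN4_SHARE_DENY_READ)
        return not wants_write and not denies_read
    return False                   # no delegation: every OPEN goes to the server
```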
When an OPEN delegation is made, the reply to the OPEN contains an When an OPEN delegation is made, the reply to the OPEN contains an
OPEN delegation structure that specifies the following: OPEN delegation structure that specifies the following:
o the type of delegation (OPEN_DELEGATE_READ or * the type of delegation (OPEN_DELEGATE_READ or
OPEN_DELEGATE_WRITE). OPEN_DELEGATE_WRITE).
o space limitation information to control flushing of data on close * space limitation information to control flushing of data on close
(OPEN_DELEGATE_WRITE delegation only; see Section 10.4.1) (OPEN_DELEGATE_WRITE delegation only; see Section 10.4.1)
o an nfsace4 specifying read and write permissions * an nfsace4 specifying read and write permissions
o a stateid to represent the delegation * a stateid to represent the delegation
The delegation stateid is separate and distinct from the stateid for The delegation stateid is separate and distinct from the stateid for
the OPEN proper. The standard stateid, unlike the delegation the OPEN proper. The standard stateid, unlike the delegation
stateid, is associated with a particular lock-owner and will continue stateid, is associated with a particular lock-owner and will continue
to be valid after the delegation is recalled and the file remains to be valid after the delegation is recalled and the file remains
open. open.
When a request internal to the client is made to open a file and an When a request internal to the client is made to open a file and an
OPEN delegation is in effect, it will be accepted or rejected solely OPEN delegation is in effect, it will be accepted or rejected solely
on the basis of the following conditions. Any requirement for other on the basis of the following conditions. Any requirement for other
checks to be made by the delegate should result in the OPEN checks to be made by the delegate should result in the OPEN
delegation being denied so that the checks can be made by the server delegation being denied so that the checks can be made by the server
itself. itself.
o The access and deny bits for the request and the file as described * The access and deny bits for the request and the file as described
in Section 9.7. in Section 9.7.
o The read and write permissions as determined below. * The read and write permissions as determined below.
The nfsace4 passed with delegation can be used to avoid frequent The nfsace4 passed with delegation can be used to avoid frequent
ACCESS calls. The permission check should be as follows: ACCESS calls. The permission check should be as follows:
o If the nfsace4 indicates that the open may be done, then it should * If the nfsace4 indicates that the open may be done, then it should
be granted without reference to the server. be granted without reference to the server.
o If the nfsace4 indicates that the open may not be done, then an * If the nfsace4 indicates that the open may not be done, then an
ACCESS request must be sent to the server to obtain the definitive ACCESS request must be sent to the server to obtain the definitive
answer. answer.
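The permission check above is asymmetric: a positive answer from the nfsace4 is final, while a negative one is only a hint. A small sketch, assuming the nfsace4 has already been evaluated to a boolean and that send_access is a hypothetical callback that issues an ACCESS request and returns the server's verdict:

```python
from typing import Callable


def check_open_permission(ace_allows: bool,
                          send_access: Callable[[], bool]) -> bool:
    """Grant locally when the nfsace4 allows; otherwise ask the server."""
    if ace_allows:
        return True                # no round trip needed
    return send_access()           # nfsace4 may be more restrictive than the ACL
```

The fallback matters because the server may return an nfsace4 that is more restrictive than the file's actual ACL.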
The server may return an nfsace4 that is more restrictive than the The server may return an nfsace4 that is more restrictive than the
actual ACL of the file. This includes an nfsace4 that specifies actual ACL of the file. This includes an nfsace4 that specifies
denial of all access. Note that some common practices such as denial of all access. Note that some common practices such as
mapping the traditional user "root" to the user "nobody" (see mapping the traditional user "root" to the user "nobody" (see
Section 5.9) may make it incorrect to return the actual ACL of the Section 5.9) may make it incorrect to return the actual ACL of the
file in the delegation response. file in the delegation response.
skipping to change at page 210, line 27 skipping to change at line 10082
Since the form of the change attribute is determined by the server Since the form of the change attribute is determined by the server
and is opaque to the client, the client and server need to agree on a and is opaque to the client, the client and server need to agree on a
method of communicating the modified state of the file. For the size method of communicating the modified state of the file. For the size
attribute, the client will report its current view of the file size. attribute, the client will report its current view of the file size.
For the change attribute, the handling is more involved. For the change attribute, the handling is more involved.
For the client, the following steps will be taken when receiving an For the client, the following steps will be taken when receiving an
OPEN_DELEGATE_WRITE delegation: OPEN_DELEGATE_WRITE delegation:
o The value of the change attribute will be obtained from the server * The value of the change attribute will be obtained from the server
and cached. Let this value be represented by c. and cached. Let this value be represented by c.
o The client will create a value greater than c that will be used * The client will create a value greater than c that will be used
for communicating that modified data is held at the client. Let for communicating that modified data is held at the client. Let
this value be represented by d. this value be represented by d.
o When the client is queried via CB_GETATTR for the change * When the client is queried via CB_GETATTR for the change
attribute, it checks to see if it holds modified data. If the attribute, it checks to see if it holds modified data. If the
file is modified, the value d is returned for the change attribute file is modified, the value d is returned for the change attribute
value. If this file is not currently modified, the client returns value. If this file is not currently modified, the client returns
the value c for the change attribute. the value c for the change attribute.
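The client-side steps above can be sketched as a small state holder. The class and method names are hypothetical. Choosing d = c + 1 is only one way to obtain a value greater than c, and the sketch ignores wraparound of the unsigned 64-bit change value.

```python
class WriteDelegationState:
    """Hypothetical client record kept for one OPEN_DELEGATE_WRITE delegation."""

    def __init__(self, server_change: int) -> None:
        self.c = server_change       # change attribute obtained from the server
        self.d = server_change + 1   # assumption: any value greater than c works
        self.modified = False        # set when the client holds modified data

    def cb_getattr_change(self) -> int:
        """Value to return for the change attribute in a CB_GETATTR reply."""
        return self.d if self.modified else self.c
```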
For simplicity of implementation, the client MAY for each CB_GETATTR For simplicity of implementation, the client MAY for each CB_GETATTR
return the same value d. This is true even if, between successive return the same value d. This is true even if, between successive
CB_GETATTR operations, the client again modifies the file's data or CB_GETATTR operations, the client again modifies the file's data or
metadata in its cache. The client can return the same value because metadata in its cache. The client can return the same value because
the only requirement is that the client be able to indicate to the the only requirement is that the client be able to indicate to the
skipping to change at page 211, line 14 skipping to change at line 10117
of the client's changes to that integer. Therefore, the server MUST of the client's changes to that integer. Therefore, the server MUST
encode the change attribute in network order when sending it to the encode the change attribute in network order when sending it to the
client. The client MUST decode it from network order to its native client. The client MUST decode it from network order to its native
order when receiving it, and the client MUST encode it in network order when receiving it, and the client MUST encode it in network
order when sending it to the server. For this reason, change is order when sending it to the server. For this reason, change is
defined as an unsigned integer rather than an opaque array of bytes. defined as an unsigned integer rather than an opaque array of bytes.
For the server, the following steps will be taken when providing an For the server, the following steps will be taken when providing an
OPEN_DELEGATE_WRITE delegation: OPEN_DELEGATE_WRITE delegation:
o Upon providing an OPEN_DELEGATE_WRITE delegation, the server will * Upon providing an OPEN_DELEGATE_WRITE delegation, the server will
cache a copy of the change attribute in the data structure it uses cache a copy of the change attribute in the data structure it uses
to record the delegation. Let this value be represented by sc. to record the delegation. Let this value be represented by sc.
o When a second client sends a GETATTR operation on the same file to * When a second client sends a GETATTR operation on the same file to
the server, the server obtains the change attribute from the first the server, the server obtains the change attribute from the first
client. Let this value be cc. client. Let this value be cc.
o If the value cc is equal to sc, the file is not modified and the * If the value cc is equal to sc, the file is not modified and the
server returns the current values for change, time_metadata, and server returns the current values for change, time_metadata, and
time_modify (for example) to the second client. time_modify (for example) to the second client.
o If the value cc is NOT equal to sc, the file is currently modified * If the value cc is NOT equal to sc, the file is currently modified
at the first client and most likely will be modified at the server at the first client and most likely will be modified at the server
at a future time. The server then uses its current time to at a future time. The server then uses its current time to
construct attribute values for time_metadata and time_modify. A construct attribute values for time_metadata and time_modify. A
new value of sc, which we will call nsc, is computed by the new value of sc, which we will call nsc, is computed by the
server, such that nsc >= sc + 1. The server then returns the server, such that nsc >= sc + 1. The server then returns the
constructed time_metadata, time_modify, and nsc values to the constructed time_metadata, time_modify, and nsc values to the
requester. The server replaces sc in the delegation record with requester. The server replaces sc in the delegation record with
nsc. To prevent the possibility of time_modify, time_metadata, nsc. To prevent the possibility of time_modify, time_metadata,
and change from appearing to go backward (which would happen if and change from appearing to go backward (which would happen if
the client holding the delegation fails to write its modified data the client holding the delegation fails to write its modified data
skipping to change at page 212, line 47 skipping to change at line 10198
down. down.
It should be noted that the server is under no obligation to use It should be noted that the server is under no obligation to use
CB_GETATTR, and therefore the server MAY simply recall the delegation CB_GETATTR, and therefore the server MAY simply recall the delegation
to avoid its use. to avoid its use.
10.4.4. Recall of Open Delegation 10.4.4. Recall of Open Delegation
The following events necessitate recall of an OPEN delegation: The following events necessitate recall of an OPEN delegation:
o potentially conflicting OPEN request (or a READ or WRITE operation * potentially conflicting OPEN request (or a READ or WRITE operation
done with a special stateid) done with a special stateid)
o SETATTR sent by another client * SETATTR sent by another client
o REMOVE request for the file * REMOVE request for the file
o RENAME request for the file as either the source or target of the * RENAME request for the file as either the source or target of the
RENAME RENAME
Whether a RENAME of a directory in the path leading to the file Whether a RENAME of a directory in the path leading to the file
results in recall of an OPEN delegation depends on the semantics of results in recall of an OPEN delegation depends on the semantics of
the server's file system. If that file system denies such RENAMEs the server's file system. If that file system denies such RENAMEs
when a file is open, the recall must be performed to determine when a file is open, the recall must be performed to determine
whether the file in question is, in fact, open. whether the file in question is, in fact, open.
In addition to the situations above, the server may choose to recall In addition to the situations above, the server may choose to recall
OPEN delegations at any time if resource constraints make it OPEN delegations at any time if resource constraints make it
advisable to do so. Clients should always be prepared for the advisable to do so. Clients should always be prepared for the
possibility of recall. possibility of recall.
When a client receives a recall for an OPEN delegation, it needs to When a client receives a recall for an OPEN delegation, it needs to
update state on the server before returning the delegation. These update state on the server before returning the delegation. These
same updates must be done whenever a client chooses to return a same updates must be done whenever a client chooses to return a
delegation voluntarily. The following items of state need to be delegation voluntarily. The following items of state need to be
dealt with: dealt with:
o If the file associated with the delegation is no longer open and * If the file associated with the delegation is no longer open and
no previous CLOSE operation has been sent to the server, a CLOSE no previous CLOSE operation has been sent to the server, a CLOSE
operation must be sent to the server. operation must be sent to the server.
o If a file has other open references at the client, then OPEN * If a file has other open references at the client, then OPEN
operations must be sent to the server. The appropriate stateids operations must be sent to the server. The appropriate stateids
will be provided by the server for subsequent use by the client will be provided by the server for subsequent use by the client
since the delegation stateid will no longer be valid. These OPEN since the delegation stateid will no longer be valid. These OPEN
requests are done with the claim type of CLAIM_DELEGATE_CUR. This requests are done with the claim type of CLAIM_DELEGATE_CUR. This
will allow the presentation of the delegation stateid so that the will allow the presentation of the delegation stateid so that the
client can establish the appropriate rights to perform the OPEN. client can establish the appropriate rights to perform the OPEN.
(see Section 18.16, which describes the OPEN operation, for (See Section 18.16, which describes the OPEN operation, for
details.) details.)
o If there are granted byte-range locks, the corresponding LOCK * If there are granted byte-range locks, the corresponding LOCK
operations need to be performed. This applies to the operations need to be performed. This applies to the
OPEN_DELEGATE_WRITE delegation case only. OPEN_DELEGATE_WRITE delegation case only.
o For an OPEN_DELEGATE_WRITE delegation, if at the time of recall * For an OPEN_DELEGATE_WRITE delegation, if at the time of recall
the file is not open for OPEN4_SHARE_ACCESS_WRITE/ the file is not open for OPEN4_SHARE_ACCESS_WRITE/
OPEN4_SHARE_ACCESS_BOTH, all modified data for the file must be OPEN4_SHARE_ACCESS_BOTH, all modified data for the file must be
flushed to the server. If the delegation had not existed, the flushed to the server. If the delegation had not existed, the
client would have done this data flush before the CLOSE operation. client would have done this data flush before the CLOSE operation.
o For an OPEN_DELEGATE_WRITE delegation when a file is still open at * For an OPEN_DELEGATE_WRITE delegation when a file is still open at
the time of recall, any modified data for the file needs to be the time of recall, any modified data for the file needs to be
flushed to the server. flushed to the server.
o With the OPEN_DELEGATE_WRITE delegation in place, it is possible * With the OPEN_DELEGATE_WRITE delegation in place, it is possible
that the file was truncated during the duration of the delegation. that the file was truncated during the duration of the delegation.
For example, the truncation could have occurred as a result of an For example, the truncation could have occurred as a result of an
OPEN UNCHECKED with a size attribute value of zero. Therefore, if OPEN UNCHECKED with a size attribute value of zero. Therefore, if
a truncation of the file has occurred and this operation has not a truncation of the file has occurred and this operation has not
been propagated to the server, the truncation must occur before been propagated to the server, the truncation must occur before
any modified data is written to the server. any modified data is written to the server.
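The state updates listed above can be sketched as a planner that emits the steps a client would perform before returning a delegation. This is a rough illustration only: the function, its parameters, and the step labels are hypothetical. The text requires that truncation precede the writing of modified data and that all updates precede the return. The remaining ordering (OPENs before LOCKs, CLOSE after the flush) is an assumption of this sketch.

```python
from typing import List


def plan_delegation_return(deleg_type: str, open_refs: int,
                           server_has_open: bool, locks: int,
                           dirty: bool, truncated_locally: bool) -> List[str]:
    """Return an ordered list of step labels ending with DELEGRETURN."""
    plan: List[str] = []
    # Re-establish each local open reference on the server.
    plan += ["OPEN(CLAIM_DELEGATE_CUR)"] * open_refs
    if deleg_type == "write":
        plan += ["LOCK"] * locks       # locally granted byte-range locks
        if truncated_locally:
            plan.append("TRUNCATE")    # must precede any flushed data
        if dirty:
            plan.append("FLUSH")       # modified data goes to the server
    if open_refs == 0 and server_has_open:
        plan.append("CLOSE")           # file closed locally, CLOSE never sent
    plan.append("DELEGRETURN")
    return plan
```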
In the case of OPEN_DELEGATE_WRITE delegation, byte-range locking In the case of OPEN_DELEGATE_WRITE delegation, byte-range locking
imposes some additional requirements. To precisely maintain the imposes some additional requirements. To precisely maintain the
associated invariant, it is required to flush any modified data in associated invariant, it is required to flush any modified data in
skipping to change at page 218, line 46 skipping to change at line 10483
Changes made in one order on the server may be seen in a different Changes made in one order on the server may be seen in a different
order on one client and in a third order on another client. order on one client and in a third order on another client.
The typical file system application programming interfaces do not The typical file system application programming interfaces do not
provide means to atomically modify or interrogate attributes for provide means to atomically modify or interrogate attributes for
multiple files at the same time. The following rules provide an multiple files at the same time. The following rules provide an
environment where the potential incoherencies mentioned above can be environment where the potential incoherencies mentioned above can be
reasonably managed. These rules are derived from the practice of reasonably managed. These rules are derived from the practice of
previous NFS protocols. previous NFS protocols.
o All attributes for a given file (per-fsid attributes excepted) are * All attributes for a given file (per-fsid attributes excepted) are
cached as a unit at the client so that no non-serializability can cached as a unit at the client so that no non-serializability can
arise within the context of a single file. arise within the context of a single file.
o An upper time boundary is maintained on how long a client cache * An upper time boundary is maintained on how long a client cache
entry can be kept without being refreshed from the ser