draft-ietf-nfsv4-rfc3010bis-02.txt   draft-ietf-nfsv4-rfc3010bis-03.txt 
NFS version 4 Working Group S. Shepler NFS version 4 Working Group S. Shepler
INTERNET-DRAFT Sun Microsystems, Inc. INTERNET-DRAFT Sun Microsystems, Inc.
Document: draft-ietf-nfsv4-rfc3010bis-02.txt C. Beame Obsoletes: 3010 C. Beame
Hummingbird Ltd. Document: draft-ietf-nfsv4-rfc3010bis-03.txt Hummingbird Ltd.
B. Callaghan B. Callaghan
Sun Microsystems, Inc. Sun Microsystems, Inc.
M. Eisler M. Eisler
Network Appliance, Inc. Network Appliance, Inc.
D. Noveck D. Noveck
Network Appliance, Inc. Network Appliance, Inc.
D. Robinson D. Robinson
Sun Microsystems, Inc. Sun Microsystems, Inc.
R. Thurlow R. Thurlow
Sun Microsystems, Inc. Sun Microsystems, Inc.
August 2002 September 2002
NFS version 4 Protocol NFS version 4 Protocol
Status of this Memo Status of this Memo
This document is an Internet-Draft and is in full conformance with This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026. all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
skipping to change at page 2, line 5 skipping to change at page 2, line 5
http://www.ietf.org/ietf/1id-abstracts.txt http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
Abstract Abstract
NFS version 4 is a distributed filesystem protocol which owes NFS version 4 is a distributed filesystem protocol which owes
heritage to NFS protocol versions 2 [RFC1094] and 3 [RFC1813]. heritage to NFS protocol versions 2 [RFC1094] and 3 [RFC1813].
Draft Specification NFS version 4 Protocol August 2002 Draft Specification NFS version 4 Protocol September 2002
Unlike earlier versions, the NFS version 4 protocol supports Unlike earlier versions, the NFS version 4 protocol supports
traditional file access while integrating support for file locking traditional file access while integrating support for file locking
and the mount protocol. In addition, support for strong security and the mount protocol. In addition, support for strong security
(and its negotiation), compound operations, client caching, and (and its negotiation), compound operations, client caching, and
internationalization have been added. Of course, attention has been internationalization have been added. Of course, attention has been
applied to making NFS version 4 operate well in an Internet applied to making NFS version 4 operate well in an Internet
environment. environment.
Copyright Copyright
Copyright (C) The Internet Society (2000-2002). All Rights Reserved. Copyright (C) The Internet Society (2000-2002). All Rights Reserved.
Key Words Key Words
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119]. document are to be interpreted as described in [RFC2119].
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 7 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 7
1.1. Inconsistencies of this Document with Section 18 . . . . . 7 1.1. Inconsistencies of this Document with Section 18 . . . . . 7
1.2. Overview of NFS version 4 Features . . . . . . . . . . . . 8 1.2. Overview of NFS version 4 Features . . . . . . . . . . . . 8
1.2.1. RPC and Security . . . . . . . . . . . . . . . . . . . . 8 1.2.1. RPC and Security . . . . . . . . . . . . . . . . . . . . 8
1.2.2. Procedure and Operation Structure . . . . . . . . . . . 8 1.2.2. Procedure and Operation Structure . . . . . . . . . . . 8
1.2.3. Filesystem Model . . . . . . . . . . . . . . . . . . . . 9 1.2.3. Filesystem Model . . . . . . . . . . . . . . . . . . . . 9
1.2.3.1. Filehandle Types . . . . . . . . . . . . . . . . . . . 9 1.2.3.1. Filehandle Types . . . . . . . . . . . . . . . . . . . 9
skipping to change at page 3, line 34 skipping to change at page 3, line 34
2.2. Structured Data Types . . . . . . . . . . . . . . . . . 15 2.2. Structured Data Types . . . . . . . . . . . . . . . . . 15
3. RPC and Security Flavor . . . . . . . . . . . . . . . . . 21 3. RPC and Security Flavor . . . . . . . . . . . . . . . . . 21
3.1. Ports and Transports . . . . . . . . . . . . . . . . . . 21 3.1. Ports and Transports . . . . . . . . . . . . . . . . . . 21
3.1.1. Client Retransmission Behavior . . . . . . . . . . . . 21 3.1.1. Client Retransmission Behavior . . . . . . . . . . . . 21
3.2. Security Flavors . . . . . . . . . . . . . . . . . . . . 22 3.2. Security Flavors . . . . . . . . . . . . . . . . . . . . 22
3.2.1. Security mechanisms for NFS version 4 . . . . . . . . 22 3.2.1. Security mechanisms for NFS version 4 . . . . . . . . 22
3.2.1.1. Kerberos V5 as a security triple . . . . . . . . . . 22 3.2.1.1. Kerberos V5 as a security triple . . . . . . . . . . 22
3.2.1.2. LIPKEY as a security triple . . . . . . . . . . . . 23 3.2.1.2. LIPKEY as a security triple . . . . . . . . . . . . 23
3.2.1.3. SPKM-3 as a security triple . . . . . . . . . . . . 24 3.2.1.3. SPKM-3 as a security triple . . . . . . . . . . . . 24
3.3. Security Negotiation . . . . . . . . . . . . . . . . . . 24 3.3. Security Negotiation . . . . . . . . . . . . . . . . . . 24
3.3.1. SECINFO . . . . . . . . . . . . . . . . . . . . . . . 25 3.3.1. SECINFO . . . . . . . . . . . . . . . . . . . . . . . 24
3.3.2. Security Error . . . . . . . . . . . . . . . . . . . . 25 3.3.2. Security Error . . . . . . . . . . . . . . . . . . . . 25
3.4. Callback RPC Authentication . . . . . . . . . . . . . . 25 3.4. Callback RPC Authentication . . . . . . . . . . . . . . 25
4. Filehandles . . . . . . . . . . . . . . . . . . . . . . . 28 4. Filehandles . . . . . . . . . . . . . . . . . . . . . . . 27
4.1. Obtaining the First Filehandle . . . . . . . . . . . . . 28 4.1. Obtaining the First Filehandle . . . . . . . . . . . . . 27
4.1.1. Root Filehandle . . . . . . . . . . . . . . . . . . . 28 4.1.1. Root Filehandle . . . . . . . . . . . . . . . . . . . 27
4.1.2. Public Filehandle . . . . . . . . . . . . . . . . . . 28 4.1.2. Public Filehandle . . . . . . . . . . . . . . . . . . 27
4.2. Filehandle Types . . . . . . . . . . . . . . . . . . . . 29 4.2. Filehandle Types . . . . . . . . . . . . . . . . . . . . 28
4.2.1. General Properties of a Filehandle . . . . . . . . . . 29 4.2.1. General Properties of a Filehandle . . . . . . . . . . 28
4.2.2. Persistent Filehandle . . . . . . . . . . . . . . . . 30 4.2.2. Persistent Filehandle . . . . . . . . . . . . . . . . 29
4.2.3. Volatile Filehandle . . . . . . . . . . . . . . . . . 30 4.2.3. Volatile Filehandle . . . . . . . . . . . . . . . . . 29
4.2.4. One Method of Constructing a Volatile Filehandle . . . 31 4.2.4. One Method of Constructing a Volatile Filehandle . . . 30
4.3. Client Recovery from Filehandle Expiration . . . . . . . 32 4.3. Client Recovery from Filehandle Expiration . . . . . . . 31
5. File Attributes . . . . . . . . . . . . . . . . . . . . . 34 5. File Attributes . . . . . . . . . . . . . . . . . . . . . 33
5.1. Mandatory Attributes . . . . . . . . . . . . . . . . . . 35 5.1. Mandatory Attributes . . . . . . . . . . . . . . . . . . 34
5.2. Recommended Attributes . . . . . . . . . . . . . . . . . 35 5.2. Recommended Attributes . . . . . . . . . . . . . . . . . 34
5.3. Named Attributes . . . . . . . . . . . . . . . . . . . . 35 5.3. Named Attributes . . . . . . . . . . . . . . . . . . . . 34
5.4. Classification of Attributes . . . . . . . . . . . . . . 36 5.4. Classification of Attributes . . . . . . . . . . . . . . 35
5.5. Mandatory Attributes - Definitions . . . . . . . . . . . 38 5.5. Mandatory Attributes - Definitions . . . . . . . . . . . 37
5.6. Recommended Attributes - Definitions . . . . . . . . . . 40 5.6. Recommended Attributes - Definitions . . . . . . . . . . 39
5.7. Time Access . . . . . . . . . . . . . . . . . . . . . . 45 5.7. Time Access . . . . . . . . . . . . . . . . . . . . . . 44
5.8. Interpreting owner and owner_group . . . . . . . . . . . 45 5.8. Interpreting owner and owner_group . . . . . . . . . . . 44
5.9. Character Case Attributes . . . . . . . . . . . . . . . 47 5.9. Character Case Attributes . . . . . . . . . . . . . . . 46
5.10. Quota Attributes . . . . . . . . . . . . . . . . . . . 47 5.10. Quota Attributes . . . . . . . . . . . . . . . . . . . 46
5.11. Access Control Lists . . . . . . . . . . . . . . . . . 48 5.11. Access Control Lists . . . . . . . . . . . . . . . . . 47
5.11.1. ACE type . . . . . . . . . . . . . . . . . . . . . . 49 5.11.1. ACE type . . . . . . . . . . . . . . . . . . . . . . 48
5.11.2. ACE Access Mask . . . . . . . . . . . . . . . . . . . 50 5.11.2. ACE Access Mask . . . . . . . . . . . . . . . . . . . 49
5.11.3. ACE flag . . . . . . . . . . . . . . . . . . . . . . 52 5.11.3. ACE flag . . . . . . . . . . . . . . . . . . . . . . 51
5.11.4. ACE who . . . . . . . . . . . . . . . . . . . . . . . 53 5.11.4. ACE who . . . . . . . . . . . . . . . . . . . . . . . 52
5.11.5. Mode Attribute . . . . . . . . . . . . . . . . . . . 54 5.11.5. Mode Attribute . . . . . . . . . . . . . . . . . . . 53
5.11.6. Mode and ACL Attribute . . . . . . . . . . . . . . . 55 5.11.6. Mode and ACL Attribute . . . . . . . . . . . . . . . 54
5.11.7. mounted_on_fileid . . . . . . . . . . . . . . . . . . 55 5.11.7. mounted_on_fileid . . . . . . . . . . . . . . . . . . 54
6. Filesystem Migration and Replication . . . . . . . . . . . 57 6. Filesystem Migration and Replication . . . . . . . . . . . 56
6.1. Replication . . . . . . . . . . . . . . . . . . . . . . 57 6.1. Replication . . . . . . . . . . . . . . . . . . . . . . 56
6.2. Migration . . . . . . . . . . . . . . . . . . . . . . . 57 6.2. Migration . . . . . . . . . . . . . . . . . . . . . . . 56
6.3. Interpretation of the fs_locations Attribute . . . . . . 58 6.3. Interpretation of the fs_locations Attribute . . . . . . 57
6.4. Filehandle Recovery for Migration or Replication . . . . 59 6.4. Filehandle Recovery for Migration or Replication . . . . 58
7. NFS Server Name Space . . . . . . . . . . . . . . . . . . 60 7. NFS Server Name Space . . . . . . . . . . . . . . . . . . 59
7.1. Server Exports . . . . . . . . . . . . . . . . . . . . . 60 7.1. Server Exports . . . . . . . . . . . . . . . . . . . . . 59
7.2. Browsing Exports . . . . . . . . . . . . . . . . . . . . 60 7.2. Browsing Exports . . . . . . . . . . . . . . . . . . . . 59
7.3. Server Pseudo Filesystem . . . . . . . . . . . . . . . . 60 7.3. Server Pseudo Filesystem . . . . . . . . . . . . . . . . 59
7.4. Multiple Roots . . . . . . . . . . . . . . . . . . . . . 61 7.4. Multiple Roots . . . . . . . . . . . . . . . . . . . . . 60
7.5. Filehandle Volatility . . . . . . . . . . . . . . . . . 61 7.5. Filehandle Volatility . . . . . . . . . . . . . . . . . 60
7.6. Exported Root . . . . . . . . . . . . . . . . . . . . . 61 7.6. Exported Root . . . . . . . . . . . . . . . . . . . . . 60
7.7. Mount Point Crossing . . . . . . . . . . . . . . . . . . 62 7.7. Mount Point Crossing . . . . . . . . . . . . . . . . . . 61
7.8. Security Policy and Name Space Presentation . . . . . . 62 7.8. Security Policy and Name Space Presentation . . . . . . 61
8. File Locking and Share Reservations . . . . . . . . . . . 64 8. File Locking and Share Reservations . . . . . . . . . . . 63
8.1. Locking . . . . . . . . . . . . . . . . . . . . . . . . 64 8.1. Locking . . . . . . . . . . . . . . . . . . . . . . . . 63
8.1.1. Client ID . . . . . . . . . . . . . . . . . . . . . . 64 8.1.1. Client ID . . . . . . . . . . . . . . . . . . . . . . 63
8.1.2. Server Release of Clientid . . . . . . . . . . . . . . 67 8.1.2. Server Release of Clientid . . . . . . . . . . . . . . 66
8.1.3. lock_owner and stateid Definition . . . . . . . . . . 68 8.1.3. lock_owner and stateid Definition . . . . . . . . . . 67
8.1.4. Use of the stateid and Locking . . . . . . . . . . . . 69 8.1.4. Use of the stateid and Locking . . . . . . . . . . . . 68
8.1.5. Sequencing of Lock Requests . . . . . . . . . . . . . 71 8.1.5. Sequencing of Lock Requests . . . . . . . . . . . . . 70
8.1.6. Recovery from Replayed Requests . . . . . . . . . . . 72 8.1.6. Recovery from Replayed Requests . . . . . . . . . . . 71
8.1.7. Releasing lock_owner State . . . . . . . . . . . . . . 72 8.1.7. Releasing lock_owner State . . . . . . . . . . . . . . 72
8.1.8. Use of Open Confirmation . . . . . . . . . . . . . . . 73 8.1.8. Use of Open Confirmation . . . . . . . . . . . . . . . 72
8.2. Lock Ranges . . . . . . . . . . . . . . . . . . . . . . 74 8.2. Lock Ranges . . . . . . . . . . . . . . . . . . . . . . 73
8.3. Upgrading and Downgrading Locks . . . . . . . . . . . . 74 8.3. Upgrading and Downgrading Locks . . . . . . . . . . . . 73
8.4. Blocking Locks . . . . . . . . . . . . . . . . . . . . . 75 8.4. Blocking Locks . . . . . . . . . . . . . . . . . . . . . 74
8.5. Lease Renewal . . . . . . . . . . . . . . . . . . . . . 75 8.5. Lease Renewal . . . . . . . . . . . . . . . . . . . . . 74
8.6. Crash Recovery . . . . . . . . . . . . . . . . . . . . . 76 8.6. Crash Recovery . . . . . . . . . . . . . . . . . . . . . 75
8.6.1. Client Failure and Recovery . . . . . . . . . . . . . 76 8.6.1. Client Failure and Recovery . . . . . . . . . . . . . 76
8.6.2. Server Failure and Recovery . . . . . . . . . . . . . 77 8.6.2. Server Failure and Recovery . . . . . . . . . . . . . 76
8.6.3. Network Partitions and Recovery . . . . . . . . . . . 79 8.6.3. Network Partitions and Recovery . . . . . . . . . . . 78
8.7. Recovery from a Lock Request Timeout or Abort . . . . . 80 8.7. Recovery from a Lock Request Timeout or Abort . . . . . 81
8.8. Server Revocation of Locks . . . . . . . . . . . . . . . 80 8.8. Server Revocation of Locks . . . . . . . . . . . . . . . 82
8.9. Share Reservations . . . . . . . . . . . . . . . . . . . 81 8.9. Share Reservations . . . . . . . . . . . . . . . . . . . 83
8.10. OPEN/CLOSE Operations . . . . . . . . . . . . . . . . . 82 8.10. OPEN/CLOSE Operations . . . . . . . . . . . . . . . . . 83
8.10.1. Close and Retention of State Information . . . . . . 83 8.10.1. Close and Retention of State Information . . . . . . 84
8.11. Open Upgrade and Downgrade . . . . . . . . . . . . . . 83 8.11. Open Upgrade and Downgrade . . . . . . . . . . . . . . 85
8.12. Short and Long Leases . . . . . . . . . . . . . . . . . 84 8.12. Short and Long Leases . . . . . . . . . . . . . . . . . 85
8.13. Clocks, Propagation Delay, and Calculating Lease 8.13. Clocks, Propagation Delay, and Calculating Lease
Expiration . . . . . . . . . . . . . . . . . . . . . . 84 Expiration . . . . . . . . . . . . . . . . . . . . . . 86
8.14. Migration, Replication and State . . . . . . . . . . . 85 8.14. Migration, Replication and State . . . . . . . . . . . 86
8.14.1. Migration and State . . . . . . . . . . . . . . . . . 85 8.14.1. Migration and State . . . . . . . . . . . . . . . . . 87
8.14.2. Replication and State . . . . . . . . . . . . . . . . 86 8.14.2. Replication and State . . . . . . . . . . . . . . . . 87
8.14.3. Notification of Migrated Lease . . . . . . . . . . . 86 8.14.3. Notification of Migrated Lease . . . . . . . . . . . 88
8.14.4. Migration and the Lease_time Attribute . . . . . . . 87 8.14.4. Migration and the Lease_time Attribute . . . . . . . 88
9. Client-Side Caching . . . . . . . . . . . . . . . . . . . 88 9. Client-Side Caching . . . . . . . . . . . . . . . . . . . 90
9.1. Performance Challenges for Client-Side Caching . . . . . 88 9.1. Performance Challenges for Client-Side Caching . . . . . 90
9.2. Delegation and Callbacks . . . . . . . . . . . . . . . . 89 9.2. Delegation and Callbacks . . . . . . . . . . . . . . . . 91
9.2.1. Delegation Recovery . . . . . . . . . . . . . . . . . 90 9.2.1. Delegation Recovery . . . . . . . . . . . . . . . . . 92
9.3. Data Caching . . . . . . . . . . . . . . . . . . . . . . 92 9.3. Data Caching . . . . . . . . . . . . . . . . . . . . . . 94
9.3.1. Data Caching and OPENs . . . . . . . . . . . . . . . . 92 9.3.1. Data Caching and OPENs . . . . . . . . . . . . . . . . 94
9.3.2. Data Caching and File Locking . . . . . . . . . . . . 93 9.3.2. Data Caching and File Locking . . . . . . . . . . . . 95
9.3.3. Data Caching and Mandatory File Locking . . . . . . . 95 9.3.3. Data Caching and Mandatory File Locking . . . . . . . 97
9.3.4. Data Caching and File Identity . . . . . . . . . . . . 95 9.3.4. Data Caching and File Identity . . . . . . . . . . . . 97
9.4. Open Delegation . . . . . . . . . . . . . . . . . . . . 96 9.4. Open Delegation . . . . . . . . . . . . . . . . . . . . 98
9.4.1. Open Delegation and Data Caching . . . . . . . . . . . 99 9.4.1. Open Delegation and Data Caching . . . . . . . . . . . 101
9.4.2. Open Delegation and File Locks . . . . . . . . . . . . 100 9.4.2. Open Delegation and File Locks . . . . . . . . . . . . 102
9.4.3. Handling of CB_GETATTR . . . . . . . . . . . . . . . . 100 9.4.3. Handling of CB_GETATTR . . . . . . . . . . . . . . . . 102
9.4.4. Recall of Open Delegation . . . . . . . . . . . . . . 102 9.4.4. Recall of Open Delegation . . . . . . . . . . . . . . 105
9.4.5. Delegation Revocation . . . . . . . . . . . . . . . . 104 9.4.5. Clients that Fail to Honor Delegation Recalls . . . . 107
9.5. Data Caching and Revocation . . . . . . . . . . . . . . 104 9.4.6. Delegation Revocation . . . . . . . . . . . . . . . . 107
9.5.1. Revocation Recovery for Write Open Delegation . . . . 104 9.5. Data Caching and Revocation . . . . . . . . . . . . . . 108
9.6. Attribute Caching . . . . . . . . . . . . . . . . . . . 105 9.5.1. Revocation Recovery for Write Open Delegation . . . . 108
9.7. Name Caching . . . . . . . . . . . . . . . . . . . . . . 107 9.6. Attribute Caching . . . . . . . . . . . . . . . . . . . 109
9.8. Directory Caching . . . . . . . . . . . . . . . . . . . 108 9.7. Data and Metadata Caching and Memory Mapped Files . . . 111
10. Minor Versioning . . . . . . . . . . . . . . . . . . . . 110 9.8. Name Caching . . . . . . . . . . . . . . . . . . . . . . 113
11. Internationalization . . . . . . . . . . . . . . . . . . 113 9.9. Directory Caching . . . . . . . . . . . . . . . . . . . 114
11.1. Universal Versus Local Character Sets . . . . . . . . . 113 10. Minor Versioning . . . . . . . . . . . . . . . . . . . . 116
11.2. Overview of Universal Character Set Standards . . . . . 114 11. Internationalization . . . . . . . . . . . . . . . . . . 119
11.3. Difficulties with UCS-4, UCS-2, Unicode . . . . . . . . 115 11.1. Universal Versus Local Character Sets . . . . . . . . . 119
11.4. UTF-8 and its solutions . . . . . . . . . . . . . . . . 115 11.2. Overview of Universal Character Set Standards . . . . . 120
11.5. Normalization . . . . . . . . . . . . . . . . . . . . . 116 11.3. Difficulties with UCS-4, UCS-2, Unicode . . . . . . . . 121
11.6. UTF-8 Related Errors . . . . . . . . . . . . . . . . . 116 11.4. UTF-8 and its solutions . . . . . . . . . . . . . . . . 121
12. Error Definitions . . . . . . . . . . . . . . . . . . . . 118 11.5. Normalization . . . . . . . . . . . . . . . . . . . . . 122
13. NFS version 4 Requests . . . . . . . . . . . . . . . . . 124 11.6. UTF-8 Related Errors . . . . . . . . . . . . . . . . . 122
13.1. Compound Procedure . . . . . . . . . . . . . . . . . . 124 12. Error Definitions . . . . . . . . . . . . . . . . . . . . 124
13.2. Evaluation of a Compound Request . . . . . . . . . . . 125 13. NFS version 4 Requests . . . . . . . . . . . . . . . . . 130
13.3. Synchronous Modifying Operations . . . . . . . . . . . 125 13.1. Compound Procedure . . . . . . . . . . . . . . . . . . 130
13.4. Operation Values . . . . . . . . . . . . . . . . . . . 126 13.2. Evaluation of a Compound Request . . . . . . . . . . . 131
14. NFS version 4 Procedures . . . . . . . . . . . . . . . . 127 13.3. Synchronous Modifying Operations . . . . . . . . . . . 131
14.1. Procedure 0: NULL - No Operation . . . . . . . . . . . 127 13.4. Operation Values . . . . . . . . . . . . . . . . . . . 132
14.2. Procedure 1: COMPOUND - Compound Operations . . . . . . 128 14. NFS version 4 Procedures . . . . . . . . . . . . . . . . 133
14.2.1. Operation 3: ACCESS - Check Access Rights . . . . . . 131 14.1. Procedure 0: NULL - No Operation . . . . . . . . . . . 133
14.2.2. Operation 4: CLOSE - Close File . . . . . . . . . . . 134 14.2. Procedure 1: COMPOUND - Compound Operations . . . . . . 134
14.2.3. Operation 5: COMMIT - Commit Cached Data . . . . . . 136 14.2.1. Operation 3: ACCESS - Check Access Rights . . . . . . 137
14.2.4. Operation 6: CREATE - Create a Non-Regular File Object 139 14.2.2. Operation 4: CLOSE - Close File . . . . . . . . . . . 140
14.2.3. Operation 5: COMMIT - Commit Cached Data . . . . . . 142
14.2.4. Operation 6: CREATE - Create a Non-Regular File Object 145
14.2.5. Operation 7: DELEGPURGE - Purge Delegations Awaiting 14.2.5. Operation 7: DELEGPURGE - Purge Delegations Awaiting
Recovery . . . . . . . . . . . . . . . . . . . . . . 142 Recovery . . . . . . . . . . . . . . . . . . . . . . 148
14.2.6. Operation 8: DELEGRETURN - Return Delegation . . . . 143 14.2.6. Operation 8: DELEGRETURN - Return Delegation . . . . 150
14.2.7. Operation 9: GETATTR - Get Attributes . . . . . . . . 144 14.2.7. Operation 9: GETATTR - Get Attributes . . . . . . . . 151
14.2.8. Operation 10: GETFH - Get Current Filehandle . . . . 146 14.2.8. Operation 10: GETFH - Get Current Filehandle . . . . 153
14.2.9. Operation 11: LINK - Create Link to a File . . . . . 148 14.2.9. Operation 11: LINK - Create Link to a File . . . . . 155
14.2.10. Operation 12: LOCK - Create Lock . . . . . . . . . . 150 14.2.10. Operation 12: LOCK - Create Lock . . . . . . . . . . 157
14.2.11. Operation 13: LOCKT - Test For Lock . . . . . . . . 154 14.2.11. Operation 13: LOCKT - Test For Lock . . . . . . . . 161
14.2.12. Operation 14: LOCKU - Unlock File . . . . . . . . . 156
14.2.13. Operation 15: LOOKUP - Lookup Filename . . . . . . . 158
14.2.14. Operation 16: LOOKUPP - Lookup Parent Directory . . 161 14.2.12. Operation 14: LOCKU - Unlock File . . . . . . . . . 163
14.2.13. Operation 15: LOOKUP - Lookup Filename . . . . . . . 165
14.2.14. Operation 16: LOOKUPP - Lookup Parent Directory . . 168
14.2.15. Operation 17: NVERIFY - Verify Difference in 14.2.15. Operation 17: NVERIFY - Verify Difference in
Attributes . . . . . . . . . . . . . . . . . . . . . 162 Attributes . . . . . . . . . . . . . . . . . . . . . 169
14.2.16. Operation 18: OPEN - Open a Regular File . . . . . . 164 14.2.16. Operation 18: OPEN - Open a Regular File . . . . . . 171
14.2.17. Operation 19: OPENATTR - Open Named Attribute 14.2.17. Operation 19: OPENATTR - Open Named Attribute
Directory . . . . . . . . . . . . . . . . . . . . . 174 Directory . . . . . . . . . . . . . . . . . . . . . 181
14.2.18. Operation 20: OPEN_CONFIRM - Confirm Open . . . . . 176 14.2.18. Operation 20: OPEN_CONFIRM - Confirm Open . . . . . 183
14.2.19. Operation 21: OPEN_DOWNGRADE - Reduce Open File Access 179 14.2.19. Operation 21: OPEN_DOWNGRADE - Reduce Open File Access 186
14.2.20. Operation 22: PUTFH - Set Current Filehandle . . . . 181 14.2.20. Operation 22: PUTFH - Set Current Filehandle . . . . 188
14.2.21. Operation 23: PUTPUBFH - Set Public Filehandle . . . 182 14.2.21. Operation 23: PUTPUBFH - Set Public Filehandle . . . 189
14.2.22. Operation 24: PUTROOTFH - Set Root Filehandle . . . 184 14.2.22. Operation 24: PUTROOTFH - Set Root Filehandle . . . 191
14.2.23. Operation 25: READ - Read from File . . . . . . . . 185 14.2.23. Operation 25: READ - Read from File . . . . . . . . 192
14.2.24. Operation 26: READDIR - Read Directory . . . . . . . 188 14.2.24. Operation 26: READDIR - Read Directory . . . . . . . 195
14.2.25. Operation 27: READLINK - Read Symbolic Link . . . . 192 14.2.25. Operation 27: READLINK - Read Symbolic Link . . . . 199
14.2.26. Operation 28: REMOVE - Remove Filesystem Object . . 194 14.2.26. Operation 28: REMOVE - Remove Filesystem Object . . 201
14.2.27. Operation 29: RENAME - Rename Directory Entry . . . 197 14.2.27. Operation 29: RENAME - Rename Directory Entry . . . 204
14.2.28. Operation 30: RENEW - Renew a Lease . . . . . . . . 200 14.2.28. Operation 30: RENEW - Renew a Lease . . . . . . . . 207
14.2.29. Operation 31: RESTOREFH - Restore Saved Filehandle . 201 14.2.29. Operation 31: RESTOREFH - Restore Saved Filehandle . 209
14.2.30. Operation 32: SAVEFH - Save Current Filehandle . . . 203 14.2.30. Operation 32: SAVEFH - Save Current Filehandle . . . 211
14.2.31. Operation 33: SECINFO - Obtain Available Security . 204 14.2.31. Operation 33: SECINFO - Obtain Available Security . 212
14.2.32. Operation 34: SETATTR - Set Attributes . . . . . . . 208 14.2.32. Operation 34: SETATTR - Set Attributes . . . . . . . 216
14.2.33. Operation 35: SETCLIENTID - Negotiate Clientid . . . 211 14.2.33. Operation 35: SETCLIENTID - Negotiate Clientid . . . 219
14.2.34. Operation 36: SETCLIENTID_CONFIRM - Confirm Clientid 215 14.2.34. Operation 36: SETCLIENTID_CONFIRM - Confirm Clientid 223
14.2.35. Operation 37: VERIFY - Verify Same Attributes . . . 219 14.2.35. Operation 37: VERIFY - Verify Same Attributes . . . 227
14.2.36. Operation 38: WRITE - Write to File . . . . . . . . 221 14.2.36. Operation 38: WRITE - Write to File . . . . . . . . 229
14.2.37. Operation 39: RELEASE_LOCKOWNER - Release Lockowner 14.2.37. Operation 39: RELEASE_LOCKOWNER - Release Lockowner
State . . . . . . . . . . . . . . . . . . . . . . . 226 State . . . . . . . . . . . . . . . . . . . . . . . 234
14.2.38. Operation 10044: ILLEGAL - Illegal operation . . . . 228 14.2.38. Operation 10044: ILLEGAL - Illegal operation . . . . 236
15. NFS version 4 Callback Procedures . . . . . . . . . . . . 229 15. NFS version 4 Callback Procedures . . . . . . . . . . . . 237
15.1. Procedure 0: CB_NULL - No Operation . . . . . . . . . . 229 15.1. Procedure 0: CB_NULL - No Operation . . . . . . . . . . 237
15.2. Procedure 1: CB_COMPOUND - Compound Operations . . . . 230 15.2. Procedure 1: CB_COMPOUND - Compound Operations . . . . 238
15.2.1. Operation 3: CB_GETATTR - Get Attributes . . . . . . 232 15.2.1. Operation 3: CB_GETATTR - Get Attributes . . . . . . 240
15.2.2. Operation 4: CB_RECALL - Recall an Open Delegation . 234 15.2.2. Operation 4: CB_RECALL - Recall an Open Delegation . 242
15.2.3. Operation 10044: CB_ILLEGAL - Illegal Callback 15.2.3. Operation 10044: CB_ILLEGAL - Illegal Callback
Operation . . . . . . . . . . . . . . . . . . . . . . 236 Operation . . . . . . . . . . . . . . . . . . . . . . 244
16. Security Considerations . . . . . . . . . . . . . . . . . 237 16. Security Considerations . . . . . . . . . . . . . . . . . 245
17. IANA Considerations . . . . . . . . . . . . . . . . . . . 238 17. IANA Considerations . . . . . . . . . . . . . . . . . . . 246
17.1. Named Attribute Definition . . . . . . . . . . . . . . 238 17.1. Named Attribute Definition . . . . . . . . . . . . . . 246
17.2. ONC RPC Network Identifiers (netids) . . . . . . . . . 238 17.2. ONC RPC Network Identifiers (netids) . . . . . . . . . 246
18. RPC definition file . . . . . . . . . . . . . . . . . . . 239 18. RPC definition file . . . . . . . . . . . . . . . . . . . 247
19. Bibliography . . . . . . . . . . . . . . . . . . . . . . 271 19. Bibliography . . . . . . . . . . . . . . . . . . . . . . 279
20. Authors . . . . . . . . . . . . . . . . . . . . . . . . . 277 20. Authors . . . . . . . . . . . . . . . . . . . . . . . . . 285
20.1. Editor's Address . . . . . . . . . . . . . . . . . . . 277 20.1. Editor's Address . . . . . . . . . . . . . . . . . . . 285
20.2. Authors' Addresses . . . . . . . . . . . . . . . . . . 277 20.2. Authors' Addresses . . . . . . . . . . . . . . . . . . 285
20.3. Acknowledgements . . . . . . . . . . . . . . . . . . . 278 20.3. Acknowledgements . . . . . . . . . . . . . . . . . . . 286
21. Full Copyright Statement . . . . . . . . . . . . . . . . 279 21. Full Copyright Statement . . . . . . . . . . . . . . . . 287
1. Introduction 1. Introduction
The NFS version 4 protocol is a further revision of the NFS protocol The NFS version 4 protocol is a further revision of the NFS protocol
defined already by versions 2 [RFC1094] and 3 [RFC1813]. It retains defined already by versions 2 [RFC1094] and 3 [RFC1813]. It retains
the essential characteristics of previous versions: design for easy the essential characteristics of previous versions: design for easy
recovery, independent of transport protocols, operating systems and recovery, independent of transport protocols, operating systems and
filesystems, simplicity, and good performance. The NFS version 4 filesystems, simplicity, and good performance. The NFS version 4
revision has the following goals: revision has the following goals:
skipping to change at page 8, line 5 skipping to change at page 8, line 5
1.1. Inconsistencies of this Document with Section 18 1.1. Inconsistencies of this Document with Section 18
Section 18, RPC Definition File, contains the definitions in XDR Section 18, RPC Definition File, contains the definitions in XDR
description language of the constructs used by the protocol. Prior description language of the constructs used by the protocol. Prior
to Section 18, several of the constructs are reproduced for purposes to Section 18, several of the constructs are reproduced for purposes
of explanation. The reader is warned of the possibility of errors in of explanation. The reader is warned of the possibility of errors in
the reproduced constructs outside of Section 18. For any part of the the reproduced constructs outside of Section 18. For any part of the
document that is inconsistent with Section 18, Section 18 is to be document that is inconsistent with Section 18, Section 18 is to be
considered authoritative. considered authoritative.
1.2. Overview of NFS version 4 Features 1.2. Overview of NFS version 4 Features
To provide a reasonable context for the reader, the major features of To provide a reasonable context for the reader, the major features of
NFS version 4 protocol will be reviewed in brief. This will be done NFS version 4 protocol will be reviewed in brief. This will be done
to provide an appropriate context for both the reader who is familiar to provide an appropriate context for both the reader who is familiar
with the previous versions of the NFS protocol and the reader that is with the previous versions of the NFS protocol and the reader that is
new to the NFS protocols. For the reader new to the NFS protocols, new to the NFS protocols. For the reader new to the NFS protocols,
there is still a fundamental knowledge that is expected. The reader there is still a fundamental knowledge that is expected. The reader
should be familiar with the XDR and RPC protocols as described in should be familiar with the XDR and RPC protocols as described in
skipping to change at page 9, line 5 skipping to change at page 9, line 5
The COMPOUND procedure is defined in terms of operations and these The COMPOUND procedure is defined in terms of operations and these
operations correspond more closely to the traditional NFS procedures. operations correspond more closely to the traditional NFS procedures.
With the use of the COMPOUND procedure, the client is able to build With the use of the COMPOUND procedure, the client is able to build
simple or complex requests. These COMPOUND requests allow for a simple or complex requests. These COMPOUND requests allow for a
reduction in the number of RPCs needed for logical filesystem reduction in the number of RPCs needed for logical filesystem
operations. For example, without previous contact with a server a operations. For example, without previous contact with a server a
client will be able to read data from a file in one request by client will be able to read data from a file in one request by
combining LOOKUP, OPEN, and READ operations in a single COMPOUND RPC. combining LOOKUP, OPEN, and READ operations in a single COMPOUND RPC.
With previous versions of the NFS protocol, this type of single With previous versions of the NFS protocol, this type of single
request was not possible. request was not possible.
The model used for COMPOUND is very simple. There is no logical ORing The model used for COMPOUND is very simple. There is no logical ORing
or ANDing of operations. The operations combined within a COMPOUND or ANDing of operations. The operations combined within a COMPOUND
request are evaluated in order by the server. Once an operation request are evaluated in order by the server. Once an operation
returns a failing result, the evaluation ends and the results of all returns a failing result, the evaluation ends and the results of all
evaluated operations are returned to the client. evaluated operations are returned to the client.
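
The evaluation model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function evaluate_compound and the toy operations (lookup_ok, open_missing, read_data) are hypothetical names, and each operation is modeled as a callable returning a (status, result) pair rather than as a real XDR-encoded operation.

```python
from typing import Callable, List, Tuple

NFS4_OK = 0          # success status
NFS4ERR_NOENT = 2    # "no such file or directory" status

# Hypothetical model: an operation is a callable returning (status, result).
Operation = Callable[[], Tuple[int, object]]


def evaluate_compound(ops: List[Operation]) -> List[Tuple[int, object]]:
    """Evaluate operations in order and stop after the first failure."""
    results = []
    for op in ops:
        status, result = op()
        results.append((status, result))
        if status != NFS4_OK:
            break  # remaining operations are never evaluated
    return results


# Toy operations used only for this illustration.
def lookup_ok():
    return (NFS4_OK, "filehandle")


def open_missing():
    return (NFS4ERR_NOENT, None)


def read_data():
    return (NFS4_OK, b"data")


replies = evaluate_compound([lookup_ok, open_missing, read_data])
```

In this sketch the reply carries two results: the successful lookup and the failing open. The read is never evaluated, which matches the "no logical ORing or ANDing" model: the client cannot ask the server to continue past a failure.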
The NFS version 4 protocol continues to have the client refer to a The NFS version 4 protocol continues to have the client refer to a
skipping to change at page 9, line 38 skipping to change at page 9, line 38
the same as previous versions. The server filesystem is hierarchical the same as previous versions. The server filesystem is hierarchical
with the regular files contained within being treated as opaque byte with the regular files contained within being treated as opaque byte
streams. In a slight departure, file and directory names are encoded streams. In a slight departure, file and directory names are encoded
with UTF-8 to deal with the basics of internationalization. with UTF-8 to deal with the basics of internationalization.
The NFS version 4 protocol does not require a separate protocol to The NFS version 4 protocol does not require a separate protocol to
provide for the initial mapping between path name and filehandle. provide for the initial mapping between path name and filehandle.
Instead of using the older MOUNT protocol for this mapping, the Instead of using the older MOUNT protocol for this mapping, the
server provides a ROOT filehandle that represents the logical root or server provides a ROOT filehandle that represents the logical root or
top of the filesystem tree provided by the server. The server top of the filesystem tree provided by the server. The server
provides multiple filesystems by glueing them together with pseudo provides multiple filesystems by gluing them together with pseudo
filesystems. These pseudo filesystems provide for potential gaps in filesystems. These pseudo filesystems provide for potential gaps in
the path names between real filesystems. the path names between real filesystems.
1.2.3.1. Filehandle Types 1.2.3.1. Filehandle Types
In previous versions of the NFS protocol, the filehandle provided by In previous versions of the NFS protocol, the filehandle provided by
the server was guaranteed to be valid or persistent for the lifetime the server was guaranteed to be valid or persistent for the lifetime
of the filesystem object to which it referred. For some server of the filesystem object to which it referred. For some server
implementations, this persistence requirement has been difficult to implementations, this persistence requirement has been difficult to
meet. For the NFS version 4 protocol, this requirement has been meet. For the NFS version 4 protocol, this requirement has been
relaxed by introducing another type of filehandle, volatile. With relaxed by introducing another type of filehandle, volatile. With
persistent and volatile filehandle types, the server implementation persistent and volatile filehandle types, the server implementation
can match the abilities of the filesystem at the server along with can match the abilities of the filesystem at the server along with
the operating environment. The client will have knowledge of the the operating environment. The client will have knowledge of the
type of filehandle being provided by the server and can be prepared type of filehandle being provided by the server and can be prepared
to deal with the semantics of each. to deal with the semantics of each.
1.2.3.2. Attribute Types 1.2.3.2. Attribute Types
The NFS version 4 protocol introduces three classes of filesystem or The NFS version 4 protocol introduces three classes of filesystem or
file attributes. Like the additional filehandle type, the file attributes. Like the additional filehandle type, the
classification of file attributes has been done to ease server classification of file attributes has been done to ease server
implementations along with extending the overall functionality of the implementations along with extending the overall functionality of the
NFS protocol. This attribute model is structured to be extensible NFS protocol. This attribute model is structured to be extensible
such that new attributes can be introduced in minor revisions of the such that new attributes can be introduced in minor revisions of the
protocol without requiring significant rework. protocol without requiring significant rework.
skipping to change at page 11, line 5 skipping to change at page 11, line 5
replicate server filesystems is enabled within the protocol. The replicate server filesystems is enabled within the protocol. The
filesystem locations attribute provides a method for the client to filesystem locations attribute provides a method for the client to
probe the server about the location of a filesystem. In the event of probe the server about the location of a filesystem. In the event of
a migration of a filesystem, the client will receive an error when a migration of a filesystem, the client will receive an error when
operating on the filesystem and it can then query as to the new file operating on the filesystem and it can then query as to the new file
system location. Similar steps are used for replication: the client system location. Similar steps are used for replication: the client
is able to query the server for the multiple available locations of a is able to query the server for the multiple available locations of a
particular filesystem. From this information, the client can use its particular filesystem. From this information, the client can use its
own policies to access the appropriate filesystem location. own policies to access the appropriate filesystem location.
1.2.4. OPEN and CLOSE 1.2.4. OPEN and CLOSE
The NFS version 4 protocol introduces OPEN and CLOSE operations. The The NFS version 4 protocol introduces OPEN and CLOSE operations. The
OPEN operation provides a single point where file lookup, creation, OPEN operation provides a single point where file lookup, creation,
and share semantics can be combined. The CLOSE operation also and share semantics can be combined. The CLOSE operation also
provides for the release of state accumulated by OPEN. provides for the release of state accumulated by OPEN.
1.2.5. File locking 1.2.5. File locking
skipping to change at page 12, line 5 skipping to change at page 12, line 5
client. When the server grants a delegation for a file to a client, client. When the server grants a delegation for a file to a client,
the client is guaranteed certain semantics with respect to the the client is guaranteed certain semantics with respect to the
sharing of that file with other clients. At OPEN, the server may sharing of that file with other clients. At OPEN, the server may
provide the client either a read or write delegation for the file. provide the client either a read or write delegation for the file.
If the client is granted a read delegation, it is assured that no If the client is granted a read delegation, it is assured that no
other client has the ability to write to the file for the duration of other client has the ability to write to the file for the duration of
the delegation. If the client is granted a write delegation, the the delegation. If the client is granted a write delegation, the
client is assured that no other client has read or write access to client is assured that no other client has read or write access to
the file. the file.
Delegations can be recalled by the server. If another client Delegations can be recalled by the server. If another client
requests access to the file in such a way that the access conflicts requests access to the file in such a way that the access conflicts
with the granted delegation, the server is able to notify the initial with the granted delegation, the server is able to notify the initial
client and recall the delegation. This requires that a callback path client and recall the delegation. This requires that a callback path
exist between the server and client. If this callback path does not exist between the server and client. If this callback path does not
exist, then delegations cannot be granted. The essence of a exist, then delegations cannot be granted. The essence of a
delegation is that it allows the client to locally service operations delegation is that it allows the client to locally service operations
such as OPEN, CLOSE, LOCK, LOCKU, READ, and WRITE without immediate such as OPEN, CLOSE, LOCK, LOCKU, READ, and WRITE without immediate
interaction with the server. interaction with the server.
skipping to change at page 13, line 5 skipping to change at page 13, line 5
alleviate the expense a server would have in maintaining alleviate the expense a server would have in maintaining
state about variable length leases across server failures. state about variable length leases across server failures.
Lock The term "lock" is used to refer to both record (byte- Lock The term "lock" is used to refer to both record (byte-
range) locks as well as share reservations unless range) locks as well as share reservations unless
specifically stated otherwise. specifically stated otherwise.
Server The "Server" is the entity responsible for coordinating Server The "Server" is the entity responsible for coordinating
client access to a set of filesystems. client access to a set of filesystems.
Stable Storage Stable Storage
NFS version 4 servers must be able to recover without data NFS version 4 servers must be able to recover without data
loss from multiple power failures (including cascading loss from multiple power failures (including cascading
power failures, that is, several power failures in quick power failures, that is, several power failures in quick
succession), operating system failures, and hardware succession), operating system failures, and hardware
failure of components other than the storage medium itself failure of components other than the storage medium itself
(for example, disk, nonvolatile RAM). (for example, disk, nonvolatile RAM).
Some examples of stable storage that are allowable for an Some examples of stable storage that are allowable for an
skipping to change at page 14, line 5 skipping to change at page 14, line 5
defines the open and locking state provided by the server defines the open and locking state provided by the server
for a specific open or lock owner for a specific file. for a specific open or lock owner for a specific file.
Stateids composed of all bits 0 or all bits 1 have special Stateids composed of all bits 0 or all bits 1 have special
meaning and are reserved values. meaning and are reserved values.
Verifier A 64-bit quantity generated by the client that the server Verifier A 64-bit quantity generated by the client that the server
can use to determine if the client has restarted and lost can use to determine if the client has restarted and lost
all previous lock state. all previous lock state.
2. Protocol Data Types 2. Protocol Data Types
The syntax and semantics to describe the data types of the NFS The syntax and semantics to describe the data types of the NFS
version 4 protocol are defined in the XDR [RFC1832] and RPC [RFC1831] version 4 protocol are defined in the XDR [RFC1832] and RPC [RFC1831]
documents. The next sections build upon the XDR data types to define documents. The next sections build upon the XDR data types to define
types and structures specific to this protocol. types and structures specific to this protocol.
2.1. Basic Data Types 2.1. Basic Data Types
skipping to change at page 15, line 5 skipping to change at page 15, line 5
mode4 typedef uint32_t mode4; mode4 typedef uint32_t mode4;
Mode attribute data type Mode attribute data type
nfs_cookie4 typedef uint64_t nfs_cookie4; nfs_cookie4 typedef uint64_t nfs_cookie4;
Opaque cookie value for READDIR Opaque cookie value for READDIR
nfs_fh4 typedef opaque nfs_fh4<NFS4_FHSIZE>; nfs_fh4 typedef opaque nfs_fh4<NFS4_FHSIZE>;
Filehandle definition; NFS4_FHSIZE is defined as 128 Filehandle definition; NFS4_FHSIZE is defined as 128
nfs_ftype4 enum nfs_ftype4; nfs_ftype4 enum nfs_ftype4;
Various defined file types Various defined file types
nfsstat4 enum nfsstat4; nfsstat4 enum nfsstat4;
Return value for operations Return value for operations
offset4 typedef uint64_t offset4; offset4 typedef uint64_t offset4;
Various offset designations (READ, WRITE, LOCK, COMMIT) Various offset designations (READ, WRITE, LOCK, COMMIT)
skipping to change at page 15, line 38 skipping to change at page 15, line 38
seqid4 typedef uint32_t seqid4; seqid4 typedef uint32_t seqid4;
Sequence identifier used for file locking Sequence identifier used for file locking
utf8string typedef opaque utf8string<>; utf8string typedef opaque utf8string<>;
UTF-8 encoding for strings UTF-8 encoding for strings
verifier4 typedef opaque verifier4[NFS4_VERIFIER_SIZE]; verifier4 typedef opaque verifier4[NFS4_VERIFIER_SIZE];
Verifier used for various operations (COMMIT, CREATE, Verifier used for various operations (COMMIT, CREATE,
OPEN, READDIR, SETCLIENTID, SETCLIENTID_CONFIRM, WRITE) OPEN, READDIR, SETCLIENTID, SETCLIENTID_CONFIRM, WRITE)
NFS4_VERIFIER_SIZE is defined as 8 NFS4_VERIFIER_SIZE is defined as 8.
2.2. Structured Data Types 2.2. Structured Data Types
nfstime4 nfstime4
struct nfstime4 { struct nfstime4 {
int64_t seconds; int64_t seconds;
uint32_t nseconds; uint32_t nseconds;
}; };
The nfstime4 structure gives the number of seconds and The nfstime4 structure gives the number of seconds and
nanoseconds since midnight or 0 hour January 1, 1970 Coordinated nanoseconds since midnight or 0 hour January 1, 1970 Coordinated
Universal Time (UTC). Values greater than zero for the seconds Universal Time (UTC). Values greater than zero for the seconds
field denote dates after the 0 hour January 1, 1970. Values field denote dates after the 0 hour January 1, 1970. Values
less than zero for the seconds field denote dates before the 0 less than zero for the seconds field denote dates before the 0
hour January 1, 1970. In both cases, the nseconds field is to hour January 1, 1970. In both cases, the nseconds field is to
be added to the seconds field for the final time representation. be added to the seconds field for the final time representation.
For example, if the time to be represented is one-half second For example, if the time to be represented is one-half second
before 0 hour January 1, 1970, the seconds field would have a before 0 hour January 1, 1970, the seconds field would have a
value of negative one (-1) and the nseconds field would have a value of negative one (-1) and the nseconds field would have a
value of one-half second (500000000). Values greater than value of one-half second (500000000). Values greater than
999,999,999 for nseconds are considered invalid. 999,999,999 for nseconds are considered invalid.
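
The representation rule above (nseconds is always added to seconds, even for times before the epoch) can be checked with a short Python sketch. The helper names to_nfstime4 and from_nfstime4 are hypothetical, and exact fractions are used in place of a real clock value so that the arithmetic is not blurred by floating-point rounding.

```python
import math
from fractions import Fraction
from typing import Tuple

NSEC_PER_SEC = 1_000_000_000


def to_nfstime4(t: Fraction) -> Tuple[int, int]:
    """Split t (seconds relative to 0 hour January 1, 1970 UTC)
    into (seconds, nseconds) with 0 <= nseconds <= 999,999,999."""
    total_ns = math.floor(t * NSEC_PER_SEC)
    # divmod floors toward negative infinity, so the remainder
    # (nseconds) is never negative, even for dates before the epoch.
    seconds, nseconds = divmod(total_ns, NSEC_PER_SEC)
    return seconds, nseconds


def from_nfstime4(seconds: int, nseconds: int) -> Fraction:
    """Recombine the two fields; nseconds is added to seconds."""
    if not 0 <= nseconds <= 999_999_999:
        raise ValueError("nseconds greater than 999,999,999 is invalid")
    return Fraction(seconds) + Fraction(nseconds, NSEC_PER_SEC)


half_second_before_epoch = to_nfstime4(Fraction(-1, 2))
```

Here half_second_before_epoch is (-1, 500000000), the example given in the text: -1 second plus 500,000,000 nanoseconds is one-half second before the epoch.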
This data type is used to pass time and date information. A This data type is used to pass time and date information. A
server converts to and from its local representation of time server converts to and from its local representation of time
when processing time values, preserving as much accuracy as when processing time values, preserving as much accuracy as
possible. If the precision of timestamps stored for a filesystem possible. If the precision of timestamps stored for a filesystem
skipping to change at page 17, line 5 skipping to change at page 17, line 5
This data type represents additional information for the device This data type represents additional information for the device
file types NF4CHR and NF4BLK. file types NF4CHR and NF4BLK.
fsid4 fsid4
struct fsid4 { struct fsid4 {
uint64_t major; uint64_t major;
uint64_t minor; uint64_t minor;
}; };
This type is the filesystem identifier that is used as a This type is the filesystem identifier that is used as a
mandatory attribute. mandatory attribute.
fs_location4 fs_location4
struct fs_location4 { struct fs_location4 {
utf8string server<>; utf8string server<>;
skipping to change at page 18, line 5 skipping to change at page 18, line 5
+-----------+-----------+-----------+-- +-----------+-----------+-----------+--
| count | 31 .. 0 | 63 .. 32 | | count | 31 .. 0 | 63 .. 32 |
+-----------+-----------+-----------+-- +-----------+-----------+-----------+--
change_info4 change_info4
struct change_info4 { struct change_info4 {
bool atomic; bool atomic;
changeid4 before; changeid4 before;
changeid4 after; changeid4 after;
}; };
This structure is used with the CREATE, LINK, REMOVE, RENAME This structure is used with the CREATE, LINK, REMOVE, RENAME
operations to let the client know the value of the change operations to let the client know the value of the change
attribute for the directory in which the target filesystem attribute for the directory in which the target filesystem
object resides. object resides.
clientaddr4 clientaddr4
skipping to change at page 18, line 55 skipping to change at page 18, line 55
For TCP over IPv4 the value of r_netid is the string "tcp". For For TCP over IPv4 the value of r_netid is the string "tcp". For
UDP over IPv4 the value of r_netid is the string "udp". UDP over IPv4 the value of r_netid is the string "udp".
For TCP over IPv6 and for UDP over IPv6, the format of r_addr is For TCP over IPv6 and for UDP over IPv6, the format of r_addr is
the US-ASCII string: the US-ASCII string:
x1:x2:x3:x4:x5:x6:x7:x8.p1.p2 x1:x2:x3:x4:x5:x6:x7:x8.p1.p2
The suffix "p1.p2" is the service port, and is computed the same The suffix "p1.p2" is the service port, and is computed the same
way as with univeral addresses for TCP and UDP over IPv4. The way as with universal addresses for TCP and UDP over IPv4. The
prefix, "x1:x2:x3:x4:x5:x6:x7:x8", is the standard textual form prefix, "x1:x2:x3:x4:x5:x6:x7:x8", is the standard textual form
for representing an IPv6 address as defined in Section 2.2 of for representing an IPv6 address as defined in Section 2.2 of
[RFC1884]. Additionally, the two alternative forms specified in [RFC1884]. Additionally, the two alternative forms specified in
Section 2.2 of [RFC1884] are also acceptable. Section 2.2 of [RFC1884] are also acceptable.
For TCP over IPv6 the value of r_netid is the string "tcp6". For TCP over IPv6 the value of r_netid is the string "tcp6".
For UDP over IPv6 the value of r_netid is the string "udp6". For UDP over IPv6 the value of r_netid is the string "udp6".
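
As a rough illustration of the r_netid and r_addr conventions above, the following Python sketch builds both strings for a given host and port. The function names (port_suffix, make_r_addr, make_r_netid) are hypothetical. The sketch assumes the usual universal address convention in which p1 and p2 are the high-order and low-order octets of the port, each written in decimal; that rule comes from the IPv4 universal address description, which is not reproduced in this excerpt.

```python
import ipaddress


def port_suffix(port: int) -> str:
    """Return "p1.p2": high and low octets of the port, in decimal."""
    if not 0 <= port <= 0xFFFF:
        raise ValueError("port must fit in 16 bits")
    return f"{port >> 8}.{port & 0xFF}"


def make_r_addr(host: str, port: int) -> str:
    """Build the universal address string for an IPv4 or IPv6 host."""
    addr = ipaddress.ip_address(host)  # raises ValueError if malformed
    return f"{addr}.{port_suffix(port)}"


def make_r_netid(host: str, transport: str) -> str:
    """Return "tcp"/"udp" for IPv4 and "tcp6"/"udp6" for IPv6."""
    if transport not in ("tcp", "udp"):
        raise ValueError("transport must be 'tcp' or 'udp'")
    version = ipaddress.ip_address(host).version
    return transport + ("6" if version == 6 else "")


callback_addr = make_r_addr("::1", 2049)
callback_netid = make_r_netid("::1", "tcp")
```

Port 2049 is 0x0801, so the suffix is "8.1" and callback_addr is "::1.8.1". The ipaddress module emits the compressed "::" form for IPv6, which is one of the alternative textual forms the text says is acceptable.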
cb_client4 cb_client4
struct cb_client4 { struct cb_client4 {
skipping to change at page 20, line 5 skipping to change at page 20, line 5
lock_owner4 lock_owner4
struct lock_owner4 { struct lock_owner4 {
clientid4 clientid; clientid4 clientid;
opaque owner<NFS4_OPAQUE_LIMIT>; opaque owner<NFS4_OPAQUE_LIMIT>;
}; };
This structure is used to identify the owner of file locking This structure is used to identify the owner of file locking
state. NFS4_OPAQUE_LIMIT is defined as 1024. state. NFS4_OPAQUE_LIMIT is defined as 1024.
open_to_lock_owner4 open_to_lock_owner4
struct open_to_lock_owner4 { struct open_to_lock_owner4 {
seqid4 open_seqid; seqid4 open_seqid;
stateid4 open_stateid; stateid4 open_stateid;
seqid4 lock_seqid; seqid4 lock_seqid;
lock_owner4 lock_owner; lock_owner4 lock_owner;
}; };
skipping to change at page 21, line 5 skipping to change at page 21, line 5
This structure is used for the various state sharing mechanisms This structure is used for the various state sharing mechanisms
between the client and server. For the client, this data between the client and server. For the client, this data
structure is read-only. The starting value of the seqid field structure is read-only. The starting value of the seqid field
is undefined. The server is required to increment the seqid is undefined. The server is required to increment the seqid
field monotonically at each transition of the stateid. This is field monotonically at each transition of the stateid. This is
important since the client will inspect the seqid in OPEN important since the client will inspect the seqid in OPEN
stateids to determine the order of OPEN processing done by the stateids to determine the order of OPEN processing done by the
server. server.
3. RPC and Security Flavor 3. RPC and Security Flavor
The NFS version 4 protocol is a Remote Procedure Call (RPC) The NFS version 4 protocol is a Remote Procedure Call (RPC)
application that uses RPC version 2 and the corresponding eXternal application that uses RPC version 2 and the corresponding eXternal
Data Representation (XDR) as defined in [RFC1831] and [RFC1832]. The Data Representation (XDR) as defined in [RFC1831] and [RFC1832]. The
RPCSEC_GSS security flavor as defined in [RFC2203] MUST be used as RPCSEC_GSS security flavor as defined in [RFC2203] MUST be used as
the mechanism to deliver stronger security for the NFS version 4 the mechanism to deliver stronger security for the NFS version 4
protocol. protocol.
3.1. Ports and Transports 3.1. Ports and Transports
Historically, NFS version 2 and version 3 servers have resided on Historically, NFS version 2 and version 3 servers have resided on
port 2049. The registered port 2049 [RFC1700] for the NFS protocol port 2049. The registered port 2049 [RFC1700] for the NFS protocol
should be the default configuration. Using the registered port for should be the default configuration. Using the registered port for
NFS services means the NFS client will not need to use the RPC NFS services means the NFS client will not need to use the RPC
binding protocols as described in [RFC1833]; this will allow NFS to binding protocols as described in [RFC1833]; this will allow NFS to
transit firewalls. transit firewalls.
The transport used by the RPC service for the NFS version 4 protocol Where an NFS version 4 implementation supports operation over the IP
MUST provide congestion control comparable to that defined for TCP in network protocol, the supported transports between NFS and IP must be
[RFC2581]. If the operating environment implements TCP, the NFS among the IETF-approved congestion control transport protocols, which
version 4 protocol SHOULD be supported over TCP. The NFS client and include TCP and SCTP. To enhance the possibilities for
server MAY use other transports if they support congestion control as interoperability, an NFS version 4 implementation SHOULD support
defined above and in those cases a mechanism may be provided to operation over the TCP transport protocol.
override TCP usage in favor of another transport.
If TCP is used as the transport, the client and server SHOULD use If TCP is used as the transport, the client and server SHOULD use
persistent connections. This will prevent the weakening of TCP's persistent connections. This will prevent the weakening of TCP's
congestion control via short lived connections and will improve congestion control via short lived connections and will improve
performance for the WAN environment by eliminating the need for SYN performance for the WAN environment by eliminating the need for SYN
handshakes. handshakes.
Note that for various timers, the client and server should avoid Note that for various timers, the client and server should avoid
inadvertent synchronization of those timers. For further discussion inadvertent synchronization of those timers. For further discussion
of the general issue refer to [Floyd]. of the general issue refer to [Floyd].
skipping to change at page 21, line 55 skipping to change at page 21, line 54
When processing a request received over a reliable transport such as When processing a request received over a reliable transport such as
TCP, the NFS version 4 server MUST NOT silently drop the request, TCP, the NFS version 4 server MUST NOT silently drop the request,
except if the transport connection has been broken. Given such a except if the transport connection has been broken. Given such a
contract between NFS version 4 clients and servers, clients MUST NOT contract between NFS version 4 clients and servers, clients MUST NOT
retry a request unless one or both of the following are true: retry a request unless one or both of the following are true:
o The transport connection has been broken o The transport connection has been broken
o The procedure being retried is the NULL procedure o The procedure being retried is the NULL procedure
Since transports, including TCP, do not always synchronously inform a Since reliable transports, such as TCP, do not always synchronously
peer when the other peer has broken the connection (for example, when inform a peer when the other peer has broken the connection (for
example, when an NFS server reboots), the NFS version 4 client may
an NFS server reboots), the NFS version 4 client may want to want to actively "probe" the connection to see if it has been broken.
actively "probe" the connection to see if it has been broken. Use of Use of the NULL procedure is one recommended way to do so. So, when
the NULL procedure is one recommended way to do so. So, when a a client experiences a remote procedure call timeout (of some
client experiences a remote procedure call timeout (of some arbitrary arbitrary implementation specific amount), rather than retrying the
implementation specific amount), rather than retrying the remote remote procedure call, it could instead issue a NULL procedure call
procedure call, it could instead issue a NULL procedure call to the to the server. If the server has died, the transport connection break
server. If the server has died, the transport connection break will will eventually be indicated to the NFS version 4 client. The client
eventually be indicated to the NFS version 4 client. The client can can then reconnect, and then retry the original request. If the NULL
then reconnect, and then retry the original request. If the NULL
procedure call gets a response, the connection has not broken. The procedure call gets a response, the connection has not broken. The
client can decide to wait longer for the original request's response, client can decide to wait longer for the original request's response,
or it can break the transport connection and reconnect before re- or it can break the transport connection and reconnect before re-
sending the original request. sending the original request.
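
The retry rule and the NULL-procedure probe described above can be summarized as a small decision sketch in Python. It is not a network client: the Step names, may_retry, and after_probe are hypothetical, and the inputs (whether the connection break has been indicated, whether the client chooses to keep waiting) stand in for events that a real RPC layer would report.

```python
from enum import Enum
from typing import List


class Step(Enum):
    WAIT_LONGER = "wait longer for the original reply"
    BREAK_CONNECTION = "break the transport connection"
    RECONNECT = "reconnect"
    RESEND_ORIGINAL = "resend the original request"


def may_retry(connection_broken: bool, is_null_procedure: bool) -> bool:
    """A request may be retried only if the connection has been broken
    or the procedure being retried is the NULL procedure."""
    return connection_broken or is_null_procedure


def after_probe(connection_broken: bool, keep_waiting: bool) -> List[Step]:
    """Decide what to do once a NULL probe, sent after an RPC timeout,
    has either been answered or the connection break was indicated."""
    if connection_broken:
        # The server died: reconnect, then retry the original request.
        return [Step.RECONNECT, Step.RESEND_ORIGINAL]
    if keep_waiting:
        # The probe was answered, so the connection is still intact.
        return [Step.WAIT_LONGER]
    # Resending is only allowed after the connection has been broken.
    return [Step.BREAK_CONNECTION, Step.RECONNECT, Step.RESEND_ORIGINAL]


plan = after_probe(connection_broken=False, keep_waiting=False)
```

In every branch that resends the original request, the connection has been broken first, either by the server's failure or by the client itself, so the sketch never violates the MUST NOT retry rule.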
For callbacks from the server to the client, the same rules apply, For callbacks from the server to the client, the same rules apply,
but the server doing the callback becomes the client, and the client but the server doing the callback becomes the client, and the client
receiving the callback becomes the server. receiving the callback becomes the server.
3.2. Security Flavors 3.2. Security Flavors
skipping to change at page 23, line 4 skipping to change at page 22, line 57
column descriptions: column descriptions:
1 == number of pseudo flavor 1 == number of pseudo flavor
2 == name of pseudo flavor 2 == name of pseudo flavor
3 == mechanism's OID 3 == mechanism's OID
4 == mechanism's algorithm(s) 4 == mechanism's algorithm(s)
5 == RPCSEC_GSS service 5 == RPCSEC_GSS service
1 2 3 4 5 1 2 3 4 5
----------------------------------------------------------------------- -----------------------------------------------------------------------
390003 krb5 1.2.840.113554.1.2.2 DES MAC MD5 rpc_gss_svc_none 390003 krb5 1.2.840.113554.1.2.2 DES MAC MD5 rpc_gss_svc_none
390004 krb5i 1.2.840.113554.1.2.2 DES MAC MD5 rpc_gss_svc_integrity 390004 krb5i 1.2.840.113554.1.2.2 DES MAC MD5 rpc_gss_svc_integrity
390005 krb5p 1.2.840.113554.1.2.2 DES MAC MD5 rpc_gss_svc_privacy 390005 krb5p 1.2.840.113554.1.2.2 DES MAC MD5 rpc_gss_svc_privacy
for integrity, for integrity,
and 56 bit DES and 56 bit DES
for privacy. for privacy.
Note that the pseudo flavor is presented here as a mapping aid to the Note that the pseudo flavor is presented here as a mapping aid to the
implementor. Because this NFS protocol includes a method to implementor. Because this NFS protocol includes a method to
negotiate security and it understands the GSS-API mechanism, the negotiate security and it understands the GSS-API mechanism, the
skipping to change at page 24, line 4 skipping to change at page 23, line 57
Because SPKM-3 negotiates the algorithms, subsequent calls to Because SPKM-3 negotiates the algorithms, subsequent calls to
LIPKEY's GSS_Wrap() and GSS_GetMIC() by RPCSEC_GSS will use a quality LIPKEY's GSS_Wrap() and GSS_GetMIC() by RPCSEC_GSS will use a quality
of protection value of 0 (zero). See section 5.2 of [RFC2025] for an of protection value of 0 (zero). See section 5.2 of [RFC2025] for an
explanation. explanation.
LIPKEY uses SPKM-3 to create a secure channel in which to pass a user LIPKEY uses SPKM-3 to create a secure channel in which to pass a user
name and password from the client to the server. Once the user name name and password from the client to the server. Once the user name
and password have been accepted by the server, calls to the LIPKEY and password have been accepted by the server, calls to the LIPKEY
context are redirected to the SPKM-3 context. See [RFC2847] for more context are redirected to the SPKM-3 context. See [RFC2847] for more
details. details.
3.2.1.3. SPKM-3 as a security triple 3.2.1.3. SPKM-3 as a security triple
The SPKM-3 GSS-API mechanism as described in [RFC2847] MUST be The SPKM-3 GSS-API mechanism as described in [RFC2847] MUST be
implemented and provide the following security triples. The implemented and provide the following security triples. The
definition of the columns matches the previous subsection "Kerberos definition of the columns matches the previous subsection "Kerberos
V5 as security triple". V5 as security triple".
1 2 3 4 5 1 2 3 4 5
----------------------------------------------------------------------- -----------------------------------------------------------------------
390009 spkm3 1.3.6.1.5.5.1.3 negotiated rpc_gss_svc_none 390009 spkm3 1.3.6.1.5.5.1.3 negotiated rpc_gss_svc_none
skipping to change at page 25, line 5 skipping to change at page 24, line 52
that are available for use by NFS clients. In turn the NFS server that are available for use by NFS clients. In turn the NFS server
may be configured such that each of these entry points may have may be configured such that each of these entry points may have
different or multiple security mechanisms in use. different or multiple security mechanisms in use.
The security negotiation between client and server must be done with The security negotiation between client and server must be done with
a secure channel to eliminate the possibility of a third party a secure channel to eliminate the possibility of a third party
intercepting the negotiation sequence and forcing the client and intercepting the negotiation sequence and forcing the client and
server to choose a lower level of security than required or desired. server to choose a lower level of security than required or desired.
See the section "Security Considerations" for further discussion. See the section "Security Considerations" for further discussion.
3.3.1. SECINFO 3.3.1. SECINFO
The new SECINFO operation will allow the client to determine, on a The new SECINFO operation will allow the client to determine, on a
per filehandle basis, what security triple is to be used for server per filehandle basis, what security triple is to be used for server
access. In general, the client will not have to use the SECINFO access. In general, the client will not have to use the SECINFO
operation except during initial communication with the server or when operation except during initial communication with the server or when
the client crosses policy boundaries at the server. It is possible the client crosses policy boundaries at the server. It is possible
that the server's policies change during the client's interaction that the server's policies change during the client's interaction
therefore forcing the client to negotiate a new security triple. therefore forcing the client to negotiate a new security triple.
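As an illustration of how a client might act on a SECINFO result, the following is a minimal Python sketch, not a definitive implementation. It assumes the server returns its acceptable triples in order of preference and that the client takes the first one it can use. The names SecTriple, CLIENT_SERVICES, and choose_triple are hypothetical, and the quality of protection values are placeholders.

```python
from typing import NamedTuple, Optional, Sequence


class SecTriple(NamedTuple):
    """One security triple: mechanism OID, quality of protection, service."""
    oid: str
    qop: int
    service: str


# Hypothetical client policy: the RPCSEC_GSS services this client accepts.
CLIENT_SERVICES = ("rpc_gss_svc_privacy", "rpc_gss_svc_integrity")

KRB5_OID = "1.2.840.113554.1.2.2"   # from the Kerberos V5 table
SPKM3_OID = "1.3.6.1.5.5.1.3"       # from the SPKM-3 table


def choose_triple(offered: Sequence[SecTriple],
                  supported_oids: Sequence[str]) -> Optional[SecTriple]:
    """Return the first offered triple the client supports and accepts.

    Assumption: 'offered' is in the server's order of preference.
    Returns None when no offered triple is usable.
    """
    for triple in offered:
        if triple.oid in supported_oids and triple.service in CLIENT_SERVICES:
            return triple
    return None


offered = [
    SecTriple(SPKM3_OID, 0, "rpc_gss_svc_none"),
    SecTriple(KRB5_OID, 0, "rpc_gss_svc_integrity"),
    SecTriple(KRB5_OID, 0, "rpc_gss_svc_privacy"),
]
chosen = choose_triple(offered, supported_oids=[KRB5_OID])
```

If the server's policy changes later, the client would repeat the same selection against a fresh SECINFO result.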
3.3.2. Security Error 3.3.2. Security Error
Based on the assumption that each NFS version 4 client and server Based on the assumption that each NFS version 4 client and server
must support a minimum set of security (i.e. LIPKEY, SPKM-3, and must support a minimum set of security (i.e. LIPKEY, SPKM-3, and
Kerberos-V5 all under RPCSEC_GSS), the NFS client will start its Kerberos-V5 all under RPCSEC_GSS), the NFS client will start its
skipping to change at page 25, line 42 skipping to change at page 25, line 37
3.4. Callback RPC Authentication 3.4. Callback RPC Authentication
Except as noted elsewhere in this section, the callback RPC Except as noted elsewhere in this section, the callback RPC
(described later) MUST mutually authenticate the NFS server to the (described later) MUST mutually authenticate the NFS server to the
principal that acquired the clientid (also described later), using principal that acquired the clientid (also described later), using
the security flavor the original SETCLIENTID operation used. the security flavor the original SETCLIENTID operation used.
For AUTH_NONE, there are no principals, so this is a non-issue. For AUTH_NONE, there are no principals, so this is a non-issue.
AUTH_SYS has no notions of mutual authentation or a server principal, AUTH_SYS has no notions of mutual authentication or a server
so the callback from the server simply uses the AUTH_SYS credential principal, so the callback from the server simply uses the AUTH_SYS
that the user used when he set up the delegation. credential that the user used when he set up the delegation.
For AUTH_DH, one commonly used convention is that the server uses the For AUTH_DH, one commonly used convention is that the server uses the
credential corresponding to this AUTH_DH principal: credential corresponding to this AUTH_DH principal:
unix.host@domain unix.host@domain
where host and domain are variables corresponding to the name of where host and domain are variables corresponding to the name of
server host and directory services domain in which it lives such as a server host and directory services domain in which it lives such as a
Network Information System domain or a DNS domain. Network Information System domain or a DNS domain.
Because LIPKEY is layered over SPKM-3, it is permissible for the Because LIPKEY is layered over SPKM-3, it is permissible for the
server to use SPKM-3 and not LIPKEY for the callback even if the server to use SPKM-3 and not LIPKEY for the callback even if the
client used LIPKEY for SETCLIENTID. client used LIPKEY for SETCLIENTID.
Regardless of what security mechanism under RPCSEC_GSS is being used, Regardless of what security mechanism under RPCSEC_GSS is being used,
the NFS server MUST identify itself in GSS-API via a the NFS server MUST identify itself in GSS-API via a
GSS_C_NT_HOSTBASED_SERVICE name type. GSS_C_NT_HOSTBASED_SERVICE GSS_C_NT_HOSTBASED_SERVICE name type. GSS_C_NT_HOSTBASED_SERVICE
names are of the form: names are of the form:
service@hostname service@hostname
For NFS, the "service" element is For NFS, the "service" element is
nfs nfs
Implementations of security mechanisms will convert nfs@hostname to Implementations of security mechanisms will convert nfs@hostname to
various different forms. For Kerberos V5 and LIPKEY, the following various different forms. For Kerberos V5 and LIPKEY, the following
skipping to change at page 27, line 4 skipping to change at page 26, line 54
the SETCLIENTID operation. From an administrative perspective, the SETCLIENTID operation. From an administrative perspective,
having a user name, password, and certificate for both the having a user name, password, and certificate for both the
client and server is redundant. client and server is redundant.
o LIPKEY was intended to minimize additional infrastructure o LIPKEY was intended to minimize additional infrastructure
requirements beyond a certificate for the target, and the requirements beyond a certificate for the target, and the
expectation is that existing password infrastructure can be expectation is that existing password infrastructure can be
leveraged for the initiator. In some environments, a per-host leveraged for the initiator. In some environments, a per-host
password does not exist yet. If certificates are used for any password does not exist yet. If certificates are used for any
per-host principals, then additional password infrastructure is per-host principals, then additional password infrastructure is
not needed. not needed.
o In cases when a host is both an NFS client and server, it can o In cases when a host is both an NFS client and server, it can
share the same per-host certificate. share the same per-host certificate.
4. Filehandles 4. Filehandles
The filehandle in the NFS protocol is a per server unique identifier The filehandle in the NFS protocol is a per server unique identifier
for a filesystem object. The contents of the filehandle are opaque for a filesystem object. The contents of the filehandle are opaque
to the client. Therefore, the server is responsible for translating to the client. Therefore, the server is responsible for translating
the filehandle to an internal representation of the filesystem the filehandle to an internal representation of the filesystem
object. object.
4.1. Obtaining the First Filehandle 4.1. Obtaining the First Filehandle
skipping to change at page 29, line 5 skipping to change at page 28, line 5
used, the client can then traverse the entirety of the server's file used, the client can then traverse the entirety of the server's file
tree with the LOOKUP operation. A complete discussion of the server tree with the LOOKUP operation. A complete discussion of the server
name space is in the section "NFS Server Name Space". name space is in the section "NFS Server Name Space".
4.1.2. Public Filehandle 4.1.2. Public Filehandle
The second special filehandle is the PUBLIC filehandle. Unlike the The second special filehandle is the PUBLIC filehandle. Unlike the
ROOT filehandle, the PUBLIC filehandle may be bound or represent an ROOT filehandle, the PUBLIC filehandle may be bound or represent an
arbitrary filesystem object at the server. The server is responsible arbitrary filesystem object at the server. The server is responsible
for this binding. It may be that the PUBLIC filehandle and the ROOT for this binding. It may be that the PUBLIC filehandle and the ROOT
filehandle refer to the same filesystem object. However, it is up to filehandle refer to the same filesystem object. However, it is up to
the administrative software at the server and the policies of the the administrative software at the server and the policies of the
server administrator to define the binding of the PUBLIC filehandle server administrator to define the binding of the PUBLIC filehandle
and server filesystem object. The client may not make any and server filesystem object. The client may not make any
assumptions about this binding. The client uses the PUBLIC filehandle assumptions about this binding. The client uses the PUBLIC filehandle
via the PUTPUBFH operation. via the PUTPUBFH operation.
4.2. Filehandle Types 4.2. Filehandle Types
skipping to change at page 29, line 55 skipping to change at page 28, line 55
opaque. The client stores filehandles for use in a later request and opaque. The client stores filehandles for use in a later request and
can compare two filehandles from the same server for equality by can compare two filehandles from the same server for equality by
doing a byte-by-byte comparison. However, the client MUST NOT doing a byte-by-byte comparison. However, the client MUST NOT
otherwise interpret the contents of filehandles. If two filehandles otherwise interpret the contents of filehandles. If two filehandles
from the same server are equal, they MUST refer to the same file. from the same server are equal, they MUST refer to the same file.
Servers SHOULD try to maintain a one-to-one correspondence between Servers SHOULD try to maintain a one-to-one correspondence between
filehandles and files but this is not required. Clients MUST use filehandles and files but this is not required. Clients MUST use
filehandle comparisons only to improve performance, not for correct filehandle comparisons only to improve performance, not for correct
behavior. All clients need to be prepared for situations in which it behavior. All clients need to be prepared for situations in which it
cannot be determined whether two filehandles denote the same object cannot be determined whether two filehandles denote the same object
and in such cases, avoid making invalid assumpions which might cause and in such cases, avoid making invalid assumptions which might cause
incorrect behavior. Further discussion of filehandle and attribute incorrect behavior. Further discussion of filehandle and attribute
comparison in the context of data caching is presented in the section comparison in the context of data caching is presented in the section
"Data Caching and File Identity". "Data Caching and File Identity".
As an example, in the case that two different path names when As an example, in the case that two different path names when
traversed at the server terminate at the same filesystem object, the traversed at the server terminate at the same filesystem object, the
server SHOULD return the same filehandle for each path. This can server SHOULD return the same filehandle for each path. This can
occur if a hard link is used to create two file names which refer to occur if a hard link is used to create two file names which refer to
the same underlying file object and associated data. For example, if the same underlying file object and associated data. For example, if
paths /a/b/c and /a/d/c refer to the same file, the server SHOULD paths /a/b/c and /a/d/c refer to the same file, the server SHOULD
skipping to change at page 31, line 5 skipping to change at page 30, line 5
server should return NFS4ERR_STALE to the client (as is the case for server should return NFS4ERR_STALE to the client (as is the case for
persistent filehandles). In all other cases where the server persistent filehandles). In all other cases where the server
determines that a volatile filehandle can no longer be used, it determines that a volatile filehandle can no longer be used, it
should return an error of NFS4ERR_FHEXPIRED. should return an error of NFS4ERR_FHEXPIRED.
The mandatory attribute "fh_expire_type" is used by the client to The mandatory attribute "fh_expire_type" is used by the client to
determine what type of filehandle the server is providing for a determine what type of filehandle the server is providing for a
particular filesystem. This attribute is a bitmask with the particular filesystem. This attribute is a bitmask with the
following values: following values:
FH4_PERSISTENT FH4_PERSISTENT
The value of FH4_PERSISTENT is used to indicate a persistent The value of FH4_PERSISTENT is used to indicate a persistent
filehandle, which is valid until the object is removed from the filehandle, which is valid until the object is removed from the
filesystem. The server will not return NFS4ERR_FHEXPIRED for filesystem. The server will not return NFS4ERR_FHEXPIRED for
this filehandle. FH4_PERSISTENT is defined as a value in which this filehandle. FH4_PERSISTENT is defined as a value in which
none of the bits specified below are set. none of the bits specified below are set.
FH4_VOLATILE_ANY FH4_VOLATILE_ANY
The filehandle may expire at any time, except as specifically The filehandle may expire at any time, except as specifically
skipping to change at page 31, line 55 skipping to change at page 30, line 55
but not all filehandles upon migration (e.g. all but those that but not all filehandles upon migration (e.g. all but those that
are open), FH4_VOLATILE_ANY (in this case with are open), FH4_VOLATILE_ANY (in this case with
FH4_NOEXPIRE_WITH_OPEN) is a better choice since the client may FH4_NOEXPIRE_WITH_OPEN) is a better choice since the client may
not assume that all filehandles will expire when migration not assume that all filehandles will expire when migration
occurs, and it is likely that additional expirations will occur occurs, and it is likely that additional expirations will occur
(as a result of file CLOSE) that are separated in time from the (as a result of file CLOSE) that are separated in time from the
migration event itself. migration event itself.
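The following Python sketch shows one way a client might decode the fh_expire_type bitmask. The numeric bit values are not given in this section; the constants below follow the values commonly given in the protocol's XDR description and should be treated as assumptions here. The function name describe_expire_type is hypothetical.

```python
from typing import List

# Assumed bit values (not stated in this section).
FH4_PERSISTENT = 0x00000000
FH4_NOEXPIRE_WITH_OPEN = 0x00000001
FH4_VOLATILE_ANY = 0x00000002
FH4_VOL_MIGRATION = 0x00000004
FH4_VOL_RENAME = 0x00000008

_FLAGS = [
    (FH4_NOEXPIRE_WITH_OPEN, "FH4_NOEXPIRE_WITH_OPEN"),
    (FH4_VOLATILE_ANY, "FH4_VOLATILE_ANY"),
    (FH4_VOL_MIGRATION, "FH4_VOL_MIGRATION"),
    (FH4_VOL_RENAME, "FH4_VOL_RENAME"),
]


def describe_expire_type(mask: int) -> List[str]:
    """Return the names of the bits set in an fh_expire_type value.

    A value with none of the bits set denotes a persistent filehandle.
    """
    if mask == FH4_PERSISTENT:
        return ["FH4_PERSISTENT"]
    return [name for bit, name in _FLAGS if mask & bit]


def may_expire(mask: int) -> bool:
    """True if the server may return NFS4ERR_FHEXPIRED for this filesystem."""
    return mask != FH4_PERSISTENT
```

A client would typically fetch this attribute once per filesystem and consult it before deciding whether to keep path information for filehandle recovery.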
4.2.4. One Method of Constructing a Volatile Filehandle 4.2.4. One Method of Constructing a Volatile Filehandle
As mentioned, in some instances a filehandle is stale (no longer
valid; perhaps because the file was removed from the server) or it is
expired (the underlying file is valid but since the filehandle is
volatile, it may have expired). Thus the server needs to be able to
return NFS4ERR_STALE in the former case and NFS4ERR_FHEXPIRED in the
latter case. This can be done by careful construction of the volatile
filehandle. One possible implementation follows.
A volatile filehandle, while opaque to the client, could contain: A volatile filehandle, while opaque to the client, could contain:
[volatile bit = 1 | server boot time | slot | generation number] [volatile bit = 1 | server boot time | slot | generation number]
o slot is an index in the server volatile filehandle table o slot is an index in the server volatile filehandle table
o generation number is the generation number for the table o generation number is the generation number for the table
entry/slot entry/slot
If the server boot time is less than the current server boot time, When the client presents a volatile filehandle, the server makes the
return NFS4ERR_FHEXPIRED. If slot is out of range, return following checks, which assume that the check for the volatile bit
has passed. If the server boot time is less than the current server
boot time, return NFS4ERR_FHEXPIRED. If slot is out of range, return
NFS4ERR_BADHANDLE. If the generation number does not match, return NFS4ERR_BADHANDLE. If the generation number does not match, return
NFS4ERR_FHEXPIRED. NFS4ERR_FHEXPIRED.
When the server reboots, the table is gone (it is volatile). When the server reboots, the table is gone (it is volatile).
If volatile bit is 0, then it is a persistent filehandle with a If volatile bit is 0, then it is a persistent filehandle with a
different structure following it. different structure following it.
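The checks described in this subsection can be sketched in Python as follows. This is one possible reading, not a definitive implementation: the field widths (one flag byte, a 64-bit boot time, 32-bit slot and generation number), the length check, and the names VolatileTable, issue, expire, and check are all assumptions made for illustration.

```python
import struct

NFS4_OK = "NFS4_OK"
NFS4ERR_FHEXPIRED = "NFS4ERR_FHEXPIRED"
NFS4ERR_BADHANDLE = "NFS4ERR_BADHANDLE"

# Assumed layout: volatile flag | server boot time | slot | generation number
_FMT = "!BQII"


class VolatileTable:
    """Toy server-side table of volatile filehandle slots."""

    def __init__(self, boot_time: int, size: int) -> None:
        self.boot_time = boot_time
        self.generations = [0] * size  # lost on reboot, like the real table

    def issue(self, slot: int) -> bytes:
        """Build a volatile filehandle for the given slot."""
        return struct.pack(_FMT, 1, self.boot_time, slot, self.generations[slot])

    def expire(self, slot: int) -> None:
        """Invalidate outstanding handles for a slot by bumping its generation."""
        self.generations[slot] += 1

    def check(self, fh: bytes) -> str:
        """Apply the checks in the order given in the text."""
        if len(fh) != struct.calcsize(_FMT):
            return NFS4ERR_BADHANDLE  # assumption: malformed length is a bad handle
        volatile, boot_time, slot, generation = struct.unpack(_FMT, fh)
        if volatile != 1:
            raise ValueError("persistent filehandle: different structure")
        if boot_time < self.boot_time:
            return NFS4ERR_FHEXPIRED
        if slot >= len(self.generations):
            return NFS4ERR_BADHANDLE
        if generation != self.generations[slot]:
            return NFS4ERR_FHEXPIRED
        return NFS4_OK
```

Embedding the boot time lets the server reject handles issued before a reboot without keeping any state across the reboot.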
4.3. Client Recovery from Filehandle Expiration 4.3. Client Recovery from Filehandle Expiration
skipping to change at page 33, line 4 skipping to change at page 31, line 50
from the filesystem, obviously the client will not be able to recover from the filesystem, obviously the client will not be able to recover
from the expired filehandle. from the expired filehandle.
It is also possible that the expired filehandle refers to a file that It is also possible that the expired filehandle refers to a file that
has been renamed. If the file was renamed by another client, again has been renamed. If the file was renamed by another client, again
it is possible that the original client will not be able to recover. it is possible that the original client will not be able to recover.
However, in the case that the client itself is renaming the file and However, in the case that the client itself is renaming the file and
the file is open, it is possible that the client may be able to the file is open, it is possible that the client may be able to
recover. The client can determine the new path name based on the recover. The client can determine the new path name based on the
processing of the rename request. The client can then regenerate the processing of the rename request. The client can then regenerate the
new filehandle based on the new path name. The client could also use new filehandle based on the new path name. The client could also use
the compound operation mechanism to construct a set of operations the compound operation mechanism to construct a set of operations
like: like:
RENAME A B RENAME A B
LOOKUP B LOOKUP B
GETFH GETFH
Note that the COMPOUND procedure does not provide atomicity. This Note that the COMPOUND procedure does not provide atomicity. This
example only reduces the overhead of recovering from an expired example only reduces the overhead of recovering from an expired
filehandle. filehandle.
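The lack of atomicity can be illustrated with a toy Python model. It assumes that a COMPOUND is evaluated one operation at a time and that evaluation stops at the first failure, leaving the effects of earlier operations in place. The function run_compound and the flat name-to-filehandle dictionary are hypothetical simplifications; directory filehandles are omitted.

```python
from typing import Dict, List, Optional, Tuple


def run_compound(namespace: Dict[str, bytes],
                 ops: List[Tuple[str, ...]]) -> List[tuple]:
    """Evaluate ops in order against a toy namespace; stop at first error.

    Effects of operations that already succeeded are NOT rolled back.
    """
    results: List[tuple] = []
    current_fh: Optional[bytes] = None
    for op in ops:
        name, args = op[0], op[1:]
        if name == "RENAME":
            old, new = args
            if old not in namespace:
                results.append((name, "NFS4ERR_NOENT"))
                break
            namespace[new] = namespace.pop(old)
            results.append((name, "NFS4_OK"))
        elif name == "LOOKUP":
            (target,) = args
            if target not in namespace:
                results.append((name, "NFS4ERR_NOENT"))
                break
            current_fh = namespace[target]
            results.append((name, "NFS4_OK"))
        elif name == "GETFH":
            if current_fh is None:
                results.append((name, "NFS4ERR_NOFILEHANDLE"))
                break
            results.append((name, "NFS4_OK", current_fh))
        else:
            raise ValueError("operation not modelled in this sketch: " + name)
    return results


ns = {"A": b"fh-1"}
ok = run_compound(ns, [("RENAME", "A", "B"), ("LOOKUP", "B"), ("GETFH",)])

ns_partial = {"A": b"fh-1"}
partial = run_compound(
    ns_partial, [("RENAME", "A", "B"), ("LOOKUP", "missing"), ("GETFH",)])
```

In the second call the rename has taken effect even though the LOOKUP failed, so the client must be prepared to issue the LOOKUP and GETFH again.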
5. File Attributes 5. File Attributes
To meet the requirements of extensibility and increased To meet the requirements of extensibility and increased
interoperability with non-UNIX platforms, attributes must be handled interoperability with non-UNIX platforms, attributes must be handled
in a flexible manner. The NFS version 3 fattr3 structure contains a in a flexible manner. The NFS version 3 fattr3 structure contains a
fixed list of attributes that not all clients and servers are able to fixed list of attributes that not all clients and servers are able to
support or care about. The fattr3 structure can not be extended as support or care about. The fattr3 structure can not be extended as
new needs arise and it provides no way to indicate non-support. With new needs arise and it provides no way to indicate non-support. With
the NFS version 4 protocol, the client is able to query what attributes the NFS version 4 protocol, the client is able to query what attributes
skipping to change at page 35, line 5 skipping to change at page 34, line 5
encouraged to define their new attributes as recommended attributes encouraged to define their new attributes as recommended attributes
by bringing them to the IETF standards-track process. by bringing them to the IETF standards-track process.
The set of attributes which are classified as mandatory is The set of attributes which are classified as mandatory is
deliberately small since servers must do whatever it takes to support deliberately small since servers must do whatever it takes to support
them. A server should support as many of the recommended attributes them. A server should support as many of the recommended attributes
as possible but by their definition, the server is not required to as possible but by their definition, the server is not required to
support all of them. Attributes are deemed mandatory if the data is support all of them. Attributes are deemed mandatory if the data is
both needed by a large number of clients and is not otherwise both needed by a large number of clients and is not otherwise
reasonably computable by the client when support is not provided on reasonably computable by the client when support is not provided on
the server. the server.
Note that the hidden directory returned by OPENATTR is a convenience Note that the hidden directory returned by OPENATTR is a convenience
for protocol processing. The client should not make any assumptions for protocol processing. The client should not make any assumptions
about the server's implementation of named attributes and whether the about the server's implementation of named attributes and whether the
underlying filesystem at the server has a named attribute directory underlying filesystem at the server has a named attribute directory
or not. Therefore, operations such as SETATTR and GETATTR on the or not. Therefore, operations such as SETATTR and GETATTR on the
named attribute directory are undefined. named attribute directory are undefined.
skipping to change at page 36, line 5 skipping to change at page 35, line 5
fabricate or construct an attribute or whether to do without the fabricate or construct an attribute or whether to do without the
attribute. attribute.
5.3. Named Attributes 5.3. Named Attributes
These attributes are not supported by direct encoding in the NFS These attributes are not supported by direct encoding in the NFS
Version 4 protocol but are accessed by string names rather than Version 4 protocol but are accessed by string names rather than
numbers and correspond to an uninterpreted stream of bytes which are numbers and correspond to an uninterpreted stream of bytes which are
stored with the filesystem object. The name space for these stored with the filesystem object. The name space for these
attributes may be accessed by using the OPENATTR operation. The attributes may be accessed by using the OPENATTR operation. The
OPENATTR operation returns a filehandle for a virtual "attribute OPENATTR operation returns a filehandle for a virtual "attribute
directory" and further perusal of the name space may be done using directory" and further perusal of the name space may be done using
READDIR and LOOKUP operations on this filehandle. Named attributes READDIR and LOOKUP operations on this filehandle. Named attributes
may then be examined or changed by normal READ and WRITE and CREATE may then be examined or changed by normal READ and WRITE and CREATE
operations on the filehandles returned from READDIR and LOOKUP. operations on the filehandles returned from READDIR and LOOKUP.
Named attributes may have attributes. Named attributes may have attributes.
It is recommended that servers support arbitrary named attributes. A It is recommended that servers support arbitrary named attributes. A
skipping to change at page 36, line 35 skipping to change at page 35, line 35
IETF standards track documents. See the section "IANA IETF standards track documents. See the section "IANA
Considerations" for further discussion. Considerations" for further discussion.
5.4. Classification of Attributes 5.4. Classification of Attributes
Each of the Mandatory and Recommended attributes can be classified in Each of the Mandatory and Recommended attributes can be classified in
one of three categories: per server, per filesystem, or per one of three categories: per server, per filesystem, or per
filesystem object. Note that it is possible that some per filesystem filesystem object. Note that it is possible that some per filesystem
attributes may vary within the filesystem. See the "homogeneous" attributes may vary within the filesystem. See the "homogeneous"
attribute for its definition. Note that the attributes attribute for its definition. Note that the attributes
time_access_set and time_modify_set are not listed below because they time_access_set and time_modify_set are not listed in this section
are write-only attributes used in a special instance of SETATTR. because they are write-only attributes corresponding to time_access
and time_modify, and are used in a special instance of SETATTR.
o The per server attribute is: o The per server attribute is:
lease_time lease_time
o The per filesystem attributes are: o The per filesystem attributes are:
supp_attr, fh_expire_type, link_support, symlink_support, supp_attr, fh_expire_type, link_support, symlink_support,
unique_handles, aclsupport, cansettime, case_insensitive, unique_handles, aclsupport, cansettime, case_insensitive,
case_preserving, chown_restricted, files_avail, files_free, case_preserving, chown_restricted, files_avail, files_free,
files_total, fs_locations, homogeneous, maxfilesize, maxname, files_total, fs_locations, homogeneous, maxfilesize, maxname,
maxread, maxwrite, no_trunc, space_avail, space_free, maxread, maxwrite, no_trunc, space_avail, space_free,
space_total, time_delta space_total, time_delta
o The per filesystem object attributes are: o The per filesystem object attributes are:
type, change, size, named_attr, fsid, rdattr_error, filehandle, type, change, size, named_attr, fsid, rdattr_error, filehandle,
ACL, archive, fileid, hidden, maxlink, mimetype, mode, numlinks, ACL, archive, fileid, hidden, maxlink, mimetype, mode, numlinks,
owner, owner_group, rawdev, space_used, system, time_access, owner, owner_group, rawdev, space_used, system, time_access,
time_backup, time_create, time_metadata, time_modify, time_backup, time_create, time_metadata, time_modify,
mounted_on_fileid
mounted_on_fileid
For quota_avail_hard, quota_avail_soft, and quota_used see their For quota_avail_hard, quota_avail_soft, and quota_used see their
definitions below for the appropriate classification. definitions below for the appropriate classification.
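The classification above can be captured as a simple lookup, sketched here in Python. The set contents are copied from the lists in this section; the function name attribute_scope and the scope labels are hypothetical. The quota attributes are left out on purpose because their classification is given in their own definitions.

```python
PER_SERVER = {"lease_time"}

PER_FILESYSTEM = {
    "supp_attr", "fh_expire_type", "link_support", "symlink_support",
    "unique_handles", "aclsupport", "cansettime", "case_insensitive",
    "case_preserving", "chown_restricted", "files_avail", "files_free",
    "files_total", "fs_locations", "homogeneous", "maxfilesize", "maxname",
    "maxread", "maxwrite", "no_trunc", "space_avail", "space_free",
    "space_total", "time_delta",
}

PER_OBJECT = {
    "type", "change", "size", "named_attr", "fsid", "rdattr_error",
    "filehandle", "ACL", "archive", "fileid", "hidden", "maxlink",
    "mimetype", "mode", "numlinks", "owner", "owner_group", "rawdev",
    "space_used", "system", "time_access", "time_backup", "time_create",
    "time_metadata", "time_modify", "mounted_on_fileid",
}


def attribute_scope(name: str) -> str:
    """Return 'server', 'filesystem', or 'object' for a listed attribute.

    Raises KeyError for attributes not classified in this section
    (for example the quota attributes or the write-only time setters).
    """
    if name in PER_SERVER:
        return "server"
    if name in PER_FILESYSTEM:
        return "filesystem"
    if name in PER_OBJECT:
        return "object"
    raise KeyError(name)
```

Note that a per filesystem value may still vary within the filesystem when the "homogeneous" attribute indicates so.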
5.5. Mandatory Attributes - Definitions 5.5. Mandatory Attributes - Definitions
Name # DataType Access Description Name # DataType Access Description
___________________________________________________________________ ___________________________________________________________________
supp_attr 0 bitmap READ The bit vector which supp_attr 0 bitmap READ The bit vector which
would retrieve all would retrieve all
mandatory and mandatory and
recommended attributes recommended attributes
that are supported for that are supported for
skipping to change at page 38, line 53 skipping to change at page 37, line 53
object's time_metadata object's time_metadata
attribute for this attribute for this
attribute's value but attribute's value but
only if the filesystem only if the filesystem
object can not be object can not be
updated more updated more
frequently than the frequently than the
resolution of resolution of
time_metadata. time_metadata.
size 4 uint64 R/W size 4 uint64 R/W The size of the object
The size of the object
in bytes. in bytes.
link_support 5 bool READ True, if the object's link_support 5 bool READ True, if the object's
filesystem supports filesystem supports
hard links. hard links.
symlink_support 6 bool READ True, if the object's symlink_support 6 bool READ True, if the object's
filesystem supports filesystem supports
symbolic links. symbolic links.
named_attr 7 bool READ True, if this object named_attr 7 bool READ True, if this object
skipping to change at page 39, line 29 skipping to change at page 38, line 29
attribute directory. attribute directory.
fsid 8 fsid4 READ Unique filesystem fsid 8 fsid4 READ Unique filesystem
identifier for the identifier for the
filesystem holding filesystem holding
this object. fsid this object. fsid
contains major and contains major and
minor components each minor components each
of which are uint64. of which are uint64.
unique_handles 9 bool READ unique_handles 9 bool READ True, if two distinct
True, if two distinct
filehandles are guaranteed filehandles are guaranteed
to refer to two to refer to two
different filesystem different filesystem
objects. objects.
lease_time 10 nfs_lease4 READ Duration of leases at lease_time 10 nfs_lease4 READ Duration of leases at
server in seconds. server in seconds.
rdattr_error 11 enum READ Error returned from rdattr_error 11 enum READ Error returned from
getattr during getattr during
readdir. readdir.
filehandle 19 nfs_fh4 READ The filehandle of this filehandle 19 nfs_fh4 READ The filehandle of this
object (primarily for object (primarily for
readdir requests). readdir requests).
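The attribute numbers in the table above are bit positions in an attribute bitmap such as supp_attr. The Python sketch below assumes the bitmap is an array of 32-bit words in which attribute n occupies bit (n mod 32) of word (n / 32); that encoding is not spelled out in this section and is stated here as an assumption. Only attribute numbers visible in this part of the table are listed, and the helper names are hypothetical.

```python
from typing import Iterable, List

# Attribute numbers taken from the table above.
ATTR_NUMBERS = {
    "supp_attr": 0, "size": 4, "link_support": 5, "symlink_support": 6,
    "named_attr": 7, "fsid": 8, "unique_handles": 9, "lease_time": 10,
    "rdattr_error": 11, "filehandle": 19,
}


def encode_bitmap(attr_numbers: Iterable[int]) -> List[int]:
    """Set bit (n % 32) of word (n // 32) for each attribute number n."""
    words: List[int] = []
    for n in attr_numbers:
        word, bit = divmod(n, 32)
        while len(words) <= word:
            words.append(0)
        words[word] |= 1 << bit
    return words


def decode_bitmap(words: Iterable[int]) -> List[int]:
    """Return the attribute numbers whose bits are set, in ascending order."""
    return [
        index * 32 + bit
        for index, word in enumerate(words)
        for bit in range(32)
        if (word >> bit) & 1
    ]


request = encode_bitmap(ATTR_NUMBERS[name] for name in ("size", "fsid", "filehandle"))
```

A client could intersect such a request bitmap with the server's supp_attr value to avoid asking for attributes the server does not support.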
5.6. Recommended Attributes - Definitions 5.6. Recommended Attributes - Definitions
Name # Data Type Access Description Name # Data Type Access Description
______________________________________________________________________ ______________________________________________________________________
ACL 12 nfsace4<> R/W The access control ACL 12 nfsace4<> R/W The access control
list for the object. list for the object.
aclsupport 13 uint32 READ Indicates what types aclsupport 13 uint32 READ Indicates what types
of ACLs are supported of ACLs are supported
skipping to change at page 40, line 35 skipping to change at page 39, line 35
cansettime 15 bool READ True, if the server cansettime 15 bool READ True, if the server
is able to change the is able to change the
times for a times for a
filesystem object as filesystem object as
specified in a specified in a
SETATTR operation. SETATTR operation.
case_insensitive 16 bool READ True, if filename case_insensitive 16 bool READ True, if filename
comparisons on this comparisons on this
filesystem case filesystem are case
insensitive. insensitive.
case_preserving 17 bool READ True, if filename case_preserving 17 bool READ True, if filename
case on this case on this
filesystem preserved. filesystem are
preserved.
chown_restricted 18 bool READ If TRUE, the server chown_restricted 18 bool READ If TRUE, the server
will reject any will reject any
request to change request to change
either the owner or either the owner or
the group associated the group associated
with a file if the with a file if the
caller is not a caller is not a
privileged user (for privileged user (for
example, "root" in example, "root" in
UNIX operating UNIX operating
environments or in environments or in
Windows 2000 the Windows 2000 the
"Take Ownership" "Take Ownership"
privilege). privilege).
fileid 20 uint64 READ A number uniquely fileid 20 uint64 READ A number uniquely
identifying the file identifying the file
within the within the
filesystem. filesystem.
files_avail 21 uint64 READ File slots available files_avail 21 uint64 READ File slots available
to this user on the to this user on the
filesystem containing filesystem containing
this object - this this object - this
skipping to change at page 42, line 5 skipping to change at page 41, line 5
are per filesystem are per filesystem
attributes the same attributes the same
for all filesystem's for all filesystem's
objects. objects.
maxfilesize 27 uint64 READ Maximum supported maxfilesize 27 uint64 READ Maximum supported
file size for the file size for the
filesystem of this filesystem of this
object. object.
maxlink 28 uint32 READ Maximum number of maxlink 28 uint32 READ Maximum number of
links for this links for this
object. object.
maxname 29 uint32 READ Maximum filename size maxname 29 uint32 READ Maximum filename size
supported for this supported for this
object. object.
maxread 30 uint64 READ Maximum read size maxread 30 uint64 READ Maximum read size
supported for this supported for this
object. object.
maxwrite 31 uint64 READ maxwrite 31 uint64 READ Maximum write size
Maximum write size
supported for this supported for this
object. This object. This
attribute SHOULD be attribute SHOULD be
supported if the file supported if the file
is writable. Lack of is writable. Lack of
this attribute can this attribute can
lead to the client lead to the client
either wasting either wasting
bandwidth or not bandwidth or not
receiving the best receiving the best
skipping to change at page 43, line 5 skipping to change at page 42, line 5
to this object. to this object.
owner 36 utf8<> R/W The string name of owner 36 utf8<> R/W The string name of
the owner of this the owner of this
object. object.
owner_group 37 utf8<> R/W The string name of owner_group 37 utf8<> R/W The string name of
the group ownership the group ownership
of this object. of this object.
quota_avail_hard 38 uint64 READ For definition see quota_avail_hard 38 uint64 READ For definition see
"Quota Attributes" "Quota Attributes"
section below. section below.
quota_avail_soft 39 uint64 READ For definition see quota_avail_soft 39 uint64 READ For definition see
"Quota Attributes" "Quota Attributes"
section below. section below.
quota_used 40 uint64 READ For definition see quota_used 40 uint64 READ For definition see
skipping to change at page 44, line 5 skipping to change at page 43, line 5
space_total 44 uint64 READ Total disk space in space_total 44 uint64 READ Total disk space in
bytes on the bytes on the
filesystem containing filesystem containing
this object. this object.
space_used 45 uint64 READ Number of filesystem space_used 45 uint64 READ Number of filesystem
bytes allocated to bytes allocated to
this object. this object.
system 46 bool R/W True, if this file is system 46 bool R/W True, if this file is
a "system" file with a "system" file with
respect to the respect to the
Windows API. Windows API.
time_access 47 nfstime4 READ The time of last time_access 47 nfstime4 READ The time of last
access to the object access to the object
by a read that was by a read that was
satisfied by the satisfied by the
server. server.
time_access_set 48 settime4 WRITE Set the time of last time_access_set 48 settime4 WRITE Set the time of last
access to the object. access to the object.
SETATTR use only. SETATTR use only.
time_backup 49 nfstime4 R/W The time of last time_backup 49 nfstime4 R/W The time of last
backup of the object. backup of the object.
time_create 50 nfstime4 R/W time_create 50 nfstime4 R/W The time of creation
The time of creation
of the object. This of the object. This
attribute does not attribute does not
have any relation to have any relation to
the traditional UNIX the traditional UNIX
file attribute file attribute
"ctime" or "change "ctime" or "change
time". time".
time_delta 51 nfstime4 READ Smallest useful time_delta 51 nfstime4 READ Smallest useful
server time server time
granularity. granularity.
time_metadata 52 nfstime4 R/W The time of last time_metadata 52 nfstime4 READ The time of last
meta-data meta-data
modification of the modification of the
object. object.
time_modify 53 nfstime4 READ The time of last time_modify 53 nfstime4 READ The time of last
modification to the modification to the
object. object.
time_modify_set 54 settime4 WRITE Set the time of last time_modify_set 54 settime4 WRITE Set the time of last
modification to the modification to the
object. SETATTR use object. SETATTR use
only. only.
mounted_on_fileid 55 uint64 READ Like fileid, but if mounted_on_fileid 55 uint64 READ Like fileid, but if
the target filehandle the target filehandle
is the root of a is the root of a
filesystem return the filesystem return the
fileid of the fileid of the
underlying directory. underlying directory.
5.7. Time Access 5.7. Time Access
As defined above, the time_access attribute represents the time of As defined above, the time_access attribute represents the time of
last access to the object by a read that was satisfied by the server. last access to the object by a read that was satisfied by the server.
The notion of what is an "access" depends on the server's operating The notion of what is an "access" depends on the server's operating
environment and/or the server's filesystem semantics. For example, environment and/or the server's filesystem semantics. For example,
for servers obeying POSIX semantics, time_access would be updated for servers obeying POSIX semantics, time_access would be updated
only by the READLINK, READ, and READDIR operations and not any of the only by the READLINK, READ, and READDIR operations and not any of the
operations that modify the content of the object. Of course, setting operations that modify the content of the object. Of course, setting
the corresponding time_access_set attribute is another way to modify the corresponding time_access_set attribute is another way to modify
the time_access attribute. the time_access attribute.
Whenever the file object resides on a writeable filesystem, the Whenever the file object resides on a writable filesystem, the server
server should make best efforts to record time_access into stable should make best efforts to record time_access into stable storage.
storage. However, to mitigate the performance effects of doing so, However, to mitigate the performance effects of doing so, and most
and most especially whenever the server is satisifying the read of especially whenever the server is satisfying the read of the object's
the object's content from its cache, the server MAY cache access time content from its cache, the server MAY cache access time updates and
updates and lazily write them to stable storage. It is also lazily write them to stable storage. It is also acceptable to give
acceptable to give administrators of the server the option to disable administrators of the server the option to disable time_access
time_access updates. updates.
5.8. Interpreting owner and owner_group 5.8. Interpreting owner and owner_group
The recommended attributes "owner" and "owner_group" (and also users The recommended attributes "owner" and "owner_group" (and also users
and groups within the "acl" attribute) are represented in terms of a and groups within the "acl" attribute) are represented in terms of a
UTF-8 string. To avoid a representation that is tied to a particular UTF-8 string. To avoid a representation that is tied to a particular
underlying implementation at the client or server, the use of the underlying implementation at the client or server, the use of the
UTF-8 string has been chosen. Note that section 6.1 of [RFC2624] UTF-8 string has been chosen. Note that section 6.1 of [RFC2624]
provides additional rationale. It is expected that the client and provides additional rationale. It is expected that the client and
server will have their own local representation of owner and server will have their own local representation of owner and
skipping to change at page 46, line 5 skipping to change at page 45, line 5
to these security principals. When these local identifiers are to these security principals. When these local identifiers are
translated to the form of the owner attribute, associated with files translated to the form of the owner attribute, associated with files
created by such principals they identify, in a common format, the created by such principals they identify, in a common format, the
users associated with each corresponding set of security principals. users associated with each corresponding set of security principals.
The translation used to interpret owner and group strings is not The translation used to interpret owner and group strings is not
specified as part of the protocol. This allows various solutions to specified as part of the protocol. This allows various solutions to
be employed. For example, a local translation table may be consulted be employed. For example, a local translation table may be consulted
that maps a numeric id to the user@dns_domain syntax. A name that maps a numeric id to the user@dns_domain syntax. A name
service may also be used to accomplish the translation. A server may service may also be used to accomplish the translation. A server may
provide a more general service, not limited by any particular provide a more general service, not limited by any particular
translation (which would only translate a limited set of possible translation (which would only translate a limited set of possible
strings) by storing the owner and owner_group attributes in local strings) by storing the owner and owner_group attributes in local
storage without any translation or it may augment a translation storage without any translation or it may augment a translation
method by storing the entire string for attributes for which no method by storing the entire string for attributes for which no
translation is available while using the local representation for translation is available while using the local representation for
those cases in which a translation is available. those cases in which a translation is available.
skipping to change at page 47, line 5 skipping to change at page 46, line 5
unsigned uid's and gid's, owner and group strings that consist of unsigned uid's and gid's, owner and group strings that consist of
decimal numeric values with no leading zeros can be given a special decimal numeric values with no leading zeros can be given a special
interpretation by clients and servers which choose to provide such interpretation by clients and servers which choose to provide such
support. The receiver may treat such a user or group string as support. The receiver may treat such a user or group string as
representing the same user as would be represented by a v2/v3 uid or representing the same user as would be represented by a v2/v3 uid or
gid having the corresponding numeric value. A server is not gid having the corresponding numeric value. A server is not
obligated to accept such a string, but may return an NFS4ERR_BADOWNER obligated to accept such a string, but may return an NFS4ERR_BADOWNER
instead. To avoid this mechanism being used to subvert user and instead. To avoid this mechanism being used to subvert user and
group translation, so that a client might pass all of the owners and group translation, so that a client might pass all of the owners and
groups in numeric form, a server SHOULD return an NFS4ERR_BADOWNER groups in numeric form, a server SHOULD return an NFS4ERR_BADOWNER
error when there is a valid translation for the user or owner error when there is a valid translation for the user or owner
designated in this way. In that case, the client must use the designated in this way. In that case, the client must use the
appropriate name@domain string and not the special form for appropriate name@domain string and not the special form for
compatibility. compatibility.
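The special numeric form described above can be illustrated with a small parser. This is only a sketch: the helper name sketch_parse_numeric_owner is hypothetical, the 32-bit width of the id is an assumption, and treating the single character "0" as valid is a judgment call rather than something the text states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns 1 and stores the value in *id when s is a
 * decimal number with no leading zeros that fits in 32 bits; returns 0
 * otherwise, in which case the caller would fall back to the normal
 * name@domain translation.
 */
int sketch_parse_numeric_owner(const char *s, uint32_t *id)
{
    uint64_t v = 0;

    assert(s != NULL && id != NULL);
    if (s[0] == '\0')
        return 0;                      /* empty string */
    if (s[0] == '0' && s[1] != '\0')
        return 0;                      /* leading zero, e.g. "0123" */
    for (const char *p = s; *p != '\0'; p++) {
        if (*p < '0' || *p > '9')
            return 0;                  /* not purely decimal */
        v = v * 10 + (uint64_t)(*p - '0');
        if (v > UINT32_MAX)
            return 0;                  /* exceeds the assumed 32-bit id width */
    }
    *id = (uint32_t)v;
    return 1;
}
```

A server that has a valid translation for the resulting id would still be expected to answer NFS4ERR_BADOWNER, as the text above says; the parser only recognizes the syntax.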
The owner string "nobody" may be used to designate an anonymous user, The owner string "nobody" may be used to designate an anonymous user,
which will be associated with a file created by a security principal which will be associated with a file created by a security principal
that cannot be mapped through normal means to the owner attribute. that cannot be mapped through normal means to the owner attribute.
skipping to change at page 48, line 5 skipping to change at page 47, line 5
allocations to other files or directories. allocations to other files or directories.
quota_used quota_used
The value in bytes which represents the amount of disk space used The value in bytes which represents the amount of disk space used
by this file or directory and possibly a number of other similar by this file or directory and possibly a number of other similar
files or directories, where the set of "similar" meets at least files or directories, where the set of "similar" meets at least
the criterion that allocating space to any file or directory in the criterion that allocating space to any file or directory in
the set will reduce the "quota_avail_hard" of every other file the set will reduce the "quota_avail_hard" of every other file
or directory in the set. or directory in the set.
Note that there may be a number of distinct but overlapping sets Note that there may be a number of distinct but overlapping sets
of files or directories for which a quota_used value is of files or directories for which a quota_used value is
maintained. E.g. "all files with a given owner", "all files with maintained. E.g. "all files with a given owner", "all files with
a given group owner", etc. a given group owner", etc.
The server is at liberty to choose any of those sets but should The server is at liberty to choose any of those sets but should
do so in a repeatable way. The rule may be configured per- do so in a repeatable way. The rule may be configured per-
filesystem or may be "choose the set with the smallest quota". filesystem or may be "choose the set with the smallest quota".
skipping to change at page 48, line 49 skipping to change at page 47, line 49
To determine if a request succeeds, each nfsace4 entry is processed To determine if a request succeeds, each nfsace4 entry is processed
in order by the server. Only ACEs which have a "who" that matches in order by the server. Only ACEs which have a "who" that matches
the requester are considered. Each ACE is processed until all of the the requester are considered. Each ACE is processed until all of the
bits of the requester's access have been ALLOWED. Once a bit (see bits of the requester's access have been ALLOWED. Once a bit (see
below) has been ALLOWED by an ACCESS_ALLOWED_ACE, it is no longer below) has been ALLOWED by an ACCESS_ALLOWED_ACE, it is no longer
considered in the processing of later ACEs. If an ACCESS_DENIED_ACE considered in the processing of later ACEs. If an ACCESS_DENIED_ACE
is encountered where the requester's access still has unALLOWED bits is encountered where the requester's access still has unALLOWED bits
in common with the "access_mask" of the ACE, the request is denied. in common with the "access_mask" of the ACE, the request is denied.
However, unlike the ALLOWED and DENIED ACE types, the ALARM and AUDIT However, unlike the ALLOWED and DENIED ACE types, the ALARM and AUDIT
ACE types do not affect a requestor's access, and instead are for ACE types do not affect a requester's access, and instead are for
triggering events as a result of a requestor's access attempt. triggering events as a result of a requester's access attempt.
Therefore, all AUDIT and ALARM ACEs are processed until the end of the Therefore, all AUDIT and ALARM ACEs are processed until the end of the
ACL. ACL. When the ACL is fully processed, if there are bits in
requester's mask that have not been considered whether the server
allows or denies the access is undefined. If there is a mode
attribute on the file, then this cannot happen, since the mode's
MODE4_*OTH bits will map to EVERYONE@ ACEs that unambiguously specify
the requester's access.
The NFS version 4 ACL model is quite rich. Some server platforms may The NFS version 4 ACL model is quite rich. Some server platforms may
provide access control functionality that goes beyond the UNIX-style provide access control functionality that goes beyond the UNIX-style
mode attribute, but which is not as rich as the NFS ACL model. So mode attribute, but which is not as rich as the NFS ACL model. So
that users can take advantage of this more limited functionality, the that users can take advantage of this more limited functionality, the
server may indicate that it supports ACLs as long as it follows the server may indicate that it supports ACLs as long as it follows the
guidelines for mapping between its ACL model and the NFS version 4 guidelines for mapping between its ACL model and the NFS version 4
ACL model. ACL model.
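The ACE processing order described earlier in this section can be sketched as a single pass over the ACL. This is an illustration under stated assumptions, not the protocol's definition: struct sketch_ace and sketch_acl_check are hypothetical, matching of "who" is reduced to a precomputed flag, and AUDIT and ALARM ACEs are skipped because they do not affect the access decision.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* acetype4 values as listed in this section. */
#define ACE4_ACCESS_ALLOWED_ACE_TYPE 0x00000000u
#define ACE4_ACCESS_DENIED_ACE_TYPE  0x00000001u

/* Hypothetical, simplified ACE used only for this sketch. */
struct sketch_ace {
    uint32_t type;         /* acetype4 */
    uint32_t access_mask;  /* acemask4 bits */
    int      who_matches;  /* nonzero if "who" matches the requester */
};

enum sketch_result { SKETCH_ALLOW, SKETCH_DENY, SKETCH_UNDEFINED };

enum sketch_result
sketch_acl_check(const struct sketch_ace *acl, size_t n, uint32_t requested)
{
    uint32_t pending = requested;   /* requested bits not yet ALLOWED */

    assert(acl != NULL || n == 0);
    /* Stops early once every bit is ALLOWED; a real server would keep
     * scanning for AUDIT and ALARM ACEs, which this sketch ignores. */
    for (size_t i = 0; i < n && pending != 0; i++) {
        const struct sketch_ace *ace = &acl[i];

        if (!ace->who_matches)
            continue;               /* only matching ACEs are considered */
        if (ace->type == ACE4_ACCESS_ALLOWED_ACE_TYPE) {
            pending &= ~ace->access_mask;
        } else if (ace->type == ACE4_ACCESS_DENIED_ACE_TYPE) {
            if (pending & ace->access_mask)
                return SKETCH_DENY; /* a still-unALLOWED bit is denied */
        }
    }
    /* Bits left over after the whole ACL: outcome is not determined. */
    return pending == 0 ? SKETCH_ALLOW : SKETCH_UNDEFINED;
}
```

Order matters in this model: an ALLOW ACE placed before a DENY ACE for the same bit wins, because an ALLOWED bit is no longer considered by later ACEs.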
The situation is complicated by the fact that a server may have The situation is complicated by the fact that a server may have
multiple modules that enforce ACLs. For example, the enforcement for multiple modules that enforce ACLs. For example, the enforcement for
NFS version 4 access may be different from the enforcement for local NFS version 4 access may be different from the enforcement for local
access, and both may be different from the enforcement for access access, and both may be different from the enforcement for access
skipping to change at page 49, line 50 skipping to change at page 49, line 4
dependent) when any access attempt is dependent) when any access attempt is
made to a file or directory for the made to a file or directory for the
access methods specified in acemask4. access methods specified in acemask4.
A server need not support all of the above ACE types. The bitmask A server need not support all of the above ACE types. The bitmask
constants used to represent the above definitions within the constants used to represent the above definitions within the
aclsupport attribute are as follows: aclsupport attribute are as follows:
const ACL4_SUPPORT_ALLOW_ACL = 0x00000001; const ACL4_SUPPORT_ALLOW_ACL = 0x00000001;
const ACL4_SUPPORT_DENY_ACL = 0x00000002; const ACL4_SUPPORT_DENY_ACL = 0x00000002;
const ACL4_SUPPORT_AUDIT_ACL = 0x00000004; const ACL4_SUPPORT_AUDIT_ACL = 0x00000004;
const ACL4_SUPPORT_ALARM_ACL = 0x00000008; const ACL4_SUPPORT_ALARM_ACL = 0x00000008;
The semantics of the "type" field follow the descriptions provided The semantics of the "type" field follow the descriptions provided
above. above.
The constants used for the type field (acetype4) are as follows: The constants used for the type field (acetype4) are as follows:
const ACE4_ACCESS_ALLOWED_ACE_TYPE = 0x00000000; const ACE4_ACCESS_ALLOWED_ACE_TYPE = 0x00000000;
const ACE4_ACCESS_DENIED_ACE_TYPE = 0x00000001; const ACE4_ACCESS_DENIED_ACE_TYPE = 0x00000001;
const ACE4_SYSTEM_AUDIT_ACE_TYPE = 0x00000002; const ACE4_SYSTEM_AUDIT_ACE_TYPE = 0x00000002;
const ACE4_SYSTEM_ALARM_ACE_TYPE = 0x00000003; const ACE4_SYSTEM_ALARM_ACE_TYPE = 0x00000003;
Clients should not attempt to set an ACE unless the server claims Clients should not attempt to set an ACE unless the server claims
support for that ACE type. If the server receives a request to set support for that ACE type. If the server receives a request to set
an ACE that it cannot store, it must reject the request with an ACE that it cannot store, it MUST reject the request with
NFS4ERR_ATTRNOTSUPP. NFS4ERR_ATTRNOTSUPP. If the server receives a request to set an ACE
that it can store but cannot enforce, the server SHOULD reject the
If the server receives a request to set an ACE that it can store but request with NFS4ERR_ATTRNOTSUPP.
cannot enforce, the server SHOULD reject the request.
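A client can apply the rule above by checking each ACE type against the aclsupport attribute before issuing a SETATTR. The following is a minimal sketch; the helper names sketch_support_bit and sketch_first_unsupported are hypothetical, while the constants are the ones listed in this section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* aclsupport bits and acetype4 values as listed in this section. */
#define ACL4_SUPPORT_ALLOW_ACL 0x00000001u
#define ACL4_SUPPORT_DENY_ACL  0x00000002u
#define ACL4_SUPPORT_AUDIT_ACL 0x00000004u
#define ACL4_SUPPORT_ALARM_ACL 0x00000008u

#define ACE4_ACCESS_ALLOWED_ACE_TYPE 0x00000000u
#define ACE4_ACCESS_DENIED_ACE_TYPE  0x00000001u
#define ACE4_SYSTEM_AUDIT_ACE_TYPE   0x00000002u
#define ACE4_SYSTEM_ALARM_ACE_TYPE   0x00000003u

/* Map an acetype4 value to its aclsupport bit; 0 for unknown types. */
static uint32_t sketch_support_bit(uint32_t acetype)
{
    switch (acetype) {
    case ACE4_ACCESS_ALLOWED_ACE_TYPE: return ACL4_SUPPORT_ALLOW_ACL;
    case ACE4_ACCESS_DENIED_ACE_TYPE:  return ACL4_SUPPORT_DENY_ACL;
    case ACE4_SYSTEM_AUDIT_ACE_TYPE:   return ACL4_SUPPORT_AUDIT_ACL;
    case ACE4_SYSTEM_ALARM_ACE_TYPE:   return ACL4_SUPPORT_ALARM_ACL;
    default:                           return 0;
    }
}

/*
 * Returns the index of the first ACE type the server does not claim to
 * support, or -1 if every type in the list is covered by aclsupport.
 */
long sketch_first_unsupported(uint32_t aclsupport,
                              const uint32_t *types, size_t n)
{
    assert(types != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if ((aclsupport & sketch_support_bit(types[i])) == 0)
            return (long)i;
    }
    return -1;
}
```

A result other than -1 suggests the client should not send that ACE, since the server would be expected to answer NFS4ERR_ATTRNOTSUPP.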
Example: suppose a server can enforce NFS ACLs for NFS access but Example: suppose a server can enforce NFS ACLs for NFS access but
cannot enforce ACLs for local access. If arbitrary processes can run cannot enforce ACLs for local access. If arbitrary processes can run
on the server, then the server SHOULD NOT indicate ACL support. On on the server, then the server SHOULD NOT indicate ACL support. On
the other hand, if only trusted administrative programs run locally, the other hand, if only trusted administrative programs run locally,
then the server may indicate ACL support. then the server may indicate ACL support.
5.11.2. ACE Access Mask 5.11.2. ACE Access Mask
The access_mask field contains values based on the following: The access_mask field contains values based on the following:
skipping to change at page 50, line 50 skipping to change at page 50, line 4
ADD_FILE Permission to add a new file to a ADD_FILE Permission to add a new file to a
directory directory
APPEND_DATA Permission to append data to a file APPEND_DATA Permission to append data to a file
ADD_SUBDIRECTORY Permission to create a subdirectory in a ADD_SUBDIRECTORY Permission to create a subdirectory in a
directory directory
READ_NAMED_ATTRS Permission to read the named attributes READ_NAMED_ATTRS Permission to read the named attributes
of a file of a file
WRITE_NAMED_ATTRS Permission to write the named attributes WRITE_NAMED_ATTRS Permission to write the named attributes
of a file of a file
EXECUTE Permission to execute a file EXECUTE Permission to execute a file
DELETE_CHILD Permission to delete a file or directory DELETE_CHILD Permission to delete a file or directory
within a directory within a directory
READ_ATTRIBUTES The ability to read basic attributes READ_ATTRIBUTES The ability to read basic attributes
(non-acls) of a file (non-acls) of a file
WRITE_ATTRIBUTES Permission to change basic attributes WRITE_ATTRIBUTES Permission to change basic attributes
(non-acls) of a file (non-acls) of a file
DELETE Permission to Delete the file DELETE Permission to Delete the file
READ_ACL Permission to Read the ACL READ_ACL Permission to Read the ACL
WRITE_ACL Permission to Write the ACL WRITE_ACL Permission to Write the ACL
WRITE_OWNER Permission to change the owner WRITE_OWNER Permission to change the owner
SYNCHRONIZE Permission to access file locally at the SYNCHRONIZE Permission to access file locally at the
server with synchronous reads and writes server with synchronous reads and writes
skipping to change at page 51, line 54 skipping to change at page 51, line 4
enabled. enabled.
If a server receives a SETATTR request that it cannot accurately If a server receives a SETATTR request that it cannot accurately
implement, it should err in the direction of more restricted implement, it should err in the direction of more restricted
access. For example, suppose a server cannot distinguish overwriting access. For example, suppose a server cannot distinguish overwriting
data from appending new data, as described in the previous paragraph. data from appending new data, as described in the previous paragraph.
If a client submits an ACE where APPEND_DATA is set but WRITE_DATA is If a client submits an ACE where APPEND_DATA is set but WRITE_DATA is
not (or vice versa), the server should reject the request with not (or vice versa), the server should reject the request with
NFS4ERR_ATTRNOTSUPP. Nonetheless, if the ACE has type DENY, the NFS4ERR_ATTRNOTSUPP. Nonetheless, if the ACE has type DENY, the
server may silently turn on the other bit, so that both APPEND_DATA server may silently turn on the other bit, so that both APPEND_DATA
and WRITE_DATA are denied. and WRITE_DATA are denied.
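The handling just described, for a server that cannot tell overwriting from appending, might look like the sketch below. The function name and status enum are hypothetical, and the two mask values are assumptions made for illustration because the table that defines them is not shown in this excerpt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed mask values, for illustration only. */
#define ACE4_WRITE_DATA  0x00000002u
#define ACE4_APPEND_DATA 0x00000004u

/* acetype4 value as listed earlier in this section. */
#define ACE4_ACCESS_DENIED_ACE_TYPE 0x00000001u

enum sketch_status { SKETCH_OK, SKETCH_ATTRNOTSUPP };

/*
 * For a server that cannot distinguish WRITE_DATA from APPEND_DATA:
 * accept masks that set both bits or neither; for a DENY ACE with only
 * one bit set, widen the mask to both; otherwise reject.
 */
enum sketch_status
sketch_normalize_write_bits(uint32_t acetype, uint32_t *access_mask)
{
    const uint32_t both = ACE4_WRITE_DATA | ACE4_APPEND_DATA;
    uint32_t have;

    assert(access_mask != NULL);
    have = *access_mask & both;
    if (have == 0 || have == both)
        return SKETCH_OK;              /* nothing ambiguous to resolve */
    if (acetype == ACE4_ACCESS_DENIED_ACE_TYPE) {
        *access_mask |= both;          /* err toward more restricted access */
        return SKETCH_OK;
    }
    return SKETCH_ATTRNOTSUPP;         /* would grant more than requested */
}
```

Widening is applied only to DENY ACEs because turning on the extra bit in an ALLOW ACE would grant access the client never asked for.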
5.11.3. ACE flag 5.11.3. ACE flag
The "flag" field contains values based on the following descriptions. The "flag" field contains values based on the following descriptions.
ACE4_FILE_INHERIT_ACE ACE4_FILE_INHERIT_ACE
Can be placed on a directory and indicates that this ACE should be Can be placed on a directory and indicates that this ACE should be
added to each new non-directory file created. added to each new non-directory file created.
skipping to change at page 52, line 46 skipping to change at page 51, line 48
ACE4_SUCCESSFUL_ACCESS_ACE_FLAG ACE4_SUCCESSFUL_ACCESS_ACE_FLAG
ACE4_FAILED_ACCESS_ACE_FLAG ACE4_FAILED_ACCESS_ACE_FLAG
The ACE4_SUCCESSFUL_ACCESS_ACE_FLAG (SUCCESS) and The ACE4_SUCCESSFUL_ACCESS_ACE_FLAG (SUCCESS) and
ACE4_FAILED_ACCESS_ACE_FLAG (FAILED) flag bits relate only to ACE4_FAILED_ACCESS_ACE_FLAG (FAILED) flag bits relate only to
ACE4_SYSTEM_AUDIT_ACE_TYPE (AUDIT) and ACE4_SYSTEM_ALARM_ACE_TYPE ACE4_SYSTEM_AUDIT_ACE_TYPE (AUDIT) and ACE4_SYSTEM_ALARM_ACE_TYPE
(ALARM) ACE types. If during the processing of the file's ACL, the (ALARM) ACE types. If during the processing of the file's ACL, the
server encounters an AUDIT or ALARM ACE that matches the principal server encounters an AUDIT or ALARM ACE that matches the principal
attempting the OPEN, the server notes that fact, and the prescence, attempting the OPEN, the server notes that fact, and the presence, if
if any, of the SUCCESS and FAILED flags encountered in the AUDIT or any, of the SUCCESS and FAILED flags encountered in the AUDIT or
ALARM ACE. Once the server completes the ACL processing, and the ALARM ACE. Once the server completes the ACL processing, and the
share reservation processing, and the OPEN call, it then notes if the share reservation processing, and the OPEN call, it then notes if the
OPEN succeeded or failed. If the OPEN succeeded, and if the SUCCESS OPEN succeeded or failed. If the OPEN succeeded, and if the SUCCESS
flag was set for a matching AUDIT or ALARM, then the appropriate flag was set for a matching AUDIT or ALARM, then the appropriate
AUDIT or ALARM event occurs. If the OPEN failed, and if the FAILED AUDIT or ALARM event occurs. If the OPEN failed, and if the FAILED
flag was set for the matching AUDIT or ALARM, then the appropriate flag was set for the matching AUDIT or ALARM, then the appropriate
AUDIT or ALARM event occurs. Clearly either or both of the SUCCESS AUDIT or ALARM event occurs. Clearly either or both of the SUCCESS
or FAILED can be set, but if neither is set, the AUDIT or ALARM ACE or FAILED can be set, but if neither is set, the AUDIT or ALARM ACE
is not useful. is not useful.
The previously described processing applies to that of the ACCESS The previously described processing applies to that of the ACCESS
operation as well. The difference is that "success" or "failure" operation as well. The difference is that "success" or "failure"
does not mean whether ACCESS returns NFS4_OK or not. Success means does not mean whether ACCESS returns NFS4_OK or not. Success means
whether ACCESS returns all requested and supported bits. Failure whether ACCESS returns all requested and supported bits. Failure
means whether ACCESS failed to return a bit that was requested and means whether ACCESS failed to return a bit that was requested and
supported. supported.
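The SUCCESS and FAILED rules above reduce to two small predicates, sketched here. All names are hypothetical, the two flag values are placeholders rather than the protocol's constants, and the assertion that returned bits are a subset of supported bits is an assumed invariant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder flag values for illustration only. */
#define SKETCH_SUCCESS_FLAG 0x00000010u
#define SKETCH_FAILED_FLAG  0x00000020u

/*
 * Decide whether a matching AUDIT or ALARM ACE triggers its event, given
 * the ACE's flag field and the outcome of the operation.
 */
bool sketch_event_fires(uint32_t ace_flag, bool succeeded)
{
    uint32_t needed = succeeded ? SKETCH_SUCCESS_FLAG : SKETCH_FAILED_FLAG;

    return (ace_flag & needed) != 0;
}

/*
 * For the ACCESS operation, "success" means every bit that was both
 * requested and supported came back in the result.
 */
bool sketch_access_succeeded(uint32_t requested, uint32_t supported,
                             uint32_t returned)
{
    assert((returned & ~supported) == 0);  /* assumed invariant */
    return ((requested & supported) & ~returned) == 0;
}
```

An ACE with neither flag set never fires under sketch_event_fires, which matches the remark that such an ACE is not useful.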
skipping to change at page 53, line 52 skipping to change at page 53, line 4
should reject the request with NFS4ERR_ATTRNOTSUPP. If the server should reject the request with NFS4ERR_ATTRNOTSUPP. If the server
supports a single "inherit ACE" flag that applies to both files and supports a single "inherit ACE" flag that applies to both files and
directories, the server may reject the request (i.e., requiring the directories, the server may reject the request (i.e., requiring the
client to set both the file and directory inheritance flags). The client to set both the file and directory inheritance flags). The
server may also accept the request and silently turn on the server may also accept the request and silently turn on the
ACE4_DIRECTORY_INHERIT_ACE flag. ACE4_DIRECTORY_INHERIT_ACE flag.
5.11.4. ACE who 5.11.4. ACE who
There are several special identifiers ("who") which need to be There are several special identifiers ("who") which need to be
understood universally, rather than in the context of a particular understood universally, rather than in the context of a particular
DNS domain. Some of these identifiers cannot be understood when an DNS domain. Some of these identifiers cannot be understood when an
NFS client accesses the server, but have meaning when a local process NFS client accesses the server, but have meaning when a local process
accesses the file. The ability to display and modify these accesses the file. The ability to display and modify these
permissions is permitted over NFS, even if none of the access methods permissions is permitted over NFS, even if none of the access methods
on the server understands the identifiers. on the server understands the identifiers.
Who Description Who Description
_______________________________________________________________ _______________________________________________________________
"OWNER" The owner of the file. "OWNER" The owner of the file.
"GROUP" The group associated with the file. "GROUP" The group associated with the file.
"EVERYONE" The world. "EVERYONE" The world.
"INTERACTIVE" Accessed from an interactive terminal. "INTERACTIVE" Accessed from an interactive terminal.
skipping to change at page 54, line 52 skipping to change at page 54, line 4
const MODE4_XGRP = 0x008; /* execute permission: group */ const MODE4_XGRP = 0x008; /* execute permission: group */
const MODE4_ROTH = 0x004; /* read permission: other */ const MODE4_ROTH = 0x004; /* read permission: other */
const MODE4_WOTH = 0x002; /* write permission: other */ const MODE4_WOTH = 0x002; /* write permission: other */
const MODE4_XOTH = 0x001; /* execute permission: other */ const MODE4_XOTH = 0x001; /* execute permission: other */
Bits MODE4_RUSR, MODE4_WUSR, and MODE4_XUSR apply to the principal Bits MODE4_RUSR, MODE4_WUSR, and MODE4_XUSR apply to the principal
identified in the owner attribute. Bits MODE4_RGRP, MODE4_WGRP, and identified in the owner attribute. Bits MODE4_RGRP, MODE4_WGRP, and
MODE4_XGRP apply to the principals identified in the owner_group MODE4_XGRP apply to the principals identified in the owner_group
attribute. Bits MODE4_ROTH, MODE4_WOTH, and MODE4_XOTH apply to any attribute. Bits MODE4_ROTH, MODE4_WOTH, and MODE4_XOTH apply to any
principal that does not match that in the owner attribute, and does not principal that does not match that in the owner attribute, and does not
have a group matching that of the owner_group attribute. have a group matching that of the owner_group attribute.
The remaining bits are not defined by this protocol and MUST NOT be The remaining bits are not defined by this protocol and MUST NOT be
used. The minor version mechanism must be used to define further bit used. The minor version mechanism must be used to define further bit
usage. usage.
Note that in UNIX, if a file has the MODE4_SGID bit set and no Note that in UNIX, if a file has the MODE4_SGID bit set and no
MODE4_XGRP bit set, then READ and WRITE must use mandatory file MODE4_XGRP bit set, then READ and WRITE must use mandatory file
locking. locking.
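As a rough illustration of the rules above, the following sketch selects the permission triplet that applies to a principal and tests for the SGID-without-XGRP case. The constants not visible in this excerpt (MODE4_SUID through MODE4_WGRP) are taken from the same constant list. The helper names and the owner-before-group precedence are assumptions made for the example, not part of the protocol definition.

```python
# Mode attribute bits; the low 12 bits are the only ones defined.
MODE4_SUID = 0x800  # set user id on execution
MODE4_SGID = 0x400  # set group id on execution
MODE4_SVTX = 0x200  # save text even after use
MODE4_RUSR = 0x100  # read permission: owner
MODE4_WUSR = 0x080  # write permission: owner
MODE4_XUSR = 0x040  # execute permission: owner
MODE4_RGRP = 0x020  # read permission: group
MODE4_WGRP = 0x010  # write permission: group
MODE4_XGRP = 0x008  # execute permission: group
MODE4_ROTH = 0x004  # read permission: other
MODE4_WOTH = 0x002  # write permission: other
MODE4_XOTH = 0x001  # execute permission: other
MODE4_DEFINED = 0xFFF


def permission_bits(mode, is_owner, in_owner_group):
    """Return (read, write, execute) for the class the principal falls in.

    Assumption: the owner class is checked before the group class, as in
    the usual UNIX convention.
    """
    if is_owner:
        bits = (MODE4_RUSR, MODE4_WUSR, MODE4_XUSR)
    elif in_owner_group:
        bits = (MODE4_RGRP, MODE4_WGRP, MODE4_XGRP)
    else:
        bits = (MODE4_ROTH, MODE4_WOTH, MODE4_XOTH)
    return tuple(bool(mode & b) for b in bits)


def uses_mandatory_locking(mode):
    """True when MODE4_SGID is set and MODE4_XGRP is clear."""
    return bool(mode & MODE4_SGID) and not (mode & MODE4_XGRP)


def has_undefined_bits(mode):
    """True when any bit outside the defined set is present."""
    return bool(mode & ~MODE4_DEFINED)
```

A receiver could use has_undefined_bits() to detect a mode value that carries bits this protocol version does not define.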
5.11.6. Mode and ACL Attribute 5.11.6. Mode and ACL Attribute
The server that supports both mode and ACL must take care to The server that supports both mode and ACL must take care to
skipping to change at page 55, line 55 skipping to change at page 55, line 4
mounted on the mount point. mounted on the mount point.
Unlike NFS version 3, NFS version 4 allows a client's LOOKUP request Unlike NFS version 3, NFS version 4 allows a client's LOOKUP request
to cross other filesystems. The client detects the filesystem to cross other filesystems. The client detects the filesystem
crossing whenever the filehandle argument of LOOKUP has an fsid crossing whenever the filehandle argument of LOOKUP has an fsid
attribute different from that of the filehandle returned by LOOKUP. A attribute different from that of the filehandle returned by LOOKUP. A
UNIX-based client will consider this a "mount point crossing". UNIX UNIX-based client will consider this a "mount point crossing". UNIX
has a legacy scheme for allowing a process to determine its current has a legacy scheme for allowing a process to determine its current
working directory. This relies on readdir() of a mount point's parent working directory. This relies on readdir() of a mount point's parent
and stat() of the mount point returning fileids as previously and stat() of the mount point returning fileids as previously
described. The mounted_on_fileid attribute corresponds to the fileid described. The mounted_on_fileid attribute corresponds to the fileid
that readdir() would have returned as described previously. that readdir() would have returned as described previously.
While the NFS version 4 client could simply fabricate a fileid While the NFS version 4 client could simply fabricate a fileid
corresponding to what mounted_on_fileid provides (and if the server corresponding to what mounted_on_fileid provides (and if the server
does not support mounted_on_fileid, the client has no choice), there does not support mounted_on_fileid, the client has no choice), there
is a risk that the client will generate a fileid that conflicts with is a risk that the client will generate a fileid that conflicts with
one that is already assigned to another object in the filesystem. one that is already assigned to another object in the filesystem.
Instead, if the server can provide the mounted_on_fileid, the Instead, if the server can provide the mounted_on_fileid, the
potential for client operational problems in this area is eliminated. potential for client operational problems in this area is eliminated.
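The crossing check and the preference for a server-supplied mounted_on_fileid can be sketched as follows. This is a minimal illustration only: the Fsid tuple, the attribute dictionary, and both function names are hypothetical, and a real client would obtain these values through GETATTR.

```python
from typing import NamedTuple, Optional


class Fsid(NamedTuple):
    """An fsid is compared on both its major and minor values."""
    major: int
    minor: int


def crosses_filesystem(parent_fsid: Fsid, child_fsid: Fsid) -> bool:
    """A LOOKUP crossed into another filesystem if the fsid changed."""
    return parent_fsid != child_fsid


def dirent_fileid(child_attrs: dict) -> Optional[int]:
    """Pick the fileid to report for the directory entry of a mount point.

    Returns the server's mounted_on_fileid when it was supplied. Returns
    None when it was not, signalling that the client has to fabricate a
    value and accept the risk of a conflict.
    """
    return child_attrs.get("mounted_on_fileid")
```

Returning None rather than inventing a number keeps the fabrication decision, and its risk of collision, visible to the caller.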
If the server detects that there is no mounted point at the target If the server detects that there is no mounted point at the target
file object, then the value for mounted_on_fileid that it returns is file object, then the value for mounted_on_fileid that it returns is
skipping to change at page 57, line 5 skipping to change at page 56, line 5
fileid of a directory entry returned by readdir(). If fileid of a directory entry returned by readdir(). If
mounted_on_fileid is requested in a GETATTR operation, the server mounted_on_fileid is requested in a GETATTR operation, the server
should obey an invariant that has it returning a value that is equal should obey an invariant that has it returning a value that is equal
to the file object's entry in the object's parent directory, i.e. to the file object's entry in the object's parent directory, i.e.
what readdir() would have returned. Some operating environments what readdir() would have returned. Some operating environments
allow a series of two or more filesystems to be mounted onto a single allow a series of two or more filesystems to be mounted onto a single
mount point. In this case, for the server to obey the aforementioned mount point. In this case, for the server to obey the aforementioned
invariant, it will need to find the base mount point, and not the invariant, it will need to find the base mount point, and not the
intermediate mount points. intermediate mount points.
6. Filesystem Migration and Replication 6. Filesystem Migration and Replication
With the use of the recommended attribute "fs_locations", the NFS With the use of the recommended attribute "fs_locations", the NFS
version 4 server has a method of providing filesystem migration or version 4 server has a method of providing filesystem migration or
replication services. For the purposes of migration and replication, replication services. For the purposes of migration and replication,
a filesystem will be defined as all files that share a given fsid a filesystem will be defined as all files that share a given fsid
(both major and minor values are the same). (both major and minor values are the same).
The fs_locations attribute provides a list of filesystem locations. The fs_locations attribute provides a list of filesystem locations.
skipping to change at page 58, line 5 skipping to change at page 57, line 5
Once the servers participating in the migration have completed the Once the servers participating in the migration have completed the
move of the filesystem, the error NFS4ERR_MOVED will be returned for move of the filesystem, the error NFS4ERR_MOVED will be returned for
subsequent requests received by the original server. The subsequent requests received by the original server. The
NFS4ERR_MOVED error is returned for all operations except PUTFH and NFS4ERR_MOVED error is returned for all operations except PUTFH and
GETATTR. Upon receiving the NFS4ERR_MOVED error, the client will GETATTR. Upon receiving the NFS4ERR_MOVED error, the client will
obtain the value of the fs_locations attribute. The client will then obtain the value of the fs_locations attribute. The client will then
use the contents of the attribute to redirect its requests to the use the contents of the attribute to redirect its requests to the
specified server. To facilitate the use of GETATTR, operations such specified server. To facilitate the use of GETATTR, operations such
as PUTFH must also be accepted by the server for the migrated file as PUTFH must also be accepted by the server for the migrated file
system's filehandles. Note that if the server returns NFS4ERR_MOVED, system's filehandles. Note that if the server returns NFS4ERR_MOVED,
the server MUST support the fs_locations attribute. the server MUST support the fs_locations attribute.
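The client side of this exchange might look roughly like the sketch below. MovedServer is a toy stand-in for the original server after migration, and its compound() method, the operation names as strings, and the tuple layout of a location are all assumptions made for the example rather than a real client API.

```python
class MovedServer:
    """Toy model of the original server once the filesystem has moved."""

    def __init__(self, locations):
        self.locations = locations

    def compound(self, ops):
        # Only PUTFH and GETATTR are accepted for the migrated filesystem;
        # every other operation is answered with NFS4ERR_MOVED.
        for op in ops:
            if op not in ("PUTFH", "GETATTR"):
                return "NFS4ERR_MOVED", None
        if "GETATTR" in ops:
            return "NFS4_OK", {"fs_locations": self.locations}
        return "NFS4_OK", None


def find_new_locations(server, ops):
    """Run ops; on NFS4ERR_MOVED, fetch fs_locations and return it.

    Returns None when the filesystem has not moved.
    """
    status, _ = server.compound(ops)
    if status != "NFS4ERR_MOVED":
        return None
    status, attrs = server.compound(["PUTFH", "GETATTR"])
    if status != "NFS4_OK":
        raise RuntimeError("could not read fs_locations: " + status)
    return attrs["fs_locations"]
```

The caller would then redirect its requests to one of the returned locations.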
If the client requests more attributes than just fs_locations, the If the client requests more attributes than just fs_locations, the
server may return fs_locations only. This is to be expected since server may return fs_locations only. This is to be expected since
the server has migrated the filesystem and may not have a method of the server has migrated the filesystem and may not have a method of
obtaining additional attribute data. obtaining additional attribute data.
skipping to change at page 59, line 5 skipping to change at page 58, line 5
The fs_locations struct and attribute then contains an array of The fs_locations struct and attribute then contains an array of
locations. Since the name space of each server may be constructed locations. Since the name space of each server may be constructed
differently, the "fs_root" field is provided. The path represented differently, the "fs_root" field is provided. The path represented
by fs_root represents the location of the filesystem in the server's by fs_root represents the location of the filesystem in the server's
name space. Therefore, the fs_root path is only associated with the name space. Therefore, the fs_root path is only associated with the
server from which the fs_locations attribute was obtained. The server from which the fs_locations attribute was obtained. The
fs_root path is meant to aid the client in locating the filesystem at fs_root path is meant to aid the client in locating the filesystem at
the various servers listed. the various servers listed.
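One way a client might use fs_root is to strip it from a path it knows on the current server and re-anchor the remainder at the path under which another server holds the same filesystem. The sketch below assumes slash-separated paths. The function and the new_root parameter are hypothetical names for illustration.

```python
def _components(path):
    """Split a slash-separated path into its non-empty components."""
    return [c for c in path.split("/") if c]


def translate_path(path, fs_root, new_root):
    """Map a path under fs_root on one server to new_root on another."""
    parts = _components(path)
    root = _components(fs_root)
    if parts[:len(root)] != root:
        raise ValueError("path is not inside the filesystem at fs_root")
    # Keep only the part of the path below the filesystem root.
    relative = parts[len(root):]
    return "/" + "/".join(_components(new_root) + relative)
```

For instance, with fs_root "/a/b/c" and a new root of "/x/y/z", the path "/a/b/c/d" maps to "/x/y/z/d".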
As an example, there is a replicated filesystem located at two As an example, there is a replicated filesystem located at two
servers (servA and servB). At servA the filesystem is located at servers (servA and servB). At servA the filesystem is located at
path "/a/b/c". At servB the filesystem is located at path "/x/y/z". path "/a/b/c". At servB the filesystem is located at path "/x/y/z".
In this example the client accesses the filesystem first at servA In this example the client accesses the filesystem first at servA
with a multi-component lookup path of "/a/b/c/d". Since the client with a multi-component lookup path of "/a/b/c/d". Since the client
used a multi-component lookup to obtain the filehandle at "/a/b/c/d", used a multi-component lookup to obtain the filehandle at "/a/b/c/d",
it is unaware that the filesystem's root is located in servA's name it is unaware that the filesystem's root is located in servA's name
space at "/a/b/c". When the client switches to servB, it will need space at "/a/b/c". When the client switches to servB, it will need
to determine that the directory it first referenced at servA is now to determine that the directory it first referenced at servA is now
skipping to change at page 60, line 5 skipping to change at page 59, line 5
of the fh_expire_type attribute, whether volatile filehandles will of the fh_expire_type attribute, whether volatile filehandles will
expire at the migration or replication event. If the bit expire at the migration or replication event. If the bit
FH4_VOL_MIGRATION is set in the fh_expire_type attribute, the client FH4_VOL_MIGRATION is set in the fh_expire_type attribute, the client
must treat the volatile filehandle as if the server had returned the must treat the volatile filehandle as if the server had returned the
NFS4ERR_FHEXPIRED error. At the migration or replication event in NFS4ERR_FHEXPIRED error. At the migration or replication event in
the presence of the FH4_VOL_MIGRATION bit, the client will not the presence of the FH4_VOL_MIGRATION bit, the client will not
present the original or old volatile filehandle to the new server. present the original or old volatile filehandle to the new server.
The client will start its communication with the new server by The client will start its communication with the new server by
recovering its filehandles using the saved file names. recovering its filehandles using the saved file names.
7. NFS Server Name Space 7. NFS Server Name Space
7.1. Server Exports 7.1. Server Exports
On a UNIX server the name space describes all the files reachable by On a UNIX server the name space describes all the files reachable by
pathnames under the root directory or "/". On a Windows NT server pathnames under the root directory or "/". On a Windows NT server
the name space constitutes all the files on disks named by mapped the name space constitutes all the files on disks named by mapped
disk letters. NFS server administrators rarely make the entire disk letters. NFS server administrators rarely make the entire
server's filesystem name space available to NFS clients. More often server's filesystem name space available to NFS clients. More often
skipping to change at page 61, line 5 skipping to change at page 60, line 5
the server's name space on the client: it is static. If the server the server's name space on the client: it is static. If the server
administrator adds a new export the client will be unaware of it. administrator adds a new export the client will be unaware of it.
7.3. Server Pseudo Filesystem 7.3. Server Pseudo Filesystem
NFS version 4 servers avoid this name space inconsistency by NFS version 4 servers avoid this name space inconsistency by
presenting all the exports within the framework of a single server presenting all the exports within the framework of a single server
name space. An NFS version 4 client uses LOOKUP and READDIR name space. An NFS version 4 client uses LOOKUP and READDIR
operations to browse seamlessly from one export to another. Portions operations to browse seamlessly from one export to another. Portions
of the server name space that are not exported are bridged via a of the server name space that are not exported are bridged via a
"pseudo filesystem" that provides a view of exported directories "pseudo filesystem" that provides a view of exported directories
only. A pseudo filesystem has a unique fsid and behaves like a only. A pseudo filesystem has a unique fsid and behaves like a
normal, read only filesystem. normal, read only filesystem.
Based on the construction of the server's name space, it is possible Based on the construction of the server's name space, it is possible
that multiple pseudo filesystems may exist. For example, that multiple pseudo filesystems may exist. For example,
/a pseudo filesystem /a pseudo filesystem
/a/b real filesystem /a/b real filesystem
/a/b/c pseudo filesystem /a/b/c pseudo filesystem
/a/b/c/d real filesystem /a/b/c/d real filesystem
Each of the pseudo filesystems is considered a separate entity and Each of the pseudo filesystems is considered a separate entity and
therefore will have a unique fsid. therefore will have a unique fsid.
7.4. Multiple Roots 7.4. Multiple Roots
The DOS and Windows operating environments are sometimes described as The DOS and Windows operating environments are sometimes described as
having "multiple roots". filesystems are commonly represented as having "multiple roots". Filesystems are commonly represented as
disk letters. MacOS represents filesystems as top level names. NFS disk letters. MacOS represents filesystems as top level names. NFS
version 4 servers for these platforms can construct a pseudo file version 4 servers for these platforms can construct a pseudo file
system above these root names so that disk letters or volume names system above these root names so that disk letters or volume names
are simply directory names in the pseudo root. are simply directory names in the pseudo root.
7.5. Filehandle Volatility 7.5. Filehandle Volatility
The nature of the server's pseudo filesystem is that it is a logical The nature of the server's pseudo filesystem is that it is a logical
representation of filesystem(s) available from the server. representation of filesystem(s) available from the server.
Therefore, the pseudo filesystem is most likely constructed Therefore, the pseudo filesystem is most likely constructed
skipping to change at page 62, line 5 skipping to change at page 61, line 5
7.6. Exported Root 7.6. Exported Root
If the server's root filesystem is exported, one might conclude that If the server's root filesystem is exported, one might conclude that
a pseudo-filesystem is not needed. This would be wrong. Assume the a pseudo-filesystem is not needed. This would be wrong. Assume the
following filesystems on a server: following filesystems on a server:
/ disk1 (exported) / disk1 (exported)
/a disk2 (not exported) /a disk2 (not exported)
/a/b disk3 (exported) /a/b disk3 (exported)
Because disk2 is not exported, disk3 cannot be reached with simple Because disk2 is not exported, disk3 cannot be reached with simple
LOOKUPs. The server must bridge the gap with a pseudo-filesystem. LOOKUPs. The server must bridge the gap with a pseudo-filesystem.
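A server could find the exports that need bridging by looking at the filesystem mounted immediately above each one. The sketch below assumes a mount table that maps each mount point to an "exported" flag and always contains "/". Both function names are hypothetical.

```python
def parent_mount(path, mounts):
    """Return the nearest mount point strictly above path, or None."""
    parts = [c for c in path.split("/") if c]
    for i in range(len(parts) - 1, -1, -1):
        candidate = "/" + "/".join(parts[:i])
        if candidate in mounts:
            return candidate
    return None


def exports_needing_bridge(mounts):
    """List exported mount points whose parent filesystem is not exported.

    These are the exports a client cannot reach with LOOKUPs through
    exported filesystems alone, so a pseudo filesystem has to fill the gap.
    """
    needing = []
    for path, exported in mounts.items():
        parent = parent_mount(path, mounts)
        if exported and parent is not None and not mounts[parent]:
            needing.append(path)
    return sorted(needing)
```

With the three filesystems above, only /a/b is reported, because disk2 at /a sits between it and the exported root.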
7.7. Mount Point Crossing 7.7. Mount Point Crossing
The server filesystem environment may be constructed in such a way The server filesystem environment may be constructed in such a way
that one filesystem contains a directory which is 'covered' or that one filesystem contains a directory which is 'covered' or
skipping to change at page 62, line 53 skipping to change at page 61, line 53
server's perception of the client's ability to authenticate itself server's perception of the client's ability to authenticate itself
properly. However, with the support of multiple security mechanisms properly. However, with the support of multiple security mechanisms
and the ability to negotiate the appropriate use of these mechanisms, and the ability to negotiate the appropriate use of these mechanisms,
the server is unable to properly determine if a client will be able the server is unable to properly determine if a client will be able
to authenticate itself. If, based on its policies, the server to authenticate itself. If, based on its policies, the server
chooses to limit the contents of the pseudo filesystem, the server chooses to limit the contents of the pseudo filesystem, the server
may effectively hide filesystems from a client that may otherwise may effectively hide filesystems from a client that may otherwise
have legitimate access. have legitimate access.
As suggested practice, the server should apply the security policy of As suggested practice, the server should apply the security policy of
a shared resource in the server's namespace to the ancestors a shared resource in the server's namespace to the components of the
components of the namespace. For example: resource's ancestors. For example:
/ /
/a/b /a/b
/a/b/c /a/b/c
The /a/b/c directory is a real filesystem and is the shared resource. The /a/b/c directory is a real filesystem and is the shared resource.
The security policy for /a/b/c is Kerberos with integrity. The The security policy for /a/b/c is Kerberos with integrity. The
server should should apply the same security policy to /, /a, and server should apply the same security policy to /, /a, and /a/b.
/a/b. This allows for the extension of the protection of the This allows for the extension of the protection of the server's
server's namespace to the ancestors of the real shared resource. namespace to the ancestors of the real shared resource.
For the case of the use of multiple, disjoint security mechanisms in For the case of the use of multiple, disjoint security mechanisms in
the server's resources, the security for a particular object in the the server's resources, the security for a particular object in the
server's namespace should be the union of all security mechanisms of server's namespace should be the union of all security mechanisms of
all direct descendants. all direct descendants.
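The suggested practice can be sketched as a walk up from each shared resource, accumulating the security mechanisms of everything below. The mechanism labels ("krb5i", "sys") and the function name are illustrative assumptions. Taking the union over all shared descendants gives the same result as repeatedly taking the union over direct descendants.

```python
def ancestor_policies(shared):
    """Compute the set of security mechanisms for each ancestor path.

    shared maps the path of a shared resource to its set of mechanisms.
    The result maps every ancestor path to the union of the mechanisms of
    the shared resources beneath it. If an ancestor is itself a shared
    resource, its own configured policy should still be consulted.
    """
    policies = {}
    for path, mechanisms in shared.items():
        parts = [c for c in path.split("/") if c]
        for i in range(len(parts)):
            ancestor = "/" + "/".join(parts[:i])
            policies.setdefault(ancestor, set()).update(mechanisms)
    return policies
```

For the single share /a/b/c using Kerberos with integrity, the sketch assigns that same mechanism to /, /a, and /a/b.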
8. File Locking and Share Reservations 8. File Locking and Share Reservations
Integrating locking into the NFS protocol necessarily causes it to be Integrating locking into the NFS protocol necessarily causes it to be
stateful. With the inclusion of share reservations the protocol stateful. With the inclusion of share reservations the protocol
becomes substantially more dependent on state than the traditional becomes substantially more dependent on state than the traditional
combination of NFS and NLM [XNFS]. There are three components to combination of NFS and NLM [XNFS]. There are three components to
making this state manageable: making this state manageable:
o Clear division between client and server o Clear division between client and server
skipping to change at page 65, line 5 skipping to change at page 64, line 5
owner. owner.
The following sections describe the transition from the heavy weight The following sections describe the transition from the heavy weight
information to the eventual stateid used for most client and server information to the eventual stateid used for most client and server
locking and lease interactions. locking and lease interactions.
8.1.1. Client ID 8.1.1. Client ID
For each LOCK request, the client must identify itself to the server. For each LOCK request, the client must identify itself to the server.
This is done in such a way as to allow for correct lock This is done in such a way as to allow for correct lock
identification and crash recovery. A sequence of a SETCLIENTID identification and crash recovery. A sequence of a SETCLIENTID
operation followed by a SETCLIENTID_CONFIRM operation is required to operation followed by a SETCLIENTID_CONFIRM operation is required to
establish the identification onto the server. Establishment of establish the identification onto the server. Establishment of
identification by a new incarnation of the client also has the effect identification by a new incarnation of the client also has the effect
of immediately breaking any leased state that a previous incarnation of immediately breaking any leased state that a previous incarnation
of the client might have had on the server, as opposed to forcing the of the client might have had on the server, as opposed to forcing the
new client incarnation to wait for the leases to expire. Breaking new client incarnation to wait for the leases to expire. Breaking
the lease state amounts to the server removing all lock, share the lease state amounts to the server removing all lock, share
skipping to change at page 65, line 32 skipping to change at page 64, line 32
struct nfs_client_id4 { struct nfs_client_id4 {
verifier4 verifier; verifier4 verifier;
opaque id<NFS4_OPAQUE_LIMIT>; opaque id<NFS4_OPAQUE_LIMIT>;
}; };
The first field, verifier is a client incarnation verifier that is The first field, verifier is a client incarnation verifier that is
used to detect client reboots. Only if the verifier is different from used to detect client reboots. Only if the verifier is different from
that which the server has previously recorded for the client (as identified by that which the server has previously recorded for the client (as identified by
the second field of the structure, id) does the server start the the second field of the structure, id) does the server start the
process of cancelling the client's leased state. process of canceling the client's leased state.
The second field, id is a variable length string that uniquely The second field, id is a variable length string that uniquely
defines the client. defines the client.
There are several considerations for how the client generates the id There are several considerations for how the client generates the id
string: string:
o The string should be unique so that multiple clients do not o The string should be unique so that multiple clients do not
present the same string. The consequences of two clients present the same string. The consequences of two clients
presenting the same string range from one client getting an presenting the same string range from one client getting an
error to one client having its leased state abruptly and error to one client having its leased state abruptly and
unexpectedly cancelled. unexpectedly canceled.
o The string should be selected so the subsequent incarnations o The string should be selected so the subsequent incarnations
(e.g. reboots) of the same client cause the client to present (e.g. reboots) of the same client cause the client to present
the same string. The implementor is cautioned against an approach the same string. The implementor is cautioned against an approach
that requires the string to be recorded in a local file because that requires the string to be recorded in a local file because
this precludes the use of the implementation in an environment this precludes the use of the implementation in an environment
where there is no local disk and all file access is from an NFS where there is no local disk and all file access is from an NFS
version 4 server. version 4 server.
o The string should be different for each server network address o The string should be different for each server network address
that the client accesses, rather than common to all server that the client accesses, rather than common to all server
network addresses. The reason is that it may not be possible for network addresses. The reason is that it may not be possible for
the client to tell if the same server is listening on multiple the client to tell if the same server is listening on multiple
network addresses. If the client issues SETCLIENTID with the network addresses. If the client issues SETCLIENTID with the
same id string to each network address of such a server, the same id string to each network address of such a server, the
server will think it is the same client, and each successive server will think it is the same client, and each successive
SETCLIENTID will cause the server to begin the process of SETCLIENTID will cause the server to begin the process of
removing the client's previous leased state. removing the client's previous leased state.
o The algorithm for generating the string should not assume that o The algorithm for generating the string should not assume that
the client's network address won't change. This includes the client's network address won't change. This includes
changes between client incarnations and even changes while the changes between client incarnations and even changes while the
client is still running in its current incarnation. This client is still running in its current incarnation. This
means that if the client includes just the client's and server's means that if the client includes just the client's and server's
network address in the id string, there is a real risk, after network address in the id string, there is a real risk, after
the client gives up the network address, that another client, the client gives up the network address, that another client,
using a similar algorithm for generate the id string, will using a similar algorithm for generating the id string, will
generating a conflicting id string. generate a conflicting id string.
Given the above considerations, an example of a well generated id Given the above considerations, an example of a well generated id
string is one that includes: string is one that includes:
o The server's network address. o The server's network address.
o The client's network address. o The client's network address.
o For a user level NFS version 4 client, it should contain o For a user level NFS version 4 client, it should contain
additional information to distinguish the client from other user additional information to distinguish the client from other user
level clients running on the same host, such as a process id or level clients running on the same host, such as a process id or
other unique sequence. other unique sequence.
o Additional information that tends to be unique, such as one or o Additional information that tends to be unique, such as one or
more of: more of:
- The client machines serial number (for privacy reasons, it is - The client machine's serial number (for privacy reasons, it is
best to perform some one way function on the serial number). best to perform some one way function on the serial number).
- A MAC address. - A MAC address.
- The timestamp of when the NFS version 4 software was first - The timestamp of when the NFS version 4 software was first
installed on the client (though this is subject to the installed on the client (though this is subject to the
previously mentioned caution about using information that is previously mentioned caution about using information that is
stored in a file, because the file might only be accessible stored in a file, because the file might only be accessible
over NFS version 4). over NFS version 4).
skipping to change at page 67, line 5 skipping to change at page 66, line 5
the same between client incarnations, this shares the same the same between client incarnations, this shares the same
problem as that of using the timestamp of the software problem as that of using the timestamp of the software
installation. installation.
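A minimal sketch of an id string built along these lines follows. The field order, the "/" separator, the use of SHA-256 as the one way function, and the function name are all assumptions for illustration. The only constraint taken from the protocol is that the result fits in an opaque of at most NFS4_OPAQUE_LIMIT (1024) bytes.

```python
import hashlib

NFS4_OPAQUE_LIMIT = 1024


def make_client_id(server_addr, client_addr, serial_number, instance=""):
    """Build an id string for one (client, server address) pair.

    The serial number is passed through a one way function so the raw
    value is never sent. instance is optional extra text, such as a
    process id, for user level clients sharing a host.
    """
    hashed = hashlib.sha256(serial_number.encode("utf-8")).hexdigest()
    parts = [server_addr, client_addr, hashed]
    if instance:
        parts.append(instance)
    client_id = "/".join(parts).encode("utf-8")
    if len(client_id) > NFS4_OPAQUE_LIMIT:
        raise ValueError("id string exceeds NFS4_OPAQUE_LIMIT")
    return client_id
```

Because nothing here is read from a local file, the same string can be regenerated after a reboot on a diskless client, and including the server address yields a different string per server network address.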
As a security measure, the server MUST NOT cancel a client's leased As a security measure, the server MUST NOT cancel a client's leased
state if the principal that established the state for a given id string is state if the principal that established the state for a given id string is
not the same as the principal issuing the SETCLIENTID. not the same as the principal issuing the SETCLIENTID.
Note that SETCLIENTID and SETCLIENTID_CONFIRM have a secondary purpose Note that SETCLIENTID and SETCLIENTID_CONFIRM have a secondary purpose
of establishing the information the server needs to make callbacks to of establishing the information the server needs to make callbacks to
the client for the purpose of supporting delegations. It is permitted to the client for the purpose of supporting delegations. It is permitted to
change this information via SETCLIENTID and SETCLIENTID_CONFIRM change this information via SETCLIENTID and SETCLIENTID_CONFIRM
within the same incarnation of the client without removing the within the same incarnation of the client without removing the
client's leased state. client's leased state.
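The server's decision on receiving a SETCLIENTID can be sketched, in much simplified form, as below. The record layout and the returned action labels are hypothetical, the confirmation step is ignored, and what a server actually returns in the mismatched-principal case is not covered here. The sketch only shows that existing state is left untouched in that case.

```python
def setclientid_action(record, verifier, principal):
    """Decide what to do with a SETCLIENTID for a given id string.

    record is the server's existing entry for that id string, or None.
    """
    if record is None:
        return "create"
    if record["principal"] != principal:
        # Security measure: a different principal never cancels state.
        return "reject"
    if record["verifier"] != verifier:
        # New incarnation of the same client: break its leased state.
        return "cancel_state"
    # Same incarnation: callback information may change, state is kept.
    return "update"
```

Checking the principal before the verifier means a spoofed id string with a fresh verifier cannot be used to discard another client's locks.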
Once a SETCLIENTID and SETCLIENTID_CONFIRM sequence has successfully Once a SETCLIENTID and SETCLIENTID_CONFIRM sequence has successfully
completed, the client uses the short hand client identifier, of type completed, the client uses the short hand client identifier, of type
clientid4, instead of the longer and less compact nfs_client_id4 clientid4, instead of the longer and less compact nfs_client_id4
structure. This short hand client identfier (a clientid) is assigned structure. This short hand client identifier (a clientid) is
by the server and should be chosen so that it will not conflict with assigned by the server and should be chosen so that it will not
a clientid previously assigned by the server. This applies across conflict with a clientid previously assigned by the server. This
server restarts or reboots. When a clientid is presented to a server applies across server restarts or reboots. When a clientid is
and that clientid is not recognized, as would happen after a server presented to a server and that clientid is not recognized, as would
reboot, the server will reject the request with the error happen after a server reboot, the server will reject the request with
NFS4ERR_STALE_CLIENTID. When this happens, the client must obtain a the error NFS4ERR_STALE_CLIENTID. When this happens, the client must
new clientid by use of the SETCLIENTID operation and then proceed to obtain a new clientid by use of the SETCLIENTID operation and then
any other necessary recovery for the server reboot case (See the proceed to any other necessary recovery for the server reboot case
section "Server Failure and Recovery"). (See the section "Server Failure and Recovery").
The client must also employ the SETCLIENTID operation when it The client must also employ the SETCLIENTID operation when it
receives a NFS4ERR_STALE_STATEID error using a stateid derived from receives a NFS4ERR_STALE_STATEID error using a stateid derived from
its current clientid, since this also indicates a server reboot which its current clientid, since this also indicates a server reboot which
has invalidated the existing clientid (see the next section has invalidated the existing clientid (see the next section
"lock_owner and stateid Definition" for details). "lock_owner and stateid Definition" for details).
See the detailed descriptions of SETCLIENTID and SETCLIENTID_CONFIRM See the detailed descriptions of SETCLIENTID and SETCLIENTID_CONFIRM
for a complete specification of the operations. for a complete specification of the operations.
skipping to change at page 68, line 5 skipping to change at page 67, line 5
there had been no activity from that client for many minutes. there had been no activity from that client for many minutes.
Note that if the id string in a SETCLIENTID request is properly Note that if the id string in a SETCLIENTID request is properly
constructed, and if the client takes care to use the same principal constructed, and if the client takes care to use the same principal
for each successive use of SETCLIENTID, then, barring an active for each successive use of SETCLIENTID, then, barring an active
denial of service attack, NFS4ERR_CLID_INUSE should never be denial of service attack, NFS4ERR_CLID_INUSE should never be
returned. returned.
However, client bugs, server bugs, or perhaps a deliberate change of However, client bugs, server bugs, or perhaps a deliberate change of
the principal owner of the id string (such as the case of a client the principal owner of the id string (such as the case of a client
that changes security flavors, and under the new flavor, there is no that changes security flavors, and under the new flavor, there is no
mapping to the previous owner) will in rare cases result in mapping to the previous owner) will in rare cases result in
NFS4ERR_CLID_INUSE. NFS4ERR_CLID_INUSE.
In that event, when the server gets a SETCLIENTID for a client id In that event, when the server gets a SETCLIENTID for a client id
that currently has no state, or it has state, but the lease has that currently has no state, or it has state, but the lease has
expired, rather than returning NFS4ERR_CLID_INUSE, the server MUST expired, rather than returning NFS4ERR_CLID_INUSE, the server MUST
allow the SETCLIENTID, and confirm the new clientid if followed by allow the SETCLIENTID, and confirm the new clientid if followed by
skipping to change at page 69, line 5 skipping to change at page 68, line 5
o The stateid was generated by an earlier server instance (i.e. o The stateid was generated by an earlier server instance (i.e.
before a server reboot). The error NFS4ERR_STALE_STATEID should before a server reboot). The error NFS4ERR_STALE_STATEID should
be returned. be returned.
o The stateid was generated by the current server instance but the o The stateid was generated by the current server instance but the
stateid no longer designates the current locking state for the stateid no longer designates the current locking state for the
lockowner-file pair in question (i.e. one or more locking lockowner-file pair in question (i.e. one or more locking
operations has occurred). The error NFS4ERR_OLD_STATEID should operations has occurred). The error NFS4ERR_OLD_STATEID should
be returned. be returned.
This error condition will only occur when the client issues a This error condition will only occur when the client issues a
locking request which changes a stateid while an I/O request locking request which changes a stateid while an I/O request
that uses that stateid is outstanding. that uses that stateid is outstanding.
o The stateid was generated by the current server instance but the o The stateid was generated by the current server instance but the
stateid does not designate a locking state for any active stateid does not designate a locking state for any active
lockowner-file pair. The error NFS4ERR_BAD_STATEID should be lockowner-file pair. The error NFS4ERR_BAD_STATEID should be
returned. returned.
This error condition will occur when there has been a logic This error condition will occur when there has been a logic
error on the part of the client or server. This should not error on the part of the client or server. This should not
happen. happen.
One mechanism that may be used to satisfy these requirements is for One mechanism that may be used to satisfy these requirements is for
the server to, the server to,
o divide the "other" field of each stateid into two fields: o divide the "other" field of each stateid into two fields:
- A server verifier which uniquely designates a particular - A server verifier which uniquely designates a particular
server server instantiation.
instantiation.
- An index into a table of locking-state structures. - An index into a table of locking-state structures.
o utilize the "seqid" field of each stateid, such that seqid is o utilize the "seqid" field of each stateid, such that seqid is
monotonically incremented for each stateid that is associated monotonically incremented for each stateid that is associated
with the same index into the locking-state table. with the same index into the locking-state table.
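The mechanism above can be sketched as follows. This is a minimal illustration, not the draft's normative algorithm: the class and method names (Stateid, StateTable, new_state, bump, release, check) are hypothetical, the "other" field is modeled as two plain integers, and treating a seqid larger than the current one as NFS4ERR_BAD_STATEID is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stateid:
    seqid: int      # incremented for each stateid on the same table index
    verifier: int   # first part of "other": identifies the server instance
    index: int      # second part of "other": slot in the locking-state table


class StateTable:
    """Hypothetical server-side table of locking-state structures."""

    def __init__(self, boot_verifier):
        self.boot_verifier = boot_verifier
        self.current_seqid = {}  # index -> seqid of the current stateid

    def new_state(self, index):
        self.current_seqid[index] = 1
        return Stateid(1, self.boot_verifier, index)

    def bump(self, index):
        # A locking operation occurred: the stateid for this slot changes.
        self.current_seqid[index] += 1
        return Stateid(self.current_seqid[index], self.boot_verifier, index)

    def release(self, index):
        del self.current_seqid[index]

    def check(self, sid):
        if sid.verifier != self.boot_verifier:
            # Generated by an earlier server instance.
            return "NFS4ERR_STALE_STATEID"
        current = self.current_seqid.get(sid.index)
        if current is None:
            # No active lockowner-file pair behind this slot.
            return "NFS4ERR_BAD_STATEID"
        if sid.seqid < current:
            # Superseded by one or more later locking operations.
            return "NFS4ERR_OLD_STATEID"
        if sid.seqid > current:
            # Assumption: a seqid from the "future" is a logic error.
            return "NFS4ERR_BAD_STATEID"
        return "NFS4_OK"
```

A stateid handed out before a call to bump() is then reported as old, one carrying a different verifier as stale, and one whose slot has been released as bad.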
By matching the incoming stateid and its field values with the state By matching the incoming stateid and its field values with the state
held at the server, the server is able to easily determine if a held at the server, the server is able to easily determine if a
stateid is valid for its current instantiation and state. If the stateid is valid for its current instantiation and state. If the
skipping to change at page 69, line 56 skipping to change at page 69, line 4
between the old and new size (i.e. the range truncated or added to between the old and new size (i.e. the range truncated or added to
the file by means of the SETATTR), even where SETATTR is not the file by means of the SETATTR), even where SETATTR is not
explicitly mentioned in the text. explicitly mentioned in the text.
If the lock_owner performs a READ or WRITE in a situation in which it If the lock_owner performs a READ or WRITE in a situation in which it
has established a lock or share reservation on the server (any OPEN has established a lock or share reservation on the server (any OPEN
constitutes a share reservation) the stateid (previously returned by constitutes a share reservation) the stateid (previously returned by
the server) must be used to indicate what locks, including both the server) must be used to indicate what locks, including both
record locks and share reservations, are held by the lockowner. If record locks and share reservations, are held by the lockowner. If
no state is established by the client, either record lock or share no state is established by the client, either record lock or share
reservation, a stateid of all bits 0 is used. Regardless whether a reservation, a stateid of all bits 0 is used. Regardless whether a
stateid of all bits 0, or a stateid returned by the server is used, stateid of all bits 0, or a stateid returned by the server is used,
if there is a conflicting share reservation or mandatory record lock if there is a conflicting share reservation or mandatory record lock
held on the file, the server MUST refuse to service the READ or WRITE held on the file, the server MUST refuse to service the READ or WRITE
operation. operation.
Share reservations are established by OPEN operations and by their Share reservations are established by OPEN operations and by their
nature are mandatory in that when the OPEN denies READ or WRITE nature are mandatory in that when the OPEN denies READ or WRITE
operations, that denial results in such operations being rejected operations, that denial results in such operations being rejected
with error NFS4ERR_LOCKED. Record locks may be implemented by the with error NFS4ERR_LOCKED. Record locks may be implemented by the
server as either mandatory or advisory, or the choice of mandatory or server as either mandatory or advisory, or the choice of mandatory or
advisory behavior may be determined by the server on the basis of the advisory behavior may be determined by the server on the basis of the
file being accessed (for example, some UNIX-based servers support a file being accessed (for example, some UNIX-based servers support a
"mandatory lock bit" on the mode attribute such that if set, record "mandatory lock bit" on the mode attribute such that if set, record
locks are required on the file before I/O is possible). When record locks are required on the file before I/O is possible). When record
locks are advisory, they only prevent the granting of conflicting locks are advisory, they only prevent the granting of conflicting
lock requests and have no effect on READ's or WRITE's. Mandatory lock requests and have no effect on READs or WRITEs. Mandatory
record locks, however, prevent conflicting I/O operations. When they record locks, however, prevent conflicting I/O operations. When they
are attempted, they are rejected with NFS4ERR_LOCKED. Assuming an are attempted, they are rejected with NFS4ERR_LOCKED. When the
operating environment like UNIX that requires it, when the client client gets NFS4ERR_LOCKED on a file it knows it has the proper share
gets NFS4ERR_LOCKED on a file it knows it has the proper share
reservation for, it will need to issue a LOCK request on the region reservation for, it will need to issue a LOCK request on the region
of the file that includes the region the I/O was to be performed on, of the file that includes the region the I/O was to be performed on,
with an appropriate locktype (i.e. READ*_LT for a READ operation, with an appropriate locktype (i.e. READ*_LT for a READ operation,
WRITE*_LT for a WRITE operation). WRITE*_LT for a WRITE operation).
With NFS version 3, there was no notion of a stateid so there was no With NFS version 3, there was no notion of a stateid so there was no
way to tell if the application process of the client sending the READ way to tell if the application process of the client sending the READ
or WRITE operation had also acquired the appropriate record lock on or WRITE operation had also acquired the appropriate record lock on
the file. Thus there was no way to implement mandatory locking. With the file. Thus there was no way to implement mandatory locking. With
the stateid construct, this barrier has been removed. the stateid construct, this barrier has been removed.
skipping to change at page 70, line 58 skipping to change at page 70, line 5
NFS4ERR_LOCKED. NFS4ERR_LOCKED.
For Windows environments, there are no advisory record locks, so the For Windows environments, there are no advisory record locks, so the
server always checks for record locks during I/O requests. server always checks for record locks during I/O requests.
Thus, the NFS version 4 LOCK operation does not need to distinguish Thus, the NFS version 4 LOCK operation does not need to distinguish
between advisory and mandatory record locks. It is the NFS version 4 between advisory and mandatory record locks. It is the NFS version 4
server's processing of the READ and WRITE operations that introduces server's processing of the READ and WRITE operations that introduces
the distinction. the distinction.
Every stateid other than the special stateid values noted in this Every stateid other than the special stateid values noted in this
section, whether returned by an OPEN-type operation (i.e. OPEN, section, whether returned by an OPEN-type operation (i.e. OPEN,
OPEN_DOWNGRADE), or by a LOCK-type operation (i.e. LOCK or LOCKU), OPEN_DOWNGRADE), or by a LOCK-type operation (i.e. LOCK or LOCKU),
defines an access mode for the file (i.e. READ, WRITE, or READ-WRITE) defines an access mode for the file (i.e. READ, WRITE, or READ-WRITE)
as established by the original OPEN which began the stateid sequence, as established by the original OPEN which began the stateid sequence,
and as modified by subsequent OPEN's and OPEN_DOWNGRADE's within that and as modified by subsequent OPENs and OPEN_DOWNGRADEs within that
stateid sequence. When a READ, WRITE, or SETATTR which specifies the stateid sequence. When a READ, WRITE, or SETATTR which specifies the
size attribute, is done, the operation is subject to checking against size attribute, is done, the operation is subject to checking against
the access mode to verify that the operation is appropriate given the the access mode to verify that the operation is appropriate given the
OPEN with which the operation is associated. OPEN with which the operation is associated.
In the case of WRITE-type operations (i.e. WRITE's and SETATTR's In the case of WRITE-type operations (i.e. WRITEs and SETATTRs which
which set size), the server must verify that the access mode allows set size), the server must verify that the access mode allows writing
writing and return an NFS4ERR_OPENMODE error if it does not. In the and return an NFS4ERR_OPENMODE error if it does not. In the case, of
case, of READ, the server may perform the corresponding check on the READ, the server may perform the corresponding check on the access
access mode, or it may choose to allow READ on opens for WRITE only, mode, or it may choose to allow READ on opens for WRITE only, to
to accommodate clients whose write implementation may unavoidably do accommodate clients whose write implementation may unavoidably do
reads (e.g. due to buffer cache constraints). However, even if reads (e.g. due to buffer cache constraints). However, even if READs
READ's are allowed in these circumstances, the server MUST still are allowed in these circumstances, the server MUST still check for
check for locks that conflict with the READ (e.g. another open locks that conflict with the READ (e.g. another open specify denial
specify denial of READ's). Note that a server which does enforce the of READs). Note that a server which does enforce the access mode
access mode check on READ's need not explicitly check for conflicting check on READs need not explicitly check for conflicting share
share reservations since the existence of OPEN for read access reservations since the existence of OPEN for read access guarantees
guarantees that no conflicting share reservation can exist. that no conflicting share reservation can exist.
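The access mode check described above might look like the sketch below. The function name, the "SETATTR_SIZE" label for a size-setting SETATTR, the set-of-strings representation of the access mode, and the lenient_read switch are all hypothetical. Returning NFS4ERR_OPENMODE for a refused READ is an assumption; the text names that error only for the WRITE-type case and calls the READ check "corresponding".

```python
NFS4_OK = "NFS4_OK"
NFS4ERR_OPENMODE = "NFS4ERR_OPENMODE"


def check_access_mode(op, open_access, lenient_read=False):
    """Check an I/O operation against the access mode of its OPEN.

    op           -- "READ", "WRITE" or "SETATTR_SIZE" (hypothetical labels)
    open_access  -- set containing "READ" and/or "WRITE"
    lenient_read -- server policy: allow READ on opens for WRITE only
    """
    if op in ("WRITE", "SETATTR_SIZE"):
        # WRITE-type operations: the server must verify write access.
        return NFS4_OK if "WRITE" in open_access else NFS4ERR_OPENMODE
    if op == "READ":
        # The server may enforce the check or relax it for write-only opens.
        if "READ" in open_access or lenient_read:
            return NFS4_OK
        return NFS4ERR_OPENMODE  # assumed error for the READ case
    raise ValueError("unsupported operation: %r" % (op,))
```

The sketch covers the access mode check only. A server that relaxes the READ check still has to look for locks that conflict with the READ, which is not modeled here.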
A stateid of all bits 1 (one) MAY allow READ operations to bypass A stateid of all bits 1 (one) MAY allow READ operations to bypass
locking checks at the server. However, WRITE operations with a locking checks at the server. However, WRITE operations with a
stateid with bits all 1 (one) MUST NOT bypass locking checks and are stateid with bits all 1 (one) MUST NOT bypass locking checks and are
treated exactly the same as if a stateid of all bits 0 were used. treated exactly the same as if a stateid of all bits 0 were used.
A lock may not be granted while a READ or WRITE operation using one A lock may not be granted while a READ or WRITE operation using one
of the special stateids is being performed and the range of the lock of the special stateids is being performed and the range of the lock
request conflicts with the range of the READ or WRITE operation. For request conflicts with the range of the READ or WRITE operation. For
the purposes of this paragraph, a conflict occurs when a shared lock the purposes of this paragraph, a conflict occurs when a shared lock
skipping to change at page 71, line 57 skipping to change at page 71, line 4
Locking is different than most NFS operations as it requires "at- Locking is different than most NFS operations as it requires "at-
most-one" semantics that are not provided by ONCRPC. ONCRPC over a most-one" semantics that are not provided by ONCRPC. ONCRPC over a
reliable transport is not sufficient because a sequence of locking reliable transport is not sufficient because a sequence of locking
requests may span multiple TCP connections. In the face of requests may span multiple TCP connections. In the face of
retransmission or reordering, lock or unlock requests must have a retransmission or reordering, lock or unlock requests must have a
well defined and consistent behavior. To accomplish this, each lock well defined and consistent behavior. To accomplish this, each lock
request contains a sequence number that is a consecutively increasing request contains a sequence number that is a consecutively increasing
integer. Different lock_owners have different sequences. The server integer. Different lock_owners have different sequences. The server
maintains the last sequence number (L) received and the response that maintains the last sequence number (L) received and the response that
was returned. The first request issued for any given lock_owner is was returned. The first request issued for any given lock_owner is
issued with a sequence number of zero. issued with a sequence number of zero.
Note that for requests that contain a sequence number, for each Note that for requests that contain a sequence number, for each
lock_owner, there should be no more than one outstanding request. lock_owner, there should be no more than one outstanding request.
If a request (r) with a previous sequence number (r < L) is received, If a request (r) with a previous sequence number (r < L) is received,
it is rejected with the return of error NFS4ERR_BAD_SEQID. Given a it is rejected with the return of error NFS4ERR_BAD_SEQID. Given a
properly-functioning client, the response to (r) must have been properly-functioning client, the response to (r) must have been
received before the last request (L) was sent. If a duplicate of received before the last request (L) was sent. If a duplicate of
last request (r == L) is received, the stored response is returned. last request (r == L) is received, the stored response is returned.
If a request beyond the next sequence (r == L + 2) is received, it is If a request beyond the next sequence (r == L + 2) is received, it is
skipping to change at page 72, line 38 skipping to change at page 71, line 40
algorithm for removing unneeded requests. However, the last lock algorithm for removing unneeded requests. However, the last lock
request and response on a given lock_owner must be cached as long as request and response on a given lock_owner must be cached as long as
the lock state exists on the server. the lock state exists on the server.
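A server-side sketch of the per-lock_owner sequence check and reply cache is given below. The class and method names are hypothetical, and sequence number wraparound is ignored. The handling of a request with a previous sequence number and of a duplicate of the last request follows the text. The treatment of a request beyond the next sequence number is not visible in this excerpt, so rejecting it with NFS4ERR_BAD_SEQID is an assumption of the sketch.

```python
class LockOwnerSeq:
    """Hypothetical per-lock_owner sequence state kept by the server."""

    def __init__(self):
        self.last_seqid = None  # "L" in the text; None before any request
        self.last_reply = None  # cached response to request L

    def handle(self, seqid, execute):
        """Process a request; execute() performs the real operation."""
        if self.last_seqid is not None and seqid == self.last_seqid:
            # Duplicate of the last request: return the stored response.
            return self.last_reply
        # The first request for a lock_owner carries sequence number zero.
        expected = 0 if self.last_seqid is None else self.last_seqid + 1
        if seqid != expected:
            # r < L per the text; r > L + 1 by assumption (see lead-in).
            return "NFS4ERR_BAD_SEQID"
        reply = execute()
        self.last_seqid = seqid
        self.last_reply = reply
        return reply
```

Because only the most recent request and response are kept, the sketch holds exactly the state the paragraph above requires to be cached for as long as the lock state exists on the server.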
The client MUST monotonically increment the sequence number for the The client MUST monotonically increment the sequence number for the
CLOSE, LOCK, LOCKU, OPEN, OPEN_CONFIRM, and OPEN_DOWNGRADE CLOSE, LOCK, LOCKU, OPEN, OPEN_CONFIRM, and OPEN_DOWNGRADE
operations. This is true even in the event that the previous operations. This is true even in the event that the previous
operation that used the sequence number received an error. The only operation that used the sequence number received an error. The only
exception to this rule is if the previous operation received one of exception to this rule is if the previous operation received one of
the following errors: NFS4ERR_STALE_CLIENTID, NFS4ERR_STALE_STATEID, the following errors: NFS4ERR_STALE_CLIENTID, NFS4ERR_STALE_STATEID,
NFS4ERR_BAD_STATEID, NFS4ERR_BAD_SEQID. NFS4ERR_BAD_STATEID, NFS4ERR_BAD_SEQID, NFS4ERR_BADXDR,
NFS4ERR_RESOURCE, NFS4ERR_NOFILEHANDLE.
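On the client side, the increment rule can be sketched as below. The function and constant names are hypothetical, and wraparound of the sequence number is not handled. The error set follows the newer (September 2002) column of this comparison; the older column lists only the first four errors.

```python
# Errors after which the client does not advance the sequence number.
NO_INCREMENT_ERRORS = frozenset({
    "NFS4ERR_STALE_CLIENTID",
    "NFS4ERR_STALE_STATEID",
    "NFS4ERR_BAD_STATEID",
    "NFS4ERR_BAD_SEQID",
    "NFS4ERR_BADXDR",
    "NFS4ERR_RESOURCE",
    "NFS4ERR_NOFILEHANDLE",
})


def next_seqid(current, status):
    """Sequence number for the lock_owner's next sequenced operation.

    current -- seqid used by the previous CLOSE, LOCK, LOCKU, OPEN,
               OPEN_CONFIRM or OPEN_DOWNGRADE
    status  -- status the server returned for that operation
    """
    if status in NO_INCREMENT_ERRORS:
        return current
    # Any other outcome, including other errors, advances the seqid.
    return current + 1
```

For example, a LOCK that fails with NFS4ERR_DENIED still advances the sequence number, while one that fails with NFS4ERR_BAD_SEQID does not.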
8.1.6. Recovery from Replayed Requests 8.1.6. Recovery from Replayed Requests
As described above, the sequence number is per lock_owner. As long As described above, the sequence number is per lock_owner. As long
as the server maintains the last sequence number received and follows as the server maintains the last sequence number received and follows
the methods described above, there are no risks of a Byzantine router the methods described above, there are no risks of a Byzantine router
re-sending old requests. The server need only maintain the re-sending old requests. The server need only maintain the
(lock_owner, sequence number) state as long as there are open files (lock_owner, sequence number) state as long as there are open files
or closed files with locks outstanding. or closed files with locks outstanding.
LOCK, LOCKU, OPEN, OPEN_DOWNGRADE, and CLOSE each contain a sequence LOCK, LOCKU, OPEN, OPEN_DOWNGRADE, and CLOSE each contain a sequence
number and therefore the risk of the replay of these operations number and therefore the risk of the replay of these operations
resulting in undesired effects is non-existent while the server resulting in undesired effects is non-existent while the server
maintains the lock_owner state. maintains the lock_owner state.
8.1.7. Releasing lock_owner State 8.1.7. Releasing lock_owner State
When a particular lock_owner no longer holds open or file locking When a particular lock_owner no longer holds open or file locking
state at the server, the server may choose to release the sequence state at the server, the server may choose to release the sequence
number state associated with the lock_owner. The server may make number state associated with the lock_owner. The server may make
this choice based on lease expiration, for the reclamation of server this choice based on lease expiration, for the reclamation of server
memory, or other implementation specific details. In any event, the memory, or other implementation specific details. In any event, the
server is able to do this safely only when the lock_owner no longer server is able to do this safely only when the lock_owner no longer
is being utilized by the client. The server may choose to hold the is being utilized by the client. The server may choose to hold the
lock_owner state in the event that retransmitted requests are lock_owner state in the event that retransmitted requests are
received. However, the period to hold this state is implementation received. However, the period to hold this state is implementation
specific. specific.
skipping to change at page 73, line 54 skipping to change at page 73, line 5
situations in which the server can avoid the need for confirmation situations in which the server can avoid the need for confirmation
when responding to open requests. The two constraints are: when responding to open requests. The two constraints are:
o The server must not bestow a delegation for any open which would o The server must not bestow a delegation for any open which would
require confirmation. require confirmation.
o The server MUST NOT require confirmation on a reclaim-type open o The server MUST NOT require confirmation on a reclaim-type open
(i.e. one specifying claim type CLAIM_PREVIOUS or (i.e. one specifying claim type CLAIM_PREVIOUS or
CLAIM_DELEGATE_PREV). CLAIM_DELEGATE_PREV).
These constraints are related in that reclaim-type opens are the These constraints are related in that reclaim-type opens are the only
only ones in which the server may be required to send a ones in which the server may be required to send a delegation. For
delegation. For CLAIM_NULL, sending the delegation is optional CLAIM_NULL, sending the delegation is optional while for
while for CLAIM_DELEGATE_CUR, no delegation is sent. CLAIM_DELEGATE_CUR, no delegation is sent.
Delegations being sent with an open requiring confirmation are Delegations being sent with an open requiring confirmation are
troublesome because recovering from non-confirmation adds undue troublesome because recovering from non-confirmation adds undue
complexity to the protocol while requiring confirmation on complexity to the protocol while requiring confirmation on reclaim-
reclaim-type opens poses difficulties in that the inability to type opens poses difficulties in that the inability to resolve the
resolve the status of the reclaim until lease expiration may status of the reclaim until lease expiration may make it difficult to
make it difficult to have timely determination of the set of have timely determination of the set of locks being reclaimed (since
locks being reclaimed (since the grace period may expire). the grace period may expire).
Requiring open confirmation on reclaim-type opens is avoidable Requiring open confirmation on reclaim-type opens is avoidable
because of the nature of the environments in which such opens because of the nature of the environments in which such opens are
are done. For CLAIM_PREVIOUS opens, this is immediately after done. For CLAIM_PREVIOUS opens, this is immediately after server
server reboot, so there should be no time for lockowners to be reboot, so there should be no time for lockowners to be created,
created, found to be unused, and recycled. For found to be unused, and recycled. For CLAIM_DELEGATE_PREV opens, we
CLAIM_DELEGATE_PREV opens, we are dealing with a client reboot are dealing with a client reboot situation. A server which supports
situation. A server which supports delegation can be sure that delegation can be sure that no lockowners for that client have been
no lockowners for that client have been recycled since client recycled since client initialization and thus can ensure that
initialization and thus can ensure that confirmation will not be confirmation will not be required.
required.
8.2. Lock Ranges 8.2. Lock Ranges
The protocol allows a lock owner to request a lock with a byte range The protocol allows a lock owner to request a lock with a byte range
and then either upgrade or unlock a sub-range of the initial lock. and then either upgrade or unlock a sub-range of the initial lock.
It is expected that this will be an uncommon type of request. In any It is expected that this will be an uncommon type of request. In any
case, servers or server filesystems may not be able to support sub- case, servers or server filesystems may not be able to support sub-
range lock semantics. In the event that a server receives a locking range lock semantics. In the event that a server receives a locking
request that represents a sub-range of current locking state for the request that represents a sub-range of current locking state for the
lock owner, the server is allowed to return the error lock owner, the server is allowed to return the error
skipping to change at page 74, line 53 skipping to change at page 74, line 4
the recovery of file locking state in the event of server failure. the recovery of file locking state in the event of server failure.
As discussed in the section "Server Failure and Recovery" below, the As discussed in the section "Server Failure and Recovery" below, the
server may employ certain optimizations during recovery that work server may employ certain optimizations during recovery that work
effectively only when the client's behavior during lock recovery is effectively only when the client's behavior during lock recovery is
similar to the client's locking behavior prior to server failure. similar to the client's locking behavior prior to server failure.
8.3. Upgrading and Downgrading Locks 8.3. Upgrading and Downgrading Locks
If a client has a write lock on a record, it can request an atomic If a client has a write lock on a record, it can request an atomic
downgrade of the lock to a read lock via the LOCK request, by setting downgrade of the lock to a read lock via the LOCK request, by setting
the type to READ_LT. If the server supports atomic downgrade, the the type to READ_LT. If the server supports atomic downgrade, the
request will succeed. If not, it will return NFS4ERR_LOCK_NOTSUPP. request will succeed. If not, it will return NFS4ERR_LOCK_NOTSUPP.
The client should be prepared to receive this error, and if The client should be prepared to receive this error, and if
appropriate, report the error to the requesting application. appropriate, report the error to the requesting application.
If a client has a read lock on a record, it can request an atomic If a client has a read lock on a record, it can request an atomic
upgrade of the lock to a write lock via the LOCK request by setting upgrade of the lock to a write lock via the LOCK request by setting
the type to WRITE_LT or WRITEW_LT. If the server does not support the type to WRITE_LT or WRITEW_LT. If the server does not support
atomic upgrade, it will return NFS4ERR_LOCK_NOTSUPP. If the upgrade atomic upgrade, it will return NFS4ERR_LOCK_NOTSUPP. If the upgrade
can be achieved without an existing conflict, the request will can be achieved without an existing conflict, the request will
succeed. Otherwise, the server will return either NFS4ERR_DENIED or succeed. Otherwise, the server will return either NFS4ERR_DENIED or
NFS4ERR_DEADLOCK. The error NFS4ERR_DEADLOCK is returned if the NFS4ERR_DEADLOCK. The error NFS4ERR_DEADLOCK is returned if the
client issued the LOCK request with the type set to WRITEW_LT and the client issued the LOCK request with the type set to WRITEW_LT and the
server has detected a deadlock. The client should be prepared to server has detected a deadlock. The client should be prepared to
receive such errors and if appropriate, report the error to the receive such errors and if appropriate, report the error to the
skipping to change at page 75, line 52 skipping to change at page 75, line 4
released, allowing a successful return. In this way, clients can released, allowing a successful return. In this way, clients can
avoid the burden of needlessly frequent polling for blocking locks. avoid the burden of needlessly frequent polling for blocking locks.
The server should take care in the length of delay in the event the The server should take care in the length of delay in the event the
client retransmits the request. client retransmits the request.
8.5. Lease Renewal 8.5. Lease Renewal
The purpose of a lease is to allow a server to remove stale locks The purpose of a lease is to allow a server to remove stale locks
that are held by a client that has crashed or is otherwise that are held by a client that has crashed or is otherwise
unreachable. It is not a mechanism for cache consistency and lease unreachable. It is not a mechanism for cache consistency and lease
renewals may not be denied if the lease interval has not expired. renewals may not be denied if the lease interval has not expired.
The following events cause implicit renewal of all of the leases for The following events cause implicit renewal of all of the leases for
a given client (i.e. all those sharing a given clientid). Each of a given client (i.e. all those sharing a given clientid). Each of
these is a positive indication that the client is still active and these is a positive indication that the client is still active and
that the associated state held at the server, for the client, is that the associated state held at the server, for the client, is
still valid. still valid.
o An OPEN with a valid clientid. o An OPEN with a valid clientid.
o Any operation made with a valid stateid (CLOSE, DELEGPURGE, o Any operation made with a valid stateid (CLOSE, DELEGPURGE,
DELEGRETURN, LOCK, LOCKU, OPEN, OPEN_CONFIRM, OPEN_DOWNGRADE, DELEGRETURN, LOCK, LOCKU, OPEN, OPEN_CONFIRM, OPEN_DOWNGRADE,
READ, RENEW, SETATTR, WRITE). This does not include the special READ, RENEW, SETATTR, WRITE). This does not include the special
stateids of all bits 0 or all bits 1. stateids of all bits 0 or all bits 1.
skipping to change at page 76, line 52 skipping to change at page 76, line 5
8.6. Crash Recovery 8.6. Crash Recovery
The important requirement in crash recovery is that both the client The important requirement in crash recovery is that both the client
and the server know when the other has failed. Additionally, it is and the server know when the other has failed. Additionally, it is
required that a client sees a consistent view of data across server required that a client sees a consistent view of data across server
restarts or reboots. All READ and WRITE operations that may have restarts or reboots. All READ and WRITE operations that may have
been queued within the client or network buffers must wait until the been queued within the client or network buffers must wait until the
client has successfully recovered the locks protecting the READ and client has successfully recovered the locks protecting the READ and
WRITE operations. WRITE operations.
8.6.1. Client Failure and Recovery 8.6.1. Client Failure and Recovery
In the event that a client fails, the server may recover the client's In the event that a client fails, the server may recover the client's
locks when the associated leases have expired. Conflicting locks locks when the associated leases have expired. Conflicting locks
from another client may only be granted after this lease expiration. from another client may only be granted after this lease expiration.
If the client is able to restart or reinitialize within the lease If the client is able to restart or reinitialize within the lease
period the client may be forced to wait the remainder of the lease period the client may be forced to wait the remainder of the lease
period before obtaining new locks. period before obtaining new locks.
To minimize client delay upon restart, lock requests are associated To minimize client delay upon restart, lock requests are associated
with an instance of the client by a client supplied verifier. This with an instance of the client by a client supplied verifier. This
verifier is part of the initial SETCLIENTID call made by the client. verifier is part of the initial SETCLIENTID call made by the client.
The server returns a clientid as a result of the SETCLIENTID The server returns a clientid as a result of the SETCLIENTID
operation. The client then confirms the use of the clientid with operation. The client then confirms the use of the clientid with
SETCLIENTID_CONFIRM. The clientid in combination with an opaque SETCLIENTID_CONFIRM. The clientid in combination with an opaque
skipping to change at page 77, line 53 skipping to change at page 77, line 4
A client can determine that server failure (and thus loss of locking A client can determine that server failure (and thus loss of locking
state) has occurred, when it receives one of two errors. The state) has occurred, when it receives one of two errors. The
NFS4ERR_STALE_STATEID error indicates a stateid invalidated by a NFS4ERR_STALE_STATEID error indicates a stateid invalidated by a
reboot or restart. The NFS4ERR_STALE_CLIENTID error indicates a reboot or restart. The NFS4ERR_STALE_CLIENTID error indicates a
clientid invalidated by reboot or restart. When either of these is clientid invalidated by reboot or restart. When either of these is
received, the client must establish a new clientid (See the section received, the client must establish a new clientid (See the section
"Client ID") and re-establish the locking state as discussed below. "Client ID") and re-establish the locking state as discussed below.
The period of special handling of locking and READs and WRITEs, equal The period of special handling of locking and READs and WRITEs, equal
in duration to the lease period, is referred to as the "grace in duration to the lease period, is referred to as the "grace
period". During the grace period, clients recover locks and the period". During the grace period, clients recover locks and the
associated state by reclaim-type locking requests (i.e. LOCK requests associated state by reclaim-type locking requests (i.e. LOCK requests
with reclaim set to true and OPEN operations with a claim type of with reclaim set to true and OPEN operations with a claim type of
CLAIM_PREVIOUS). During the grace period, the server must reject CLAIM_PREVIOUS). During the grace period, the server must reject
READ and WRITE operations and non-reclaim locking requests (i.e. READ and WRITE operations and non-reclaim locking requests (i.e.
other LOCK and OPEN operations) with an error of NFS4ERR_GRACE. other LOCK and OPEN operations) with an error of NFS4ERR_GRACE.
If the server can reliably determine that granting a non-reclaim If the server can reliably determine that granting a non-reclaim
request will not conflict with reclamation of locks by other clients, request will not conflict with reclamation of locks by other clients,
the NFS4ERR_GRACE error does not have to be returned and the non- the NFS4ERR_GRACE error does not have to be returned and the non-
reclaim client request can be serviced. For the server to be able to reclaim client request can be serviced. For the server to be able to
service READ and WRITE operations during the grace period, it must service READ and WRITE operations during the grace period, it must
again be able to guarantee that no possible conflict could arise again be able to guarantee that no possible conflict could arise
between an impending reclaim locking request and the READ or WRITE between an impending reclaim locking request and the READ or WRITE
skipping to change at page 78, line 54 skipping to change at page 78, line 4
Clients should be prepared for the return of NFS4ERR_GRACE errors for Clients should be prepared for the return of NFS4ERR_GRACE errors for
non-reclaim lock and I/O requests. In this case the client should non-reclaim lock and I/O requests. In this case the client should
employ a retry mechanism for the request. A delay (on the order of employ a retry mechanism for the request. A delay (on the order of
several seconds) between retries should be used to avoid overwhelming several seconds) between retries should be used to avoid overwhelming
the server. Further discussion of the general issue is included in the server. Further discussion of the general issue is included in
[Floyd]. The client must account for the server that is able to [Floyd]. The client must account for the server that is able to
perform I/O and non-reclaim locking requests within the grace period perform I/O and non-reclaim locking requests within the grace period
as well as those that can not do so. as well as those that can not do so.
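
The retry behavior described above can be sketched as follows. This is a minimal illustration, not part of the protocol: the function name, the `send_request` callable, the five second default delay, and the attempt limit are all assumptions chosen for the example, and status codes are represented as strings rather than their wire values.

```python
import time

# Status codes are modeled as plain strings for illustration only.
NFS4_OK = "NFS4_OK"
NFS4ERR_GRACE = "NFS4ERR_GRACE"


def retry_during_grace(send_request, delay_seconds=5.0, max_attempts=20,
                       sleep=time.sleep):
    """Resend a non-reclaim request while the server answers NFS4ERR_GRACE.

    send_request is a hypothetical callable that issues the request once
    and returns its status. A delay of several seconds is inserted between
    attempts so that retries do not overwhelm the server.
    """
    status = NFS4ERR_GRACE
    for attempt in range(max_attempts):
        status = send_request()
        if status != NFS4ERR_GRACE:
            # Either the server serviced the request during the grace
            # period or the grace period is over; stop retrying.
            return status
        if attempt + 1 < max_attempts:
            sleep(delay_seconds)
    return status
```

Because the loop returns on the first status other than NFS4ERR_GRACE, the same code path covers servers that can service I/O and non-reclaim locking requests within the grace period and servers that cannot.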
A reclaim-type locking request outside the server's grace period can A reclaim-type locking request outside the server's grace period can
only succeed if the server can guarantee that no conflicting lock or only succeed if the server can guarantee that no conflicting lock or
I/O request has been granted since reboot or restart. I/O request has been granted since reboot or restart.
A server may, upon restart, establish a new value for the lease A server may, upon restart, establish a new value for the lease
period. Therefore, clients should, once a new clientid is period. Therefore, clients should, once a new clientid is
established, refetch the lease_time attribute and use it as the basis established, refetch the lease_time attribute and use it as the basis
for lease renewal for the lease associated with that server. However, for lease renewal for the lease associated with that server. However,
the server must establish, for this restart event, a grace period at the server must establish, for this restart event, a grace period at
least as long as the lease period for the previous server least as long as the lease period for the previous server
instantiation. This allows the client state obtained during the instantiation. This allows the client state obtained during the
previous server instance to be reliably re-established. previous server instance to be reliably re-established.
8.6.3. Network Partitions and Recovery 8.6.3. Network Partitions and Recovery
If the duration of a network partition is greater than the lease If the duration of a network partition is greater than the lease
skipping to change at page 79, line 33 skipping to change at page 78, line 38
returning the error NFS4ERR_EXPIRED. Once this error is received, returning the error NFS4ERR_EXPIRED. Once this error is received,
the client will suitably notify the application that held the lock. the client will suitably notify the application that held the lock.
As a courtesy to the client or as an optimization, the server may As a courtesy to the client or as an optimization, the server may
continue to hold locks on behalf of a client for which recent continue to hold locks on behalf of a client for which recent
communication has extended beyond the lease period. If the server communication has extended beyond the lease period. If the server
receives a lock or I/O request that conflicts with one of these receives a lock or I/O request that conflicts with one of these
courtesy locks, the server must free the courtesy lock and grant the courtesy locks, the server must free the courtesy lock and grant the
new request. new request.
draft-ietf-nfsv4-rfc3010bis-02.txt:
If the server continues to hold locks beyond the expiration of a
client's lease, the server MUST employ a method of recording this
fact in its stable storage. Conflicting lock requests from another
client may be serviced after the lease expiration. There are various
scenarios involving server failure after such an event that require
the storage of these lease expirations or network partitions. One
scenario is as follows:
A client holds a lock at the server and encounters a
network partition and is unable to renew the associated
lease. A second client obtains a conflicting lock and then
frees the lock. After the unlock request by the second
client, the server reboots or reinitializes. Once the
server recovers, the network partition heals and the
original client attempts to reclaim the original lock.
In this scenario and without any state information, the server will
allow the reclaim and the client will be in an inconsistent state
because the server or the client has no knowledge of the conflicting
lock.
The server may choose to store this lease expiration or network
partitioning state in a way that will only identify the client as a
whole. Note that this may potentially lead to lock reclaims being
denied unnecessarily because of a mix of conflicting and non-
conflicting locks. The server may also choose to store information
about each lock that has an expired lease with an associated
conflicting lock. The choice of the amount and type of state
information that is stored is left to the implementor. In any case,
the server must have enough state information to enable correct
recovery from multiple partitions and multiple server failures.
draft-ietf-nfsv4-rfc3010bis-03.txt:
When a network partition is combined with a server reboot, there are
edge conditions that place requirements on the server in order to
avoid silent data corruption following the server reboot. Two of
these edge conditions are known, and are discussed below.
The first edge condition has the following scenario:
1. Client A acquires a lock.
2. Client A and server experience mutual network partition,
such that client A is unable to renew its lease.
3. Client A's lease expires, so server releases lock.
4. Client B acquires a lock that would have conflicted with
that of Client A.
5. Client B releases the lock
6. Server reboots
7. Network partition between client A and server heals.
8. Client A issues a RENEW operation, and gets back a
NFS4ERR_STALE_CLIENTID.
9. Client A reclaims its lock within the server's grace period.
Thus, at the final step, the server has erroneously granted client
A's lock reclaim. If client B modified the object the lock was
protecting, client A will experience object corruption.
The second known edge condition follows:
1. Client A acquires a lock.
2. Server reboots.
3. Client A and server experience mutual network partition,
such that client A is unable to reclaim its lock within the
grace period.
4. Server's reclaim grace period ends. Client A has no locks
recorded on server.
5. Client B acquires a lock that would have conflicted with
that of Client A.
6. Client B releases the lock
7. Server reboots a second time
8. Network partition between client A and server heals.
9. Client A issues a RENEW operation, and gets back a
NFS4ERR_STALE_CLIENTID.
10. Client A reclaims its lock within the server's grace period.
As with the first edge condition, the final step of the scenario of
the second edge condition has the server erroneously granting client
A's lock reclaim.
Solving the first and second edge conditions requires that the server
either assume after it reboots that an edge condition occurred, and
thus return NFS4ERR_NO_GRACE for all reclaim attempts, or that the
server record some information in stable storage. The amount of information
the server records in stable storage is in inverse proportion to how
harsh the server wants to be whenever the edge conditions occur. The
server that is completely tolerant of all edge conditions will record
in stable storage every lock that is acquired, removing the lock
record from stable storage only when the lock is unlocked by the
client and the lock's lockowner advances the sequence number such
that the lock release is not the last stateful event for the
lockowner's sequence. For the two aforementioned edge conditions, the
harshest a server can be, and still support a grace period for
reclaims, requires that the server record in stable storage
some minimal information. For example, a server
implementation could, for each client, save in stable storage a
record containing:
o the client's id string
o a boolean that indicates if the client's lease expired or if
there was administrative intervention (see the section,
Server Revocation of Locks) to revoke a record lock, share
reservation, or delegation
o a timestamp that is updated the first time after a server
boot or reboot the client acquires record locking, share
reservation, or delegation state on the server. The
timestamp need not be updated on subsequent lock requests
until the server reboots.
The server implementation would also record in the stable storage the
timestamps from the two most recent server reboots.
Assuming the above record keeping, for the first edge condition,
after the server reboots, the record that client A's lease expired
means that another client could have acquired a conflicting record
lock, share reservation, or delegation. Hence the server must reject
a reclaim from client A with the error NFS4ERR_NO_GRACE.
For the second edge condition, after the server reboots for a second
time, the record that the client had an unexpired record lock, share
reservation, or delegation established before the server's previous
incarnation means that the server must reject a reclaim from client A
with the error NFS4ERR_NO_GRACE.
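
The record keeping and the two rejection rules above can be sketched as follows. This is only an illustration under stated assumptions: the `ClientRecord` type, its field names, the `reclaim_status` function, and the rule that a client with no record at all is rejected are hypothetical choices made for the example. `previous_boot_time` stands for the older of the two stored reboot timestamps, that is, the start of the server incarnation that preceded the most recent reboot.

```python
from dataclasses import dataclass
from typing import Optional

# Status codes are modeled as plain strings for illustration only.
NFS4_OK = "NFS4_OK"
NFS4ERR_NO_GRACE = "NFS4ERR_NO_GRACE"


@dataclass
class ClientRecord:
    """Hypothetical per-client record kept in stable storage."""
    client_id: str                  # the client's id string
    lease_expired_or_revoked: bool  # lease expired or state was revoked
    first_state_time: float         # first state acquired after a boot


def reclaim_status(record: Optional[ClientRecord],
                   previous_boot_time: float) -> str:
    """Decide whether a reclaim after the latest reboot may proceed."""
    if record is None:
        # Assumption: no record means no state is known to reclaim.
        return NFS4ERR_NO_GRACE
    if record.lease_expired_or_revoked:
        # First edge condition: another client could have acquired and
        # released a conflicting lock after the lease expired.
        return NFS4ERR_NO_GRACE
    if record.first_state_time < previous_boot_time:
        # Second edge condition: the state predates the previous server
        # incarnation and was never re-established during it.
        return NFS4ERR_NO_GRACE
    return NFS4_OK
```

A record this coarse identifies only the client as a whole, so it may reject reclaims that would in fact have been safe; it trades stable storage traffic for harshness.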
Regardless of the level and approach to record keeping, the server
MUST implement one of the following strategies (which apply to
reclaims of share reservations, record locks, and delegations):
1. Reject all reclaims with NFS4ERR_NO_GRACE. This is
superharsh, but necessary if the server does not want to
record lock state in stable storage.
2. Record sufficient state in stable storage such that all
known edge conditions involving server reboot, including the
two noted in this section, are detected. False positives are
acceptable. Note that at this time, it is not known if there
are other edge conditions.
In the event, after a server reboot, the server determines
that there is unrecoverable damage or corruption to the
stable storage, then for all clients and/or locks affected,
the server MUST return NFS4ERR_NO_GRACE.
A mandate for the client's handling of the NFS4ERR_NO_GRACE error is
outside the scope of this specification, since the strategies for
such handling are very dependent on the client's operating
environment. However, one potential approach is described below.
When the client receives NFS4ERR_NO_GRACE, it could examine the
change attribute of the objects the client is trying to reclaim state
for, and use that to determine whether to re-establish the state via
normal OPEN or LOCK requests. This is acceptable provided the
client's operating environment allows it. In other words, the client
implementor is advised to document this behavior for users. The
client could also inform the application that its record lock or
share reservations (whether they were delegated or not) have been
lost, such as via a UNIX signal, a GUI pop-up window, etc. See the
section, "Data Caching and Revocation" for a discussion of what the
client should do for dealing with unreclaimed delegations on client
state.
For further discussion of revocation of locks see the section "Server For further discussion of revocation of locks see the section "Server
Revocation of Locks". Revocation of Locks".
8.7. Recovery from a Lock Request Timeout or Abort 8.7. Recovery from a Lock Request Timeout or Abort
In the event a lock request times out, a client may decide to not In the event a lock request times out, a client may decide to not
retry the request. The client may also abort the request when the retry the request. The client may also abort the request when the
process for which it was issued is terminated (e.g. in UNIX due to a process for which it was issued is terminated (e.g. in UNIX due to a
signal). It is possible though that the server received the request signal). It is possible though that the server received the request
skipping to change at page 80, line 44 skipping to change at page 82, line 5
not receive a response. From this, the next time the client does a not receive a response. From this, the next time the client does a
lock operation for the lock_owner, it can send the cached request, if lock operation for the lock_owner, it can send the cached request, if
there is one, and if the request was one that established state (e.g. there is one, and if the request was one that established state (e.g.
a LOCK or OPEN operation), the server will return the cached result a LOCK or OPEN operation), the server will return the cached result
or if it never saw the request, perform it. The client can follow up or if it never saw the request, perform it. The client can follow up
with a request to remove the state (e.g. a LOCKU or CLOSE operation). with a request to remove the state (e.g. a LOCKU or CLOSE operation).
With this approach, the sequencing and stateid information on the With this approach, the sequencing and stateid information on the
client and server for the given lock_owner will re-synchronize and in client and server for the given lock_owner will re-synchronize and in
turn the lock state will re-synchronize. turn the lock state will re-synchronize.
8.8. Server Revocation of Locks 8.8. Server Revocation of Locks
At any point, the server can revoke locks held by a client and the At any point, the server can revoke locks held by a client and the
client must be prepared for this event. When the client detects that client must be prepared for this event. When the client detects that
its locks have been or may have been revoked, the client is its locks have been or may have been revoked, the client is
responsible for validating the state information between itself and responsible for validating the state information between itself and
the server. Validating locking state for the client means that it the server. Validating locking state for the client means that it
must verify or reclaim state for each lock currently held. must verify or reclaim state for each lock currently held.
The first instance of lock revocation is upon server reboot or re- The first instance of lock revocation is upon server reboot or re-
initialization. In this instance the client will receive an error initialization. In this instance the client will receive an error
(NFS4ERR_STALE_STATEID or NFS4ERR_STALE_CLIENTID) and the client will (NFS4ERR_STALE_STATEID or NFS4ERR_STALE_CLIENTID) and the client will
proceed with normal crash recovery as described in the previous proceed with normal crash recovery as described in the previous
section. section.
The second lock revocation event is the inability to renew the lease The second lock revocation event is the inability to renew the lease
before expiration. While this is considered a rare or unusual event, before expiration. While this is considered a rare or unusual event,
the client must be prepared to recover. Both the server and client the client must be prepared to recover. Both the server and client
will be able to detect the failure to renew the lease and are capable will be able to detect the failure to renew the lease and are capable
of recovering without data corruption. For the server, it tracks the of recovering without data corruption. For the server, it tracks the
last renewal event serviced for the client and knows when the lease last renewal event serviced for the client and knows when the lease
will expire. Similarly, the client must track operations which will will expire. Similarly, the client must track operations which will
renew the lease period. Using the time that each such request was renew the lease period. Using the time that each such request was
sent and the time that the corresponding reply was received, the sent and the time that the corresponding reply was received, the
client should bound the time that the corresponding renewal could client should bound the time that the corresponding renewal could
have occurred on the server and thus determine if it is possible that have occurred on the server and thus determine if it is possible that
a lease period expiration could have occurred. a lease period expiration could have occurred.
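
The bounding argument above can be sketched as follows. This is a minimal illustration, assuming all times are read from a single monotonic client clock in seconds and that the client and server clocks advance at comparable rates; the function names are hypothetical.

```python
def expiry_window(sent_at, replied_at, lease_time):
    """Bound the moment at which the server-side lease runs out.

    The server performed the renewal at some point between the time the
    request was sent and the time the reply was received, so the lease
    expires no earlier than sent_at + lease_time and no later than
    replied_at + lease_time.
    """
    return (sent_at + lease_time, replied_at + lease_time)


def lease_may_have_expired(sent_at, replied_at, now, lease_time):
    """Return True if an expiration could already have occurred."""
    earliest_expiry, _latest_expiry = expiry_window(
        sent_at, replied_at, lease_time)
    # Be conservative: assume the renewal happened as early as possible.
    return now >= earliest_expiry
```

Using the send time rather than the reply time for the check errs on the side of treating the lease as possibly expired, which leads the client to validate its locks rather than trust them.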
draft-ietf-nfsv4-rfc3010bis-02.txt:
The third lock revocation event can occur as a result of
administrative intervention within the lease period. While this is
considered a rare event, it is possible that the server's
administrator has decided to release or revoke a particular lock held
by the client. As a result of revocation, the client will receive an
error of NFS4ERR_EXPIRED and the error is received within the lease
period for the lock. In this instance the client may assume that
only the lock_owner's locks have been lost. The client notifies the
lock holder appropriately. The client may not assume the lease
period has been renewed as a result of failed operation.
draft-ietf-nfsv4-rfc3010bis-03.txt:
The third lock revocation event can occur as a result of
administrative intervention within the lease period. While this is
considered a rare event, it is possible that the server's
administrator has decided to release or revoke a particular lock held
by the client. As a result of revocation, the client will receive an
error of NFS4ERR_ADMIN_REVOKED. In this instance the client may
assume that only the lock_owner's locks have been lost. The client
notifies the lock holder appropriately. The client may not assume
the lease period has been renewed as a result of failed operation.
When the client determines the lease period may have expired, the When the client determines the lease period may have expired, the
client must mark all locks held for the associated lease as client must mark all locks held for the associated lease as
"unvalidated". This means the client has been unable to re-establish "unvalidated". This means the client has been unable to re-establish
or confirm the appropriate lock state with the server. As described or confirm the appropriate lock state with the server. As described
in the previous section on crash recovery, there are scenarios in in the previous section on crash recovery, there are scenarios in
which the server may grant conflicting locks after the lease period which the server may grant conflicting locks after the lease period
has expired for a client. When it is possible that the lease period has expired for a client. When it is possible that the lease period
has expired, the client must validate each lock currently held to has expired, the client must validate each lock currently held to
ensure that a conflicting lock has not been granted. The client may ensure that a conflicting lock has not been granted. The client may
accomplish this task by issuing an I/O request, either a pending I/O accomplish this task by issuing an I/O request, either a pending I/O
or a zero-length read, specifying the stateid associated with the or a zero-length read, specifying the stateid associated with the
lock in question. If the response to the request is success, the lock in question. If the response to the request is success, the
client has validated all of the locks governed by that stateid and client has validated all of the locks governed by that stateid and
re-established the appropriate state between itself and the server. re-established the appropriate state between itself and the server.
If the I/O request is not successful, then one or more of the locks If the I/O request is not successful, then one or more of the locks
associated with the stateid was revoked by the server and the client associated with the stateid was revoked by the server and the client
must notify the owner. must notify the owner.
8.9. Share Reservations 8.9. Share Reservations
A share reservation is a mechanism to control access to a file. It A share reservation is a mechanism to control access to a file. It
is a separate and independent mechanism from record locking. When a is a separate and independent mechanism from record locking. When a
client opens a file, it issues an OPEN operation to the server client opens a file, it issues an OPEN operation to the server
specifying the type of access required (READ, WRITE, or BOTH) and the specifying the type of access required (READ, WRITE, or BOTH) and the
type of access to deny others (deny NONE, READ, WRITE, or BOTH). If type of access to deny others (deny NONE, READ, WRITE, or BOTH). If
the OPEN fails the client will fail the application's open request. the OPEN fails the client will fail the application's open request.
Pseudo-code definition of the semantics: Pseudo-code definition of the semantics:
if (request.access == 0)
return (NFS4ERR_INVAL)
else
if ((request.access & file_state.deny) || if ((request.access & file_state.deny) ||
(request.deny & file_state.access)) (request.deny & file_state.access))
return (NFS4ERR_DENIED) return (NFS4ERR_DENIED)
This checking of share reservations on OPEN is done with no exception This checking of share reservations on OPEN is done with no exception
for an existing OPEN for the same open_owner. for an existing OPEN for the same open_owner.
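
The pseudo-code above can be made runnable as the following sketch. It is an illustration only: the function name and parameters are hypothetical, status codes are modeled as strings, and the access and deny modes are treated as small bit masks (READ = 1, WRITE = 2, BOTH = 3, deny NONE = 0), which is an assumption about the constants referred to below. `file_access` and `file_deny` stand for the union of the access and deny bits of all existing opens of the file.

```python
# Status codes are modeled as plain strings for illustration only.
NFS4_OK = "NFS4_OK"
NFS4ERR_INVAL = "NFS4ERR_INVAL"
NFS4ERR_DENIED = "NFS4ERR_DENIED"

# Assumed bit-mask encoding of the access and deny modes.
ACCESS_READ, ACCESS_WRITE, ACCESS_BOTH = 1, 2, 3
DENY_NONE, DENY_READ, DENY_WRITE, DENY_BOTH = 0, 1, 2, 3


def check_share_reservation(request_access, request_deny,
                            file_access, file_deny):
    """Apply the share reservation test to a new OPEN."""
    if request_access == 0:
        # An OPEN must ask for at least one kind of access.
        return NFS4ERR_INVAL
    if (request_access & file_deny) or (request_deny & file_access):
        # Either the requested access is denied by an existing open, or
        # the requested deny mode conflicts with access already granted.
        return NFS4ERR_DENIED
    return NFS4_OK
```

For example, with a file already open for READ with deny WRITE, `check_share_reservation(ACCESS_WRITE, DENY_NONE, ACCESS_READ, DENY_WRITE)` yields NFS4ERR_DENIED, while a second READ open with deny NONE yields NFS4_OK.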
The constants used for the OPEN and OPEN_DOWNGRADE operations for the The constants used for the OPEN and OPEN_DOWNGRADE operations for the
access and deny fields are as follows: access and deny fields are as follows:
skipping to change at page 82, line 41 skipping to change at page 84, line 5
To provide correct share semantics, a client MUST use the OPEN To provide correct share semantics, a client MUST use the OPEN
operation to obtain the initial filehandle and indicate the desired operation to obtain the initial filehandle and indicate the desired
access and what if any access to deny. Even if the client intends to access and what if any access to deny. Even if the client intends to
use a stateid of all 0's or all 1's, it must still obtain the use a stateid of all 0's or all 1's, it must still obtain the
filehandle for the regular file with the OPEN operation so the filehandle for the regular file with the OPEN operation so the
appropriate share semantics can be applied. For clients that do not appropriate share semantics can be applied. For clients that do not
have a deny mode built into their open programming interfaces, deny have a deny mode built into their open programming interfaces, deny
equal to NONE should be used. equal to NONE should be used.
The OPEN operation with the CREATE flag, also subsumes the CREATE The OPEN operation with the CREATE flag, also subsumes the CREATE
operation for regular files as used in previous versions of the NFS operation for regular files as used in previous versions of the NFS
protocol. This allows a create with a share to be done atomically. protocol. This allows a create with a share to be done atomically.
The CLOSE operation removes all share reservations held by the The CLOSE operation removes all share reservations held by the
lock_owner on that file. If record locks are held, the client SHOULD lock_owner on that file. If record locks are held, the client SHOULD
release all locks before issuing a CLOSE. The server MAY free all release all locks before issuing a CLOSE. The server MAY free all
outstanding locks on CLOSE but some servers may not support the CLOSE outstanding locks on CLOSE but some servers may not support the CLOSE
of a file that still has record locks held. The server MUST return of a file that still has record locks held. The server MUST return
failure, NFS4ERR_LOCKS_HELD, if any locks would exist after the failure, NFS4ERR_LOCKS_HELD, if any locks would exist after the
CLOSE. CLOSE.
The LOOKUP operation will return a filehandle without establishing The LOOKUP operation will return a filehandle without establishing
any lock state on the server. Without a valid stateid, the server any lock state on the server. Without a valid stateid, the server
will assume the client has the least access. For example, a file will assume the client has the least access. For example, a file
opened with deny READ/WRITE cannot be accessed using a filehandle opened with deny READ/WRITE cannot be accessed using a filehandle
obtained through LOOKUP because it would not have a valid stateid obtained through LOOKUP because it would not have a valid stateid
(i.e. using a stateid of all bits 0 or all bits 1). (i.e. using a stateid of all bits 0 or all bits 1).
8.10.1. Close and Retention of State Information 8.10.1. Close and Retention of State Information
Since a CLOSE operation requests deallocation of a stateid, dealing Since a CLOSE operation requests deallocation of a stateid, dealing
with retransmission of the CLOSE may pose special difficulties, with retransmission of the CLOSE may pose special difficulties,
since the state information, which normally would be used to since the state information, which normally would be used to
determine the state of the open file being designated, might be determine the state of the open file being designated, might be
deallocated, resulting in an NFS4ERR_BAD_STATEID error. deallocated, resulting in an NFS4ERR_BAD_STATEID error.
skipping to change at page 83, line 40 skipping to change at page 84, line 56
is not a retransmission. is not a retransmission.
o The time that a lockowner is freed by the server due to a period o The time that a lockowner is freed by the server due to a period
with no activity. with no activity.
o All locks for the client are freed as a result of a SETCLIENTID. o All locks for the client are freed as a result of a SETCLIENTID.
Servers may avoid this complexity, at the cost of less complete Servers may avoid this complexity, at the cost of less complete
protocol error checking, by simply responding NFS4_OK in the event of protocol error checking, by simply responding NFS4_OK in the event of
a CLOSE for a deallocated stateid, on the assumption that this case a CLOSE for a deallocated stateid, on the assumption that this case
must be caused by a retranmitted close. When adopting this approach, must be caused by a retransmitted close. When adopting this
it is desirable to at least log an error when returning a no-error
indication in this situation. If the server maintains a reply-cache
mechanism, it can verify the CLOSE is indeed a retransmission and
avoid error logging in most cases. approach, it is desirable to at least log an error when returning a
no-error indication in this situation. If the server maintains a
reply-cache mechanism, it can verify the CLOSE is indeed a
retransmission and avoid error logging in most cases.
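
The simplified CLOSE handling described above can be sketched as follows. This is only an illustration under stated assumptions: the function name, the transaction-id argument, and the set used as a stand-in for a reply cache are hypothetical and are not defined by the protocol.

```python
import logging

NFS4_OK = 0  # success status in NFS version 4

log = logging.getLogger("nfs4.close")


def close_deallocated_stateid(request_xid, reply_cache):
    """Answer a CLOSE whose stateid the server has already deallocated.

    reply_cache is a hypothetical set of transaction ids the server has
    already answered; a real reply cache would hold complete replies.
    """
    if request_xid in reply_cache:
        # The reply cache confirms a retransmission, so no logging is needed.
        return NFS4_OK
    # The server cannot prove this is a retransmission. It still answers
    # NFS4_OK, but logs the event because a real error may be masked.
    log.error("CLOSE for deallocated stateid, xid=%r", request_xid)
    return NFS4_OK
```

Both paths return NFS4_OK; the reply cache only decides whether the event is logged.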
8.11. Open Upgrade and Downgrade 8.11. Open Upgrade and Downgrade
When an OPEN is done for a file and the lockowner for which the open When an OPEN is done for a file and the lockowner for which the open
is being done already has the file open, the result is to upgrade the is being done already has the file open, the result is to upgrade the
open file status maintained on the server to include the access and open file status maintained on the server to include the access and
deny bits specified by the new OPEN as well as those for the existing deny bits specified by the new OPEN as well as those for the existing
OPEN. The result is that there is one open file, as far as the OPEN. The result is that there is one open file, as far as the
protocol is concerned, and it includes the union of the access and protocol is concerned, and it includes the union of the access and
deny bits for all of the OPEN requests completed. Only a single deny bits for all of the OPEN requests completed. Only a single
CLOSE will be done to reset the effects of both OPEN's. Note that CLOSE will be done to reset the effects of both OPENs. Note that the
client, when issuing the OPEN, may not know that the same file is in
fact being opened. The above only applies if both OPENs result in
the OPENed object being designated by the same filehandle.
the client, when issuing the OPEN, may not know that the same file is
in fact being opened. The above only applies if both OPEN's result
in the OPEN'ed object being designated by the same filehandle.
When the server chooses to export multiple filehandles corresponding When the server chooses to export multiple filehandles corresponding
to the same file object and returns different filehandles on two to the same file object and returns different filehandles on two
different OPEN's of the same file object, the server MUST NOT "OR" different OPENs of the same file object, the server MUST NOT "OR"
together the access and deny bits and coalesce the two open files. together the access and deny bits and coalesce the two open files.
Instead the server must maintain separate OPEN's with separate Instead the server must maintain separate OPENs with separate
stateid's and will require separate CLOSE's to free them. stateids and will require separate CLOSEs to free them.
When multiple open files on the client are merged into a single open When multiple open files on the client are merged into a single open
file object on the server, the close of one of the open files (on the file object on the server, the close of one of the open files (on the
client) may necessitate change of the access and deny status of the client) may necessitate change of the access and deny status of the
open file on the server. This is because the union of the access and open file on the server. This is because the union of the access and
deny bits for the remaining open's may be smaller (i.e. a proper deny bits for the remaining opens may be smaller (i.e. a proper
subset) than previously. The OPEN_DOWNGRADE operation is used to subset) than previously. The OPEN_DOWNGRADE operation is used to
make the necessary change and the client should use it to update the make the necessary change and the client should use it to update the
server so that share reservation requests by other clients are server so that share reservation requests by other clients are
handled properly. handled properly.
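
The upgrade and downgrade bookkeeping can be illustrated with a small sketch. The bit values and the helper name are assumptions chosen for illustration; the point is only that the server-side state is the bitwise union of the client's opens, and that closing one open may leave a smaller union that must be sent with OPEN_DOWNGRADE.

```python
# Illustrative bit values for share access and deny modes (assumed here).
ACCESS_READ, ACCESS_WRITE = 0x1, 0x2
DENY_NONE, DENY_READ, DENY_WRITE = 0x0, 0x1, 0x2


def union(opens):
    """Return the (access, deny) union over a list of (access, deny) pairs."""
    access = deny = 0
    for a, d in opens:
        access |= a
        deny |= d
    return access, deny


# Two opens by the same lockowner on the same filehandle.
client_opens = [(ACCESS_READ, DENY_NONE), (ACCESS_WRITE, DENY_WRITE)]
server_state = union(client_opens)   # state after the second OPEN upgrades

# The client closes the second open; one open remains.
remaining = client_opens[:1]
downgraded = union(remaining)

# A change in the union means the client should issue OPEN_DOWNGRADE.
needs_downgrade = downgraded != server_state
```

If the remaining opens still cover the same bits, the union is unchanged and no OPEN_DOWNGRADE is needed.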
8.12. Short and Long Leases 8.12. Short and Long Leases
When determining the time period for the server lease, the usual When determining the time period for the server lease, the usual
lease tradeoffs apply. Short leases are good for fast server lease tradeoffs apply. Short leases are good for fast server
recovery at a cost of increased RENEW or READ (with zero length) recovery at a cost of increased RENEW or READ (with zero length)
requests. Longer leases are certainly kinder and gentler to servers requests. Longer leases are certainly kinder and gentler to servers
trying to handle very large numbers of clients. The number of RENEW trying to handle very large numbers of clients. The number of RENEW
requests drops in inverse proportion to the lease time. The disadvantages of requests drops in inverse proportion to the lease time. The disadvantages of
long leases are slower recovery after server failure (server must long leases are slower recovery after server failure (the server must
wait for leases to expire and grace period before granting new lock wait for the leases to expire and the grace period to elapse before
requests) and increased file contention (if client fails to transmit granting new lock requests) and increased file contention (if client
an unlock request then server must wait for lease expiration before fails to transmit an unlock request then server must wait for lease
granting new locks). expiration before granting new locks).
Long leases are usable if the server is able to store lease state in Long leases are usable if the server is able to store lease state in
non-volatile memory. Upon recovery, the server can reconstruct the non-volatile memory. Upon recovery, the server can reconstruct the
lease state from its non-volatile memory and continue operation with lease state from its non-volatile memory and continue operation with
its clients and therefore long leases would not be an issue. its clients and therefore long leases would not be an issue.
8.13. Clocks, Propagation Delay, and Calculating Lease Expiration 8.13. Clocks, Propagation Delay, and Calculating Lease Expiration
To avoid the need for synchronized clocks, lease times are granted by To avoid the need for synchronized clocks, lease times are granted by
the server as a time delta. However, there is a requirement that the the server as a time delta. However, there is a requirement that the
client and server clocks do not drift excessively over the duration client and server clocks do not drift excessively over the duration
of the lock. There is also the issue of propagation delay across the of the lock. There is also the issue of propagation delay across the
network which could easily be several hundred milliseconds as well as network which could easily be several hundred milliseconds as well as
the possibility that requests will be lost and need to be the possibility that requests will be lost and need to be
retransmitted. retransmitted.
To take propagation delay into account, the client should subtract it To take propagation delay into account, the client should subtract it
from lease times (e.g. if the client estimates the one-way from lease times (e.g. if the client estimates the one-way
propagation delay as 200 msec, then it can assume that the lease is propagation delay as 200 msec, then it can assume that the lease is
already 200 msec old when it gets it). In addition, it will take already 200 msec old when it gets it). In addition, it will take
another 200 msec to get a response back to the server. So the client another 200 msec to get a response back to the server. So the client
must send a lock renewal or write data back to the server 400 msec must send a lock renewal or write data back to the server 400 msec
before the lease would expire. before the lease would expire.
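
The arithmetic above can be written out as a short sketch. The function name and the 90 second lease are hypothetical; times are kept in integer milliseconds on the client's own clock, so no clock synchronization with the server is assumed.

```python
def renewal_deadline_ms(received_at_ms, lease_time_ms, one_way_delay_ms):
    """Latest client-local time at which a renewal should be sent.

    The lease is assumed to be one_way_delay_ms old when it arrives, and
    the renewal needs another one_way_delay_ms to reach the server.
    """
    return received_at_ms + lease_time_ms - 2 * one_way_delay_ms


# A 90 second lease received at local time 0 with a 200 msec one-way delay.
deadline = renewal_deadline_ms(0, 90_000, 200)
```

With these numbers the client must send its renewal by 89,600 msec, which is 400 msec before the nominal expiration.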
The server's lease period configuration should take into account the The server's lease period configuration should take into account the
network distance of the clients that will be accessing the server's network distance of the clients that will be accessing the server's
resources. It is expected that the lease period will take into resources. It is expected that the lease period will take into
account the network propogation delays and other network delay account the network propagation delays and other network delay
factors for the client population. Since the protocol does not allow factors for the client population. Since the protocol does not allow
for an automatic method to determine an appropriate lease period, the for an automatic method to determine an appropriate lease period, the
server's administrator may have to tune the lease period. server's administrator may have to tune the lease period.
8.14. Migration, Replication and State 8.14. Migration, Replication and State
When responsibility for handling a given file system is transferred When responsibility for handling a given file system is transferred
to a new server (migration) or the client chooses to use an alternate to a new server (migration) or the client chooses to use an alternate
server (e.g. in response to server unresponsiveness) in the context server (e.g. in response to server unresponsiveness) in the context
of file system replication, the appropriate handling of state shared of file system replication, the appropriate handling of state shared
between the client and server (i.e. locks, leases, stateid's, and between the client and server (i.e. locks, leases, stateids, and
clientid's) is as described below. The handling differs between clientids) is as described below. The handling differs between
migration and replication. For related discussion of file server migration and replication. For related discussion of file server
state and recovery of such, see the sections under "File Locking and state and recovery of such, see the sections under "File Locking and
Share Reservations". Share Reservations".
If a server replica or a server immigrating a filesystem agrees to, or If a server replica or a server immigrating a filesystem agrees to, or
is expected to, accept opaque values from the client that originated is expected to, accept opaque values from the client that originated
from another server, then it is a wise implementation practice for from another server, then it is a wise implementation practice for
the servers to encode the "opaque" values in network byte order. This the servers to encode the "opaque" values in network byte order. This
way, servers acting as replicas or immigrating filesystems will be way, servers acting as replicas or immigrating filesystems will be
able to parse values like stateids, directory cookies, filehandles, able to parse values like stateids, directory cookies, filehandles,
etc. even if their native byte order is different from other servers etc. even if their native byte order is different from other servers
cooperating in the replication and migration of the filesystem. cooperating in the replication and migration of the filesystem.
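
As a minimal sketch of this practice, the following encodes a server-private value in network byte order. The cookie layout (a 32-bit generation number followed by a 64-bit offset) is purely hypothetical; real servers choose their own contents for opaque values.

```python
import struct

# Hypothetical layout: 32-bit generation number, then 64-bit offset.
# The "!" prefix selects network (big-endian) byte order with no padding.
COOKIE = struct.Struct("!IQ")


def encode_cookie(generation, offset):
    """Pack the fields into 12 bytes that any cooperating server can parse."""
    return COOKIE.pack(generation, offset)


def decode_cookie(data):
    """Unpack the fields regardless of the decoding host's native byte order."""
    return COOKIE.unpack(data)
```

Because the byte order is fixed by the format string, a replica with a different native byte order decodes the same field values.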
8.14.1. Migration and State 8.14.1. Migration and State
In the case of migration, the servers involved in the migration of a In the case of migration, the servers involved in the migration of a
filesystem SHOULD transfer all server state from the original to the filesystem SHOULD transfer all server state from the original to the
new server. This must be done in a way that is transparent to the new server. This must be done in a way that is transparent to the
client. This state transfer will ease the client's transition when a client. This state transfer will ease the client's transition when a
filesystem migration occurs. If the servers are successful in filesystem migration occurs. If the servers are successful in
transferring all state, the client will continue to use stateid's transferring all state, the client will continue to use stateids
assigned by the original server. Therefore the new server must assigned by the original server. Therefore the new server must
recognize these stateid's as valid. This holds true for the clientid recognize these stateids as valid. This holds true for the clientid
as well. Since responsibility for an entire filesystem is as well. Since responsibility for an entire filesystem is
transferred with a migration event, there is no possibility that transferred with a migration event, there is no possibility that
conflicts will arise on the new server as a result of the transfer of conflicts will arise on the new server as a result of the transfer of
locks. locks.
As part of the transfer of information between servers, leases would As part of the transfer of information between servers, leases would
be transferred as well. The leases being transferred to the new be transferred as well. The leases being transferred to the new
server will typically have a different expiration time from those for server will typically have a different expiration time from those for
the same client, previously on the old server. To maintain the the same client, previously on the old server. To maintain the
property that all leases on a given server for a given client expire property that all leases on a given server for a given client expire
at the same time, the server should advance the expiration time to at the same time, the server should advance the expiration time to
the later of the leases being transferred or the leases already the later of the leases being transferred or the leases already
present. This allows the client to maintain lease renewal of both present. This allows the client to maintain lease renewal of both
skipping to change at page 86, line 33 skipping to change at page 87, line 48
NFS4ERR_STALE_STATEID from the new server. The client should then NFS4ERR_STALE_STATEID from the new server. The client should then
recover its state information as it normally would in response to a recover its state information as it normally would in response to a
server failure. The new server must take care to allow for the server failure. The new server must take care to allow for the
recovery of state information as it would in the event of server recovery of state information as it would in the event of server
restart. restart.
8.14.2. Replication and State 8.14.2. Replication and State
Since client switch-over in the case of replication is not under Since client switch-over in the case of replication is not under
server control, the handling of state is different. In this case, server control, the handling of state is different. In this case,
leases, stateid's and clientid's do not have validity across a leases, stateids and clientids do not have validity across a
transition from one server to another. The client must re-establish transition from one server to another. The client must re-establish
its locks on the new server. This can be compared to the re- its locks on the new server. This can be compared to the re-
establishment of locks by means of reclaim-type requests after a establishment of locks by means of reclaim-type requests after a
server reboot. The difference is that the server has no provision to server reboot. The difference is that the server has no provision to
distinguish requests reclaiming locks from those obtaining new locks distinguish requests reclaiming locks from those obtaining new locks
or to defer the latter. Thus, a client re-establishing a lock on the or to defer the latter. Thus, a client re-establishing a lock on the
new server (by means of a LOCK or OPEN request), may have the new server (by means of a LOCK or OPEN request), may have the
requests denied due to a conflicting lock. Since replication is requests denied due to a conflicting lock. Since replication is
intended for read-only use of filesystems, such denial of locks intended for read-only use of filesystems, such denial of locks
should not pose large difficulties in practice. When an attempt to should not pose large difficulties in practice. When an attempt to
re-establish a lock on a new server is denied, the client should re-establish a lock on a new server is denied, the client should
treat the situation as if its original lock had been revoked. treat the situation as if its original lock had been revoked.
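
The client-side behavior on switching replicas might be sketched as below. The try_lock callable is a hypothetical stand-in for sending a LOCK or OPEN request to the new server and reporting whether it was granted.

```python
def reestablish_locks(held_locks, try_lock):
    """Re-request each lock on the new server.

    try_lock is a hypothetical callable returning True when the new server
    grants the request and False when it is denied due to a conflict.
    Returns (kept, revoked); denied locks are treated as revoked.
    """
    kept, revoked = [], []
    for lock in held_locks:
        if try_lock(lock):
            kept.append(lock)
        else:
            revoked.append(lock)
    return kept, revoked
```

The caller would then report the revoked entries to the applications holding them, as it would for any revoked lock.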
8.14.3. Notification of Migrated Lease 8.14.3. Notification of Migrated Lease
In the case of lease renewal, the client may not be submitting In the case of lease renewal, the client may not be submitting
requests for a filesystem that has been migrated to another server. requests for a filesystem that has been migrated to another server.
This can occur because of the implicit lease renewal mechanism. The This can occur because of the implicit lease renewal mechanism. The
client renews leases for all filesystems when submitting a request to client renews leases for all filesystems when submitting a request to
any one filesystem at the server. any one filesystem at the server.
In order for the client to schedule renewal of leases that may have In order for the client to schedule renewal of leases that may have
been relocated to the new server, the client must find out about been relocated to the new server, the client must find out about
lease relocation before those leases expire. To accomplish this, all lease relocation before those leases expire. To accomplish this, all
operations which implicitly renew leases for a client (i.e. OPEN, operations which implicitly renew leases for a client (i.e. OPEN,
CLOSE, READ, WRITE, RENEW, LOCK, LOCKT, LOCKU), will return the error CLOSE, READ, WRITE, RENEW, LOCK, LOCKT, LOCKU), will return the error
NFS4ERR_LEASE_MOVED if responsibility for any of the leases to be NFS4ERR_LEASE_MOVED if responsibility for any of the leases to be
renewed has been transferred to a new server. This condition will renewed has been transferred to a new server. This condition will
continue until the client receives an NFS4ERR_MOVED error and the continue until the client receives an NFS4ERR_MOVED error and the
server receives the subsequent GETATTR(fs_locations) for an access to server receives the subsequent GETATTR(fs_locations) for an access to
each filesystem for which a lease has been moved to a new server. each filesystem for which a lease has been moved to a new server.
When a client receives an NFS4ERR_LEASE_MOVED error, it should When a client receives an NFS4ERR_LEASE_MOVED error, it should
perform some operation, such as a RENEW, on each filesystem perform an operation on each filesystem associated with the server in
associated with the server in question. When the client receives an question. When the client receives an NFS4ERR_MOVED error, the
NFS4ERR_MOVED error, the client can follow the normal process to client can follow the normal process to obtain the new server
obtain the new server information (through the fs_locations information (through the fs_locations attribute) and perform renewal
attribute) and perform renewal of those leases on the new server. If of those leases on the new server. If the server has not had state
the server has not had state transferred to it transparently, the transferred to it transparently, the client will receive either
client will receive either NFS4ERR_STALE_CLIENTID or NFS4ERR_STALE_CLIENTID or NFS4ERR_STALE_STATEID from the new server,
NFS4ERR_STALE_STATEID from the new server, as described above, and as described above, and the client can then recover state information
the client can then recover state information as it does in the event as it does in the event of server failure.
of server failure.
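
A client's reaction to NFS4ERR_LEASE_MOVED could be sketched as follows. The probe callable and the use of status names as strings are assumptions made for illustration; a real client would issue an actual operation on each filesystem and inspect the returned status code.

```python
def find_moved_filesystems(filesystems, probe):
    """Probe every filesystem on the server after NFS4ERR_LEASE_MOVED.

    probe is a hypothetical callable that performs some operation on one
    filesystem and returns the status name. Filesystems that answer
    NFS4ERR_MOVED need an fs_locations lookup, followed by lease renewal
    on the new server.
    """
    return [fs for fs in filesystems if probe(fs) == "NFS4ERR_MOVED"]
```

For each filesystem returned, the client would fetch fs_locations and renew on the new server, falling back to normal state recovery if it receives NFS4ERR_STALE_CLIENTID or NFS4ERR_STALE_STATEID there.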
8.14.4. Migration and the Lease_time Attribute 8.14.4. Migration and the Lease_time Attribute
In order that the client may appropriately manage its leases in the In order that the client may appropriately manage its leases in the
case of migration, the destination server must establish proper case of migration, the destination server must establish proper
values for the lease_time attribute. values for the lease_time attribute.
When state is transferred transparently, that state should include When state is transferred transparently, that state should include
the correct value of the lease_time attribute. The lease_time the correct value of the lease_time attribute. The lease_time
attribute on the destination server must never be less than that on attribute on the destination server must never be less than that on
the source since this would result in premature expiration of leases the source since this would result in premature expiration of leases
granted by the source server. Upon migration in which state is granted by the source server. Upon migration in which state is
transferred transparently, the client is under no obligation to re- transferred transparently, the client is under no obligation to re-
fetch the lease_time attribute and may continue to use the value fetch the lease_time attribute and may continue to use the value
previously fetched (on the source server). previously fetched (on the source server).
If state has not been transferred transparently (i.e. the client sees If state has not been transferred transparently (i.e. the client sees
a real or simulated server reboot), the client should fetch the value a real or simulated server reboot), the client should fetch the value
of lease_time on the new (i.e. destination) server, and use it for of lease_time on the new (i.e. destination) server, and use it for
subsequent locking requests. However the server must respect a grace subsequent locking requests. However the server must respect a grace
period at least as long as the lease_time on the source server, in period at least as long as the lease_time on the source server, in
order to ensure that clients have ample time to reclaim their locks order to ensure that clients have ample time to reclaim their locks
before potentially conflicting non-reclaimed locks are granted. The before potentially conflicting non-reclaimed locks are granted. The
means by which the new server obtains the value of lease_time on the means by which the new server obtains the value of lease_time on the
old server is left to the server implementations. It is not old server is left to the server implementations. It is not
specified by the NFS version 4 protocol. specified by the NFS version 4 protocol.
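
The two constraints on the destination server can be expressed as a short sketch. The function names and the idea of a locally configured value are hypothetical; how the destination learns the source server's lease_time is, as stated above, outside the protocol.

```python
def destination_lease_time(source_lease, configured_lease):
    """Transparent transfer: never go below the source server's lease_time."""
    return max(source_lease, configured_lease)


def destination_grace_period(source_lease, configured_grace):
    """No transparent transfer: the grace period must be at least as long
    as the source server's lease_time."""
    return max(source_lease, configured_grace)
```

For example, a destination configured with a 60 second lease that takes over from a source using 90 seconds would advertise 90 seconds.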
9. Client-Side Caching 9. Client-Side Caching
Client-side caching of data, of file attributes, and of file names is Client-side caching of data, of file attributes, and of file names is
essential to providing good performance with the NFS protocol. essential to providing good performance with the NFS protocol.
Providing distributed cache coherence is a difficult problem and Providing distributed cache coherence is a difficult problem and
previous versions of the NFS protocol have not attempted it. previous versions of the NFS protocol have not attempted it.
Instead, several NFS client implementation techniques have been used Instead, several NFS client implementation techniques have been used
to reduce the problems that a lack of coherence poses for users. to reduce the problems that a lack of coherence poses for users.
These techniques have not been clearly defined by earlier protocol These techniques have not been clearly defined by earlier protocol
skipping to change at page 89, line 5 skipping to change at page 91, line 5
performance is to allow a client that repeatedly opens a file to do performance is to allow a client that repeatedly opens a file to do
so without reference to the server. This is done until potentially so without reference to the server. This is done until potentially
conflicting operations from another client actually occur. conflicting operations from another client actually occur.
A similar situation arises in connection with file locking. Sending A similar situation arises in connection with file locking. Sending
file lock and unlock requests to the server as well as the read and file lock and unlock requests to the server as well as the read and
write requests necessary to make data caching consistent with the write requests necessary to make data caching consistent with the
locking semantics (see the section "Data Caching and File Locking") locking semantics (see the section "Data Caching and File Locking")
can severely limit performance. When locking is used to provide can severely limit performance. When locking is used to provide
protection against infrequent conflicts, a large penalty is incurred. protection against infrequent conflicts, a large penalty is incurred.
This penalty may discourage the use of file locking by applications. This penalty may discourage the use of file locking by applications.
The NFS version 4 protocol provides more aggressive caching The NFS version 4 protocol provides more aggressive caching
strategies with the following design goals: strategies with the following design goals:
o Compatibility with a large range of server semantics. o Compatibility with a large range of server semantics.
o Provide the same caching benefits as previous versions of the o Provide the same caching benefits as previous versions of the
skipping to change at page 90, line 5 skipping to change at page 92, line 5
on them. Preliminary testing of callback functionality by means of a on them. Preliminary testing of callback functionality by means of a
CB_NULL procedure determines whether callbacks can be supported. The CB_NULL procedure determines whether callbacks can be supported. The
CB_NULL procedure checks the continuity of the callback path. A CB_NULL procedure checks the continuity of the callback path. A
server makes a preliminary assessment of callback availability to a server makes a preliminary assessment of callback availability to a
given client and avoids delegating responsibilities until it has given client and avoids delegating responsibilities until it has
determined that callbacks are supported. Because the granting of a determined that callbacks are supported. Because the granting of a
delegation is always conditional upon the absence of conflicting delegation is always conditional upon the absence of conflicting
access, clients must not assume that a delegation will be granted and access, clients must not assume that a delegation will be granted and
they must always be prepared for OPENs to be processed without any they must always be prepared for OPENs to be processed without any
delegations being granted. delegations being granted.
Once granted, a delegation behaves in most ways like a lock. There Once granted, a delegation behaves in most ways like a lock. There
is an associated lease that is subject to renewal together with all is an associated lease that is subject to renewal together with all
of the other leases held by that client. of the other leases held by that client.
Unlike locks, an operation by a second client to a delegated file Unlike locks, an operation by a second client to a delegated file
will cause the server to recall a delegation through a callback. will cause the server to recall a delegation through a callback.
skipping to change at page 91, line 5 skipping to change at page 93, line 5
There are three situations that delegation recovery must deal with: There are three situations that delegation recovery must deal with:
o Client reboot or restart o Client reboot or restart
o Server reboot or restart o Server reboot or restart
o Network partition (full or callback-only) o Network partition (full or callback-only)
In the event the client reboots or restarts, the failure to renew In the event the client reboots or restarts, the failure to renew
leases will result in the revocation of record locks and share leases will result in the revocation of record locks and share
reservations. Delegations, however, may be treated a bit reservations. Delegations, however, may be treated a bit
differently. differently.
There will be situations in which delegations will need to be There will be situations in which delegations will need to be
reestablished after a client reboots or restarts. The reason for reestablished after a client reboots or restarts. The reason for
this is the client may have file data stored locally and this data this is the client may have file data stored locally and this data
was associated with the previously held delegations. The client will was associated with the previously held delegations. The client will
need to reestablish the appropriate file state on the server. need to reestablish the appropriate file state on the server.
skipping to change at page 92, line 5 skipping to change at page 94, line 5
process of handling delegation reclaim reconciles three principles of process of handling delegation reclaim reconciles three principles of
the NFS version 4 protocol: the NFS version 4 protocol:
o Upon reclaim, a client reporting resources assigned to it by an o Upon reclaim, a client reporting resources assigned to it by an
earlier server instance must be granted those resources. earlier server instance must be granted those resources.
o The server has unquestionable authority to determine whether o The server has unquestionable authority to determine whether
delegations are to be granted and, once granted, whether they delegations are to be granted and, once granted, whether they
are to be continued. are to be continued.
o The use of callbacks is not to be depended upon until the client o The use of callbacks is not to be depended upon until the client
has proven its ability to receive them. has proven its ability to receive them.
When a network partition occurs, delegations are subject to freeing When a network partition occurs, delegations are subject to freeing
by the server when the lease renewal period expires. This is similar by the server when the lease renewal period expires. This is similar
to the behavior for locks and share reservations. For delegations, to the behavior for locks and share reservations. For delegations,
however, the server may extend the period in which conflicting however, the server may extend the period in which conflicting
requests are held off. Eventually the occurrence of a conflicting requests are held off. Eventually the occurrence of a conflicting
request from another client will cause revocation of the delegation. request from another client will cause revocation of the delegation.
skipping to change at page 93, line 5 skipping to change at page 95, line 5
invalidate the assumptions that those using these facilities depend invalidate the assumptions that those using these facilities depend
upon. upon.
9.3.1. Data Caching and OPENs 9.3.1. Data Caching and OPENs
In order to avoid invalidating the sharing assumptions that In order to avoid invalidating the sharing assumptions that
applications rely on, NFS version 4 clients should not provide cached applications rely on, NFS version 4 clients should not provide cached
data to applications or modify it on behalf of an application when it data to applications or modify it on behalf of an application when it
would not be valid to obtain or modify that same data via a READ or would not be valid to obtain or modify that same data via a READ or
WRITE operation. WRITE operation.
Furthermore, in the absence of open delegation (see the section "Open Furthermore, in the absence of open delegation (see the section "Open
Delegation") two additional rules apply. Note that these rules are Delegation") two additional rules apply. Note that these rules are
obeyed in practice by many NFS version 2 and version 3 clients. obeyed in practice by many NFS version 2 and version 3 clients.
o First, cached data present on a client must be revalidated after o First, cached data present on a client must be revalidated after
doing an OPEN. Revalidating means that the client fetches the doing an OPEN. Revalidating means that the client fetches the
change attribute from the server, compares it with the cached change attribute from the server, compares it with the cached
skipping to change at page 94, line 5 skipping to change at page 96, line 5
written to the file. Hence, this requirement. written to the file. Hence, this requirement.
9.3.2. Data Caching and File Locking 9.3.2. Data Caching and File Locking
For those applications that choose to use file locking instead of For those applications that choose to use file locking instead of
share reservations to exclude inconsistent file access, there is an share reservations to exclude inconsistent file access, there is an
analogous set of constraints that apply to client side data caching. analogous set of constraints that apply to client side data caching.
These rules are effective only if the file locking is used in a way These rules are effective only if the file locking is used in a way
that matches in an equivalent way the actual READ and WRITE that matches in an equivalent way the actual READ and WRITE
operations executed. This is as opposed to file locking that is operations executed. This is as opposed to file locking that is
based on pure convention. For example, it is possible to manipulate based on pure convention. For example, it is possible to manipulate
a two-megabyte file by dividing the file into two one-megabyte a two-megabyte file by dividing the file into two one-megabyte
regions and protecting access to the two regions by file locks on regions and protecting access to the two regions by file locks on
bytes zero and one. A lock for write on byte zero of the file would bytes zero and one. A lock for write on byte zero of the file would
represent the right to do READ and WRITE operations on the first represent the right to do READ and WRITE operations on the first
region. A lock for write on byte one of the file would represent the region. A lock for write on byte one of the file would represent the
right to do READ and WRITE operations on the second region. As long right to do READ and WRITE operations on the second region. As long
as all applications manipulating the file obey this convention, they as all applications manipulating the file obey this convention, they
skipping to change at page 94, line 50 skipping to change at page 96, line 50
unlocked may cause invalid modification to the region outside the unlocked may cause invalid modification to the region outside the
unlocked area. This, in turn, may be part of a region locked by unlocked area. This, in turn, may be part of a region locked by
another client. Clients can avoid this situation by synchronously another client. Clients can avoid this situation by synchronously
performing portions of write operations that overlap that portion performing portions of write operations that overlap that portion
(initial or final) that is not a full block. Similarly, invalidating (initial or final) that is not a full block. Similarly, invalidating
a locked area which is not an integral number of full buffer blocks a locked area which is not an integral number of full buffer blocks
would require the client to read one or two partial blocks from the would require the client to read one or two partial blocks from the
server if the revalidation procedure shows that the data which the server if the revalidation procedure shows that the data which the
client possesses may not be valid. client possesses may not be valid.
The data that is written to the server as a pre-requisite to the The data that is written to the server as a prerequisite to the
unlocking of a region must be written, at the server, to stable unlocking of a region must be written, at the server, to stable
storage. The client may accomplish this either with synchronous storage. The client may accomplish this either with synchronous
writes or by following asynchronous writes with a COMMIT operation. writes or by following asynchronous writes with a COMMIT operation.
This is required because retransmission of the modified data after a This is required because retransmission of the modified data after a
server reboot might conflict with a lock held by another client. server reboot might conflict with a lock held by another client.
A client implementation may choose to accommodate applications which A client implementation may choose to accommodate applications which
use record locking in non-standard ways (e.g. using a record lock as use record locking in non-standard ways (e.g. using a record lock as
a global semaphore) by flushing to the server more data upon a LOCKU a global semaphore) by flushing to the server more data upon a LOCKU
than is covered by the locked range. This may include modified data than is covered by the locked range. This may include modified data
within files other than the one for which the unlocks are being done. within files other than the one for which the unlocks are being done.
In such cases, the client must not interfere with applications whose In such cases, the client must not interfere with applications whose
READs and WRITEs are being done only within the bounds of record READs and WRITEs are being done only within the bounds of record
locks which the application holds. For example, an application locks locks which the application holds. For example, an application locks
a single byte of a file and proceeds to write that single byte. A a single byte of a file and proceeds to write that single byte. A
client that chose to handle a LOCKU by flushing all modified data to client that chose to handle a LOCKU by flushing all modified data to
the server could validly write that single byte in response to an the server could validly write that single byte in response to an
skipping to change at page 95, line 45 skipping to change at page 97, line 45
satisfy the request using the client's validated cache. If an satisfy the request using the client's validated cache. If an
appropriate file lock is not held for the range of the read or write, appropriate file lock is not held for the range of the read or write,
the read or write request must not be satisfied by the client's cache the read or write request must not be satisfied by the client's cache
and the request must be sent to the server for processing. When a and the request must be sent to the server for processing. When a
read or write request partially overlaps a locked region, the request read or write request partially overlaps a locked region, the request
should be subdivided into multiple pieces with each region (locked or should be subdivided into multiple pieces with each region (locked or
not) treated appropriately. not) treated appropriately.
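The subdivision described above can be sketched as follows. This is a minimal Python illustration, not part of the protocol: the function name split_by_locks and the (offset, length) tuple representation of held locks are assumptions made for the example, and the sketch ignores the distinction between read and write locks.

```python
def split_by_locks(offset, length, held_locks):
    """Split the request [offset, offset + length) into pieces.

    held_locks is a list of (lock_offset, lock_length) ranges held by the
    requester (hypothetical representation). Returns a list of
    (start, end, locked) tuples covering the request in order.
    """
    end = offset + length
    # Every lock edge inside the request is a point where coverage may change.
    cuts = {offset, end}
    for lo, ln in held_locks:
        for edge in (lo, lo + ln):
            if offset < edge < end:
                cuts.add(edge)
    points = sorted(cuts)

    pieces = []
    for start, stop in zip(points, points[1:]):
        # Between two adjacent cuts a piece is wholly inside or outside each lock.
        locked = any(lo <= start and stop <= lo + ln for lo, ln in held_locks)
        if pieces and pieces[-1][2] == locked:
            # Merge with the previous piece when coverage is unchanged.
            pieces[-1] = (pieces[-1][0], stop, locked)
        else:
            pieces.append((start, stop, locked))
    return pieces
```

Pieces marked as locked are candidates for the client's validated cache; the remaining pieces would be sent to the server for processing.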
9.3.4. Data Caching and File Identity 9.3.4. Data Caching and File Identity
When clients cache data, the file data needs to organized according When clients cache data, the file data needs to be organized
to the filesystem object to which the data belongs. For NFS version according to the filesystem object to which the data belongs. For
3 clients, the typical practice has been to assume for the purpose of NFS version 3 clients, the typical practice has been to assume for
caching that distinct filehandles represent distinct filesystem the purpose of caching that distinct filehandles represent distinct
objects. The client then has the choice to organize and maintain the filesystem objects. The client then has the choice to organize and
data cache on this basis. maintain the data cache on this basis.
In the NFS version 4 protocol, there is now the possibility to have In the NFS version 4 protocol, there is now the possibility to have
significant deviations from a "one filehandle per object" model significant deviations from a "one filehandle per object" model
because a filehandle may be constructed on the basis of the object's because a filehandle may be constructed on the basis of the object's
pathname. Therefore, clients need a reliable method to determine if pathname. Therefore, clients need a reliable method to determine if
two filehandles designate the same filesystem object. If clients two filehandles designate the same filesystem object. If clients
were simply to assume that all distinct filehandles denote distinct were simply to assume that all distinct filehandles denote distinct
objects and proceed to do data caching on this basis, caching objects and proceed to do data caching on this basis, caching
inconsistencies would arise between the distinct client side objects inconsistencies would arise between the distinct client side objects
which mapped to the same server side object. which mapped to the same server side object.
By providing a method to differentiate filehandles, the NFS version 4 By providing a method to differentiate filehandles, the NFS version 4
protocol alleviates a potential functional regression in comparison protocol alleviates a potential functional regression in comparison
with the NFS version 3 protocol. Without this method, caching with the NFS version 3 protocol. Without this method, caching
inconsistencies within the same client could occur and this has not inconsistencies within the same client could occur and this has not
been present in previous versions of the NFS protocol. Note that it been present in previous versions of the NFS protocol. Note that it
is possible to have such inconsistencies with applications executing is possible to have such inconsistencies with applications executing
on multiple clients but that is not the issue being addressed here. on multiple clients but that is not the issue being addressed here.
For the purposes of data caching, the following steps allow an NFS For the purposes of data caching, the following steps allow an NFS
version 4 client to determine whether two distinct filehandles denote version 4 client to determine whether two distinct filehandles denote
the same server side object: the same server side object:
o If GETATTR directed to two filehandles have different values of o If GETATTR directed to two filehandles returns different values
the fsid attribute, then the filehandles represent distinct of the fsid attribute, then the filehandles represent distinct
objects. objects.
o If GETATTR for any file with an fsid that matches the fsid of o If GETATTR for any file with an fsid that matches the fsid of
the two filehandles in question returns a unique_handles the two filehandles in question returns a unique_handles
attribute with a value of TRUE, then the two objects are attribute with a value of TRUE, then the two objects are
distinct. distinct.
o If GETATTR directed to the two filehandles does not return the o If GETATTR directed to the two filehandles does not return the
fileid attribute for one or both of the handles, then it cannot fileid attribute for both of the handles, then it cannot be
be determined whether the two objects are the same. Therefore, determined whether the two objects are the same. Therefore,
operations which depend on that knowledge (e.g. client side data operations which depend on that knowledge (e.g. client side data
caching) cannot be done reliably. caching) cannot be done reliably.
o If GETATTR directed to the two filehandles returns different o If GETATTR directed to the two filehandles returns different
values for the fileid attribute, then they are distinct objects. values for the fileid attribute, then they are distinct objects.
o Otherwise they are the same object. o Otherwise they are the same object.
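The steps above can be sketched as a small decision function. This is an illustrative Python sketch under stated assumptions: the dictionaries standing in for GETATTR results, the Identity enumeration, and the function name compare_filehandles are all hypothetical, and a missing fileid for either handle is treated as the undeterminable case.

```python
from enum import Enum


class Identity(Enum):
    DISTINCT = "distinct"
    SAME = "same"
    UNKNOWN = "unknown"


def compare_filehandles(attrs_a, attrs_b, unique_handles):
    """Decide whether two distinct filehandles denote the same object.

    attrs_a and attrs_b are dicts of attributes returned by GETATTR for
    each filehandle (hypothetical representation). unique_handles is the
    unique_handles value seen for the filesystem, or None if unknown.
    """
    # Step 1: different fsid values mean different objects.
    if attrs_a["fsid"] != attrs_b["fsid"]:
        return Identity.DISTINCT
    # Step 2: unique_handles TRUE means distinct handles are distinct objects.
    if unique_handles is True:
        return Identity.DISTINCT
    # Step 3: without fileid for both handles, nothing can be concluded.
    if "fileid" not in attrs_a or "fileid" not in attrs_b:
        return Identity.UNKNOWN
    # Step 4: different fileid values mean different objects.
    if attrs_a["fileid"] != attrs_b["fileid"]:
        return Identity.DISTINCT
    # Step 5: otherwise they are the same object.
    return Identity.SAME
```

A client receiving UNKNOWN would refrain from operations, such as client side data caching, that depend on knowing whether the objects are the same.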
9.4. Open Delegation 9.4. Open Delegation
skipping to change at page 97, line 5 skipping to change at page 99, line 5
delegation is recallable, since the circumstances that allowed for delegation is recallable, since the circumstances that allowed for
the delegation are subject to change. In particular, when the server the delegation are subject to change. In particular, when the server
receives a conflicting OPEN from another client, the server must receives a conflicting OPEN from another client, the server must
recall the delegation before deciding whether the OPEN from the other recall the delegation before deciding whether the OPEN from the other
client may be granted. Making a delegation is up to the server and client may be granted. Making a delegation is up to the server and
clients should not assume that any particular OPEN either will or clients should not assume that any particular OPEN either will or
will not result in an open delegation. The following is a typical will not result in an open delegation. The following is a typical
set of conditions that servers might use in deciding whether OPEN set of conditions that servers might use in deciding whether OPEN
should be delegated: should be delegated:
o The client must be able to respond to the server's callback o The client must be able to respond to the server's callback
requests. The server will use the CB_NULL procedure for a test requests. The server will use the CB_NULL procedure for a test
of callback ability. of callback ability.
o The client must have responded properly to previous recalls. o The client must have responded properly to previous recalls.
o There must be no current open conflicting with the requested o There must be no current open conflicting with the requested
delegation. delegation.
skipping to change at page 98, line 5 skipping to change at page 100, line 5
When an open delegation is made, the response to the OPEN contains an When an open delegation is made, the response to the OPEN contains an
open delegation structure which specifies the following: open delegation structure which specifies the following:
o the type of delegation (read or write) o the type of delegation (read or write)
o space limitation information to control flushing of data on o space limitation information to control flushing of data on
close (write open delegation only, see the section "Open close (write open delegation only, see the section "Open
Delegation and Data Caching") Delegation and Data Caching")
o an nfsace4 specifying read and write permissions o an nfsace4 specifying read and write permissions
o a stateid to represent the delegation for READ and WRITE o a stateid to represent the delegation for READ and WRITE
The delegation stateid is separate and distinct from the stateid for The delegation stateid is separate and distinct from the stateid for
the OPEN proper. The standard stateid, unlike the delegation the OPEN proper. The standard stateid, unlike the delegation
stateid, is associated with a particular lock_owner and will continue stateid, is associated with a particular lock_owner and will continue
to be valid after the delegation is recalled and the file remains to be valid after the delegation is recalled and the file remains
open. open.
skipping to change at page 99, line 5 skipping to change at page 101, line 5
The use of delegation together with various other forms of caching The use of delegation together with various other forms of caching
creates the possibility that no server authentication will ever be creates the possibility that no server authentication will ever be
performed for a given user since all of the user's requests might be performed for a given user since all of the user's requests might be
satisfied locally. Where the client is depending on the server for satisfied locally. Where the client is depending on the server for
authentication, the client should be sure authentication occurs for authentication, the client should be sure authentication occurs for
each user by use of the ACCESS operation. This should be the case each user by use of the ACCESS operation. This should be the case
even if an ACCESS operation would not be required otherwise. As even if an ACCESS operation would not be required otherwise. As
mentioned before, the server may enforce frequent authentication by mentioned before, the server may enforce frequent authentication by
returning an nfsace4 denying all access with every open delegation. returning an nfsace4 denying all access with every open delegation.
9.4.1. Open Delegation and Data Caching 9.4.1. Open Delegation and Data Caching
OPEN delegation allows much of the message overhead associated with OPEN delegation allows much of the message overhead associated with
the opening and closing of files to be eliminated. An open when an open the opening and closing of files to be eliminated. An open when an open
delegation is in effect does not require that a validation message be delegation is in effect does not require that a validation message be
sent to the server. The continued endurance of the "read open sent to the server. The continued endurance of the "read open
delegation" provides a guarantee that no OPEN for write and thus no delegation" provides a guarantee that no OPEN for write and thus no
write has occurred. Similarly, when closing a file opened for write write has occurred. Similarly, when closing a file opened for write
and if write open delegation is in effect, the data written does not and if write open delegation is in effect, the data written does not
skipping to change at page 100, line 5 skipping to change at page 102, line 5
The server can recall delegations as a result of managing the The server can recall delegations as a result of managing the
available filesystem space. The client should abide by the server's available filesystem space. The client should abide by the server's
state space limits for delegations. If the client exceeds the stated state space limits for delegations. If the client exceeds the stated
limits for the delegation, the server's behavior is undefined. limits for the delegation, the server's behavior is undefined.
Based on server conditions, quotas or available filesystem space, the Based on server conditions, quotas or available filesystem space, the
server may grant write open delegations with very restrictive space server may grant write open delegations with very restrictive space
limitations. The limitations may be defined in a way that will limitations. The limitations may be defined in a way that will
always force modified data to be flushed to the server on close. always force modified data to be flushed to the server on close.
With respect to authentication, flushing modified data to the server With respect to authentication, flushing modified data to the server
after a CLOSE has occurred may be problematic. For example, the user after a CLOSE has occurred may be problematic. For example, the user
of the application may have logged off the client and unexpired of the application may have logged off the client and unexpired
authentication credentials may not be present. In this case, the authentication credentials may not be present. In this case, the
client may need to take special care to ensure that local unexpired client may need to take special care to ensure that local unexpired
credentials will in fact be available. This may be accomplished by credentials will in fact be available. This may be accomplished by
tracking the expiration time of credentials and flushing data well in tracking the expiration time of credentials and flushing data well in
advance of their expiration or by making private copies of advance of their expiration or by making private copies of
credentials to assure their availability when needed. credentials to assure their availability when needed.
skipping to change at page 100, line 51 skipping to change at page 102, line 51
Since CB_GETATTR is being used to satisfy another client's GETATTR Since CB_GETATTR is being used to satisfy another client's GETATTR
request, the server only needs to know if the client holding the request, the server only needs to know if the client holding the
delegation has a modified version of the file. If the client's copy delegation has a modified version of the file. If the client's copy
of the delegated file is not modified (data or size), the server can of the delegated file is not modified (data or size), the server can
satisfy the second client's GETATTR request from the attributes satisfy the second client's GETATTR request from the attributes
stored locally at the server. If the file is modified, the server stored locally at the server. If the file is modified, the server
only needs to know about this modified state. If the server only needs to know about this modified state. If the server
determines that the file is currently modified, it will respond to determines that the file is currently modified, it will respond to
the second client's GETATTR as if the file had been modified locally the second client's GETATTR as if the file had been modified locally
at the server. This means that the server will take the current time at the server.
and apply it to the construction of attributes like change and
time_modify.
Since the form of the change attribute is determined by the server Since the form of the change attribute is determined by the server
and is opaque to the client, the client and server need to agree on a and is opaque to the client, the client and server need to agree on a
method of communicating the modified state of the file. For the size method of communicating the modified state of the file. For the size
attribute, the client will report its current view of the file size. attribute, the client will report its current view of the file size.
For the change attribute, the handling is more involved. For the change attribute, the handling is more involved.
For the client, the following steps will be taken when receiving a For the client, the following steps will be taken when receiving a
write delegation: write delegation:
o The value of the change attribute will be obtained from the o The value of the change attribute will be obtained from the
server and cached. Let this value be represented by c. server and cached. Let this value be represented by c.
o The client will create a value greater than c that will be used o The client will create a value greater than c that will be used
for communicating that modified data is held at the client. Let this for communicating that modified data is held at the client. Let this
value be represented by d. value be represented by d.
o When the client is queried via CB_GETATTR for the change o When the client is queried via CB_GETATTR for the change
attribute, it checks to see if it holds modified data. If the attribute, it checks to see if it holds modified data. If the
file is modified, the value d is returned for the change file is modified, the value d is returned for the change
attribute value. If this file is not currently modified, the attribute value. If this file is not currently modified, the
client returns the value c for the change attribute. client returns the value c for the change attribute.
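The client steps above can be sketched as follows. This is a minimal Python illustration, not a definitive implementation: the class and method names are hypothetical, and choosing d as c + 1 is just one way to create "a value greater than c". Integer overflow of the change value is not handled here.

```python
class WriteDelegationClient:
    """Tracks the change values a client reports while holding a write
    delegation (hypothetical helper, names are assumptions)."""

    def __init__(self, change_from_server):
        # c: the change attribute obtained from the server and cached.
        self.c = change_from_server
        # d: a value greater than c, used to signal locally modified data.
        self.d = change_from_server + 1
        self.holds_modified_data = False

    def local_write(self):
        # Any modification made under the delegation marks the file dirty.
        self.holds_modified_data = True

    def cb_getattr_change(self):
        # Report d if modified data is held at the client, otherwise c.
        return self.d if self.holds_modified_data else self.c
```

For example, a client that received change value 41 would answer CB_GETATTR with 41 until its first local write, and with 42 afterwards.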
While the change attribute is opaque to client in the sense that it
has no idea what units of time, if any, the server is counting change
with, it is not opaque in that the client has to treat it as an
integer, and the server has to be able to see the results of the
client's changes to that integer. Therefore, the server MUST encode
the change attribute in network order when sending it to the client,
the client MUST decode it from network order to its native order when
receiving it, and the client MUST encode it in network order when
sending it to the server. For this reason, change is defined as an
integer, rather than an opaque array of octets.

For simplicity of implementation, the client MAY for each CB_GETATTR
return the same value d. This is true even if, between successive
CB_GETATTR operations, the client again modifies the file's data
or metadata in its cache. The client can return the same value
because the only requirement is that the client be able to indicate
to the server that the client holds modified data. Therefore, the
value of d may always be c + 1.

While the change attribute is opaque to the client in the sense that
it has no idea what units of time, if any, the server is counting
change with, it is not opaque in that the client has to treat it as
an unsigned integer, and the server has to be able to see the results
of the client's changes to that integer. Therefore, the server MUST
encode the change attribute in network order when sending it to the
client. The client MUST decode it from network order to its native
order when receiving it and the client MUST encode it in network order
when sending it to the server. For this reason, change is defined as
an unsigned integer rather than an opaque array of octets.
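The network-order requirement can be illustrated with a short Python sketch. It assumes the change attribute is carried as a 64-bit unsigned integer; the helper names encode_change and decode_change are hypothetical and exist only for this example.

```python
import struct


def encode_change(value):
    """Encode a change value as 8 octets in network (big-endian) order."""
    return struct.pack(">Q", value)


def decode_change(data):
    """Decode 8 network-order octets into a native unsigned integer."""
    (value,) = struct.unpack(">Q", data)
    return value
```

Because both sides agree on the byte order, a client can decode the server's value c, compute d = c + 1 as an ordinary integer, and encode d so that the server sees the expected result.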
For the server, the following steps will be taken when providing a For the server, the following steps will be taken when providing a
write delegation: write delegation:
o On providing a write delegation, the server will cache a copy of o Upon providing a write delegation, the server will cache a copy
the change attribute. Let this value be represented by sc. of the change attribute in the data structure it uses to record
the delegation. Let this value be represented by sc.
o The server obtains the change attribute from the client. Let o When a second client sends a GETATTR operation on the same file
this value be cc. to the server, the server obtains the change attribute from the
first client. Let this value be cc.
o If the value cc is equal to sc, the file is not modified and the o If the value cc is equal to sc, the file is not modified and the
server returns the current values for change and time_modify server returns the current values for change, time_metadata, and
(for example) to the client requesting GETATTR. time_modify (for example) to the second client.
o If the value cc is NOT equal to sc, the file is currently o If the value cc is NOT equal to sc, the file is currently
modified at the client and most likely will be modified at the modified at the first client and most likely will be modified at
server at a future time. The server then uses the current time the server at a future time. The server then uses its current
to construct attributes values for change and time_modify and time to construct attribute values for time_metadata and
returns those values to the requestor. time_modify. A new value of sc, which we will call nsc, is
computed by the server, such that nsc >= sc + 1. The server
then returns the constructed time_metadata, time_modify, and nsc
values to the requester. The server replaces sc in the
delegation record with nsc. To prevent the possibility of
time_modify, time_metadata, and change from appearing to go
backward (which would happen if the client holding the
delegation fails to write its modified data to the server before
the delegation is revoked or returned), the server SHOULD update
the file's metadata record with the constructed attribute
values. For reasons of reasonable performance, committing the
constructed attribute values to stable storage is OPTIONAL.
As discussed earlier in this section, the client MAY return the
same cc value on subsequent CB_GETATTR calls, even if the file
was modified in the client's cache yet again between successive
CB_GETATTR calls. Therefore, the server must assume that the
file has been modified yet again, and MUST take care to ensure
that the new nsc it constructs and returns is greater than the
previous nsc it returned. An example implementation's
delegation record would satisfy this mandate by including a
boolean field (let us call it "modified") that is set to false
when the delegation is granted, and an sc value set at the time
of grant to the change attribute value. The modified field would
be set to true the first time cc != sc, and would stay true
until the delegation is returned or revoked. The processing for
constructing nsc, time_modify, and time_metadata would use this
pseudo code:
    if (!modified) {
        do CB_GETATTR for change and size;

        if (cc != sc)
            modified = TRUE;
    } else {
        do CB_GETATTR for size;
    }

    if (modified) {
        sc = sc + 1;
        time_modify = time_metadata = current_time;
        update sc, time_modify, time_metadata into file's metadata;
    }

    return to client (that sent GETATTR) the attributes
    it requested, but make sure size comes from what
    CB_GETATTR returned. Do not update the file's metadata
    with the client's modified size.
o In the case that the file attribute size is different than the o In the case that the file attribute size is different than the
server's current value, the server treats this as a modification server's current value, the server treats this as a modification
regardless of the value of the change attribute retrieved via regardless of the value of the change attribute retrieved via
CB_GETATTR and responds to the second client as in the last CB_GETATTR and responds to the second client as in the last
step. step.
This methodology resolves issues of clock differences between client This methodology resolves issues of clock differences between client
and server and other scenarios where the use of CB_GETATTR breaks and server and other scenarios where the use of CB_GETATTR breaks
down. down.
It should be noted that the server is under no obligation to use
CB_GETATTR and therefore the server MAY simply recall the delegation
to avoid its use.
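
The construction of nsc described earlier in this section can be illustrated with a short Python sketch. It is not part of the protocol; the names DelegationRecord and handle_getattr, and the dictionary used for the reply, are assumptions made only for this example. The arguments cc and client_size stand for the change and size values returned by CB_GETATTR, and server_size stands for the size in the server's own metadata.

```python
import time
from dataclasses import dataclass


@dataclass
class DelegationRecord:
    """Hypothetical server-side record kept for one write delegation."""
    sc: int                  # change value the server last handed out
    modified: bool = False   # becomes True once the client reports a change


def handle_getattr(rec, cc, client_size, server_size, now=None):
    """Build the change/size/time values returned to a second client."""
    if not rec.modified:
        # A differing change value or a differing size both count as
        # a modification made under the delegation.
        if cc != rec.sc or client_size != server_size:
            rec.modified = True
    if not rec.modified:
        # Times would come from the file's existing metadata (omitted).
        return {"change": rec.sc, "size": client_size}
    # Once modified, assume the file changed again: every reply gets a
    # strictly larger change value (nsc >= sc + 1).
    rec.sc += 1
    now = time.time() if now is None else now
    return {"change": rec.sc, "size": client_size,
            "time_modify": now, "time_metadata": now}
```

Because the modified flag never reverts to false while the delegation is held, two successive calls that report the same cc still yield increasing change values, which is the property the text requires.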
9.4.4. Recall of Open Delegation 9.4.4. Recall of Open Delegation
The following events necessitate recall of an open delegation: The following events necessitate recall of an open delegation:
o Potentially conflicting OPEN request (or READ/WRITE done with o Potentially conflicting OPEN request (or READ/WRITE done with
"special" stateid) "special" stateid)
o SETATTR issued by another client o SETATTR issued by another client
o REMOVE request for the file o REMOVE request for the file
skipping to change at page 102, line 53 skipping to change at page 106, line 4
same updates must be done whenever a client chooses to return a same updates must be done whenever a client chooses to return a
delegation voluntarily. The following items of state need to be delegation voluntarily. The following items of state need to be
dealt with: dealt with:
o If the file associated with the delegation is no longer open and o If the file associated with the delegation is no longer open and
no previous CLOSE operation has been sent to the server, a CLOSE no previous CLOSE operation has been sent to the server, a CLOSE
operation must be sent to the server. operation must be sent to the server.
o If a file has other open references at the client, then OPEN o If a file has other open references at the client, then OPEN
operations must be sent to the server. The appropriate stateids operations must be sent to the server. The appropriate stateids
will be provided by the server for subsequent use by the client will be provided by the server for subsequent use by the client
since the delegation stateid will no longer be valid. These since the delegation stateid will no longer be valid. These
OPEN requests are done with the claim type of OPEN requests are done with the claim type of
CLAIM_DELEGATE_CUR. This will allow the presentation of the CLAIM_DELEGATE_CUR. This will allow the presentation of the
delegation stateid so that the client can establish the delegation stateid so that the client can establish the
appropriate rights to perform the OPEN. (see the section appropriate rights to perform the OPEN. (see the section
"Operation 18: OPEN" for details.) "Operation 18: OPEN" for details.)
o If there are granted file locks, the corresponding LOCK o If there are granted file locks, the corresponding LOCK
operations need to be performed. This applies to the write open operations need to be performed. This applies to the write open
delegation case only. delegation case only.
o For a write open delegation, if at the time of recall the file o For a write open delegation, if at the time of recall the file
is not open for write, all modified data for the file must be is not open for write, all modified data for the file must be
skipping to change at page 103, line 55 skipping to change at page 107, line 4
constraints) make that desirable. Generally, however, the fact that constraints) make that desirable. Generally, however, the fact that
the actual open state of the file may continue to change makes it not the actual open state of the file may continue to change makes it not
worthwhile to send information about opens and closes to the server, worthwhile to send information about opens and closes to the server,
except as part of delegation return. Only in the case of closing the except as part of delegation return. Only in the case of closing the
open that resulted in obtaining the delegation would clients be open that resulted in obtaining the delegation would clients be
likely to do this early, since, in that case, the close once done likely to do this early, since, in that case, the close once done
will not be undone. Regardless of the client's choices on scheduling will not be undone. Regardless of the client's choices on scheduling
these actions, all must be performed before the delegation is these actions, all must be performed before the delegation is
returned, including (when applicable) the close that corresponds to returned, including (when applicable) the close that corresponds to
the open that resulted in the delegation. These actions can be the open that resulted in the delegation. These actions can be
performed either in previous requests or in previous operations in performed either in previous requests or in previous operations in
the same COMPOUND request. the same COMPOUND request.
9.4.5. Clients that Fail to Honor Delegation Recalls
9.4.5. Delegation Revocation A client may fail to respond to a recall for various reasons, such as
a failure of the callback path from server to the client. The client
may be unaware of a failure in the callback path. This lack of
awareness could result in the client finding out long after the
failure that its delegation has been revoked, and another client has
modified the data for which the client had a delegation. This is
especially a problem for the client that held a write delegation.
The server also has a dilemma in that the client that fails to
respond to the recall might also be sending other NFS requests,
including those that renew the lease before the lease expires.
Without returning an error for those lease renewing operations, the
server leads the client to believe that the delegation it has is in
force.
This difficulty is solved by the following rules:
o When the callback path is down, the server MUST NOT revoke the
delegation if one of the following occurs:
- The client has issued a RENEW operation and the server has
returned an NFS4ERR_CB_PATH_DOWN error. The server MUST renew
the lease for any record locks and share reservations the
client has that the server has known about (as opposed to those
locks and share reservations the client has established but not
yet sent to the server, due to the delegation). The server
SHOULD give the client a reasonable time to return its
delegations to the server before revoking the client's
delegations.
- The client has not issued a RENEW operation for some period of
time after the server attempted to recall the delegation. This
period of time MUST NOT be less than the value of the
lease_time attribute.
o When the client holds a delegation, it cannot rely on operations,
except for RENEW, that take a stateid, to renew delegation leases
across callback path failures. The client that wants to keep
delegations in force across callback path failures must use RENEW
to do so.
9.4.6. Delegation Revocation
At the point a delegation is revoked, if there are associated opens At the point a delegation is revoked, if there are associated opens
on the client, the applications holding these opens need to be on the client, the applications holding these opens need to be
notified. This notification usually occurs by returning errors for notified. This notification usually occurs by returning errors for
READ/WRITE operations or when a close is attempted for the open file. READ/WRITE operations or when a close is attempted for the open file.
If no opens exist for the file at the point the delegation is If no opens exist for the file at the point the delegation is
revoked, then notification of the revocation is unnecessary. revoked, then notification of the revocation is unnecessary.
However, if there is modified data present at the client for the However, if there is modified data present at the client for the
file, the user of the application should be notified. Unfortunately, file, the user of the application should be notified. Unfortunately,
it may not be possible to notify the user since active applications it may not be possible to notify the user since active applications
may not be present at the client. See the section "Revocation may not be present at the client. See the section "Revocation
Recovery for Write Open Delegation" for additional details. Recovery for Write Open Delegation" for additional details.
skipping to change at page 105, line 4 skipping to change at page 108, line 53
violated. Depending on how errors are typically treated for the violated. Depending on how errors are typically treated for the
client operating environment, further levels of notification client operating environment, further levels of notification
including logging, console messages, and GUI pop-ups may be including logging, console messages, and GUI pop-ups may be
appropriate. appropriate.
9.5.1. Revocation Recovery for Write Open Delegation 9.5.1. Revocation Recovery for Write Open Delegation
Revocation recovery for a write open delegation poses the special Revocation recovery for a write open delegation poses the special
issue of modified data in the client cache while the file is not issue of modified data in the client cache while the file is not
open. In this situation, any client which does not flush modified open. In this situation, any client which does not flush modified
data to the server on each close must ensure that the user receives data to the server on each close must ensure that the user receives
appropriate notification of the failure as a result of the appropriate notification of the failure as a result of the
revocation. Since such situations may require human action to revocation. Since such situations may require human action to
correct problems, notification schemes in which the appropriate user correct problems, notification schemes in which the appropriate user
or administrator is notified may be necessary. Logging and console or administrator is notified may be necessary. Logging and console
messages are typical examples. messages are typical examples.
If there is modified data on the client, it must not be flushed If there is modified data on the client, it must not be flushed
normally to the server. A client may attempt to provide a copy of normally to the server. A client may attempt to provide a copy of
the file data as modified during the delegation under a different the file data as modified during the delegation under a different
name in the filesystem name space to ease recovery. Note that when name in the filesystem name space to ease recovery. Note that when
the client can determine that the file has not been modified by any the client can determine that the file has not been modified by any
other client, or when the client has a complete cached copy of the file other client, or when the client has a complete cached copy of the file
in question, such a saved copy of the client's view of the file may in question, such a saved copy of the client's view of the file may
skipping to change at page 106, line 5 skipping to change at page 109, line 54
cached. The exception to this are modifications to attributes that cached. The exception to this are modifications to attributes that
are intimately connected with data caching. Therefore, extending a are intimately connected with data caching. Therefore, extending a
file by writing data to the local data cache is reflected immediately file by writing data to the local data cache is reflected immediately
in the size as seen on the client without this change being in the size as seen on the client without this change being
immediately reflected on the server. Normally such changes are not immediately reflected on the server. Normally such changes are not
propagated directly to the server but when the modified data is propagated directly to the server but when the modified data is
flushed to the server, analogous attribute changes are made on the flushed to the server, analogous attribute changes are made on the
server. When open delegation is in effect, the modified attributes server. When open delegation is in effect, the modified attributes
may be returned to the server in the response to a CB_RECALL call. may be returned to the server in the response to a CB_RECALL call.
The result of local caching of attributes is that the attribute The result of local caching of attributes is that the attribute
caches maintained on individual clients will not be coherent. Changes caches maintained on individual clients will not be coherent. Changes
made in one order on the server may be seen in a different order on made in one order on the server may be seen in a different order on
one client and in a third order on a different client. one client and in a third order on a different client.
The typical filesystem application programming interfaces do not The typical filesystem application programming interfaces do not
provide means to atomically modify or interrogate attributes for provide means to atomically modify or interrogate attributes for
multiple files at the same time. The following rules provide an multiple files at the same time. The following rules provide an
environment where the potential incoherences mentioned above can be environment where the potential incoherences mentioned above can be
reasonably managed. These rules are derived from the practice of reasonably managed. These rules are derived from the practice of
previous NFS protocols. previous NFS protocols.
o All attributes for a given file (per-fsid attributes excepted) o All attributes for a given file (per-fsid attributes excepted)
skipping to change at page 107, line 4 skipping to change at page 110, line 56
The client may maintain a cache of modified attributes for those The client may maintain a cache of modified attributes for those
attributes intimately connected with data of modified regular files attributes intimately connected with data of modified regular files
(size, time_modify, and change). Other than those three attributes, (size, time_modify, and change). Other than those three attributes,
the client MUST NOT maintain a cache of modified attributes. Instead, the client MUST NOT maintain a cache of modified attributes. Instead,
attribute changes are immediately sent to the server. attribute changes are immediately sent to the server.
In some operating environments, the equivalent to time_access is In some operating environments, the equivalent to time_access is
expected to be implicitly updated by each read of the content of the expected to be implicitly updated by each read of the content of the
file object. If an NFS client is caching the content of a file file object. If an NFS client is caching the content of a file
object, whether it is a regular file, directory, or symbolic link, object, whether it is a regular file, directory, or symbolic link,
the client SHOULD NOT update the time_access attribute (via SETATTR the client SHOULD NOT update the time_access attribute (via SETATTR
or a small READ or READDIR request) on the server with each read that or a small READ or READDIR request) on the server with each read that
is satisfied from cache. The reason is that this can defeat the is satisfied from cache. The reason is that this can defeat the
performance benefits of caching content, especially since an explicit performance benefits of caching content, especially since an explicit
SETATTR of time_access may alter the change attribute on the server. SETATTR of time_access may alter the change attribute on the server.
If the change attribute changes, clients that are caching the content If the change attribute changes, clients that are caching the content
will think the content has changed, and will re-read unmodified data will think the content has changed, and will re-read unmodified data
from the server. Nor is the client encouraged to maintain a modified from the server. Nor is the client encouraged to maintain a modified
version of time_access in its cache, since this would mean that the version of time_access in its cache, since this would mean that the
client will either eventually have to write the access time to the client will either eventually have to write the access time to the
server with bad performance effects, or it would never update the server with bad performance effects, or it would never update the
server's time_access, thereby resulting in a situation where an server's time_access, thereby resulting in a situation where an
application that caches access time between a close and open of the application that caches access time between a close and open of the
same file observes the access time oscillating between the past and same file observes the access time oscillating between the past and
present. The time_access attribute always means the time of last present. The time_access attribute always means the time of last
access to a file by a read that was satisfied by the server. This way access to a file by a read that was satisfied by the server. This way
clients will tend to see only time_access changes that go forward in clients will tend to see only time_access changes that go forward in
time. time.
9.7. Name Caching 9.7. Data and Metadata Caching and Memory Mapped Files
Some operating environments include the capability for an application
to map a file's content into the application's address space. Each
time the application accesses a memory location that corresponds to a
block that has not been loaded into the address space, a page fault
occurs and the file is read (or if the block does not exist in the
file, the block is allocated and then instantiated in the
application's address space).
As long as each memory mapped access to the file requires a page
fault, the relevant attributes of the file that are used to detect
access and modification (time_access, time_metadata, time_modify, and
change) will be updated. However, in many operating environments,
when page faults are not required these attributes will not be
updated on reads or updates to the file via memory access (regardless
of whether the file is a local file or is being accessed remotely). A
client or server MAY fail to update attributes of a file that is
being accessed via memory mapped I/O. This has several implications:
o If there is an application on the server that has memory mapped
a file that a client is also accessing, the client may not be
able to get a consistent value of the change attribute to
determine whether its cache is stale or not. A server that
knows that the file is memory mapped could always
pessimistically return updated values for change so as to force
the application to always get the most up to date data and
metadata for the file. However, due to the negative performance
implications of this, such behavior is OPTIONAL.
o If the memory mapped file is not being modified on the server,
and instead is just being read by an application via the memory
mapped interface, the client will not see an updated time_access
attribute. However, in many operating environments, neither
will any process running on the server. Thus NFS clients are at
no disadvantage with respect to local processes.
o If there is another client that is memory mapping the file, and
if that client is holding a write delegation, the same set of
issues as discussed in the previous two bullet items apply. So,
when a server does a CB_GETATTR to a file that the client has
modified in its cache, the response from CB_GETATTR will not
necessarily be accurate. As discussed earlier, the client's
obligation is to report that the file has been modified since
the delegation was granted, not whether it has been modified
again between successive CB_GETATTR calls, and the server MUST
assume that any file the client has modified in cache has been
modified again between successive CB_GETATTR calls. Depending
on the nature of the client's memory management system, even this
weak obligation may be impossible to meet. A client MAY return stale
information in CB_GETATTR whenever the file is memory mapped.
o The mixture of memory mapping and file locking on the same file
is problematic. Consider the following scenario, where a page
size on each client is 8192 bytes.
- Client A memory maps first page (8192 bytes) of file X
- Client B memory maps first page (8192 bytes) of file X
- Client A write locks first 4096 bytes
- Client B write locks second 4096 bytes
- Client A, via a STORE instruction modifies part of its
locked region.
- Simultaneous to client A, client B issues a STORE on part
of its locked region.
Here the challenge is for each client to resynchronize to get a
correct view of the first page. In many operating environments,
the virtual memory management systems on each client only know a
page is modified, not that a subset of the page corresponding to
the respective lock regions has been modified. So it is not
possible for each client to do the right thing, which is to only
write to the server that portion of the page that is locked.
For example, if client A simply writes out the page, and then
client B writes out the page, client A's data is lost.
Moreover, if mandatory locking is enabled on the file, then we
have a different problem. When clients A and B issue the STORE
instructions, the resulting page faults require a record lock on
the entire page. Each client then tries to extend their locked
range to the entire page, which results in a deadlock.
Communicating the NFS4ERR_DEADLOCK error to a STORE instruction
is difficult at best.
If a client is locking the entire memory mapped file, there is
no problem with advisory or mandatory record locking, at least
until the client unlocks a region in the middle of the file.
Given the above issues the following are permitted:
- Clients and servers MAY deny memory mapping a file they
know there are record locks for.
- Clients and servers MAY deny a record lock on a file they
know is memory mapped.
- A client MAY deny memory mapping a file that it knows
requires mandatory locking for I/O. If mandatory locking
is enabled after the file is opened and mapped, the client
MAY deny the application further access to its mapped file.
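
The lost-update scenario in the last bullet item can be reproduced with a small Python simulation. This is only a sketch: the byte arrays stand in for the server's copy of the first page of file X and for each client's mapped copy, and the two write-back helper functions are hypothetical names introduced for the illustration.

```python
PAGE = 8192
HALF = 4096

server_page = bytearray(PAGE)      # first page of file X on the server

# Both clients map the page, so each starts from the same contents.
page_a = bytearray(server_page)
page_b = bytearray(server_page)

# Client A write locks bytes 0..4095; client B write locks 4096..8191.
page_a[0:4] = b"AAAA"              # STORE inside A's locked region
page_b[HALF:HALF + 4] = b"BBBB"    # STORE inside B's locked region


def write_back_whole_page(server, cached):
    """Flush the entire page, as a VM system that only tracks
    'page is dirty' would do."""
    server[:] = cached


def write_back_locked_range(server, cached, start, end):
    """Flush only the byte range covered by the client's lock."""
    server[start:end] = cached[start:end]


# Whole-page flushes: B's stale first half overwrites A's update.
write_back_whole_page(server_page, page_a)
write_back_whole_page(server_page, page_b)
lost = bytes(server_page[0:4])

# Range-limited flushes: both updates survive.
fixed = bytearray(PAGE)
write_back_locked_range(fixed, page_a, 0, HALF)
write_back_locked_range(fixed, page_b, HALF, PAGE)
```

After the two whole-page flushes, the first four bytes on the server are zero again, so client A's data is lost. The range-limited flushes keep both updates, but they require the client to know which part of the page was modified, which many virtual memory systems do not track.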
9.8. Name Caching
The results of LOOKUP and READDIR operations may be cached to avoid The results of LOOKUP and READDIR operations may be cached to avoid
the cost of subsequent LOOKUP operations. Just as in the case of the cost of subsequent LOOKUP operations. Just as in the case of
attribute caching, inconsistencies may arise among the various client attribute caching, inconsistencies may arise among the various client
caches. To mitigate the effects of these inconsistencies and given caches. To mitigate the effects of these inconsistencies and given
the context of typical filesystem APIs, an upper time boundary is the context of typical filesystem APIs, an upper time boundary is
maintained on how long a client name cache entry can be kept without maintained on how long a client name cache entry can be kept without
verifying that the entry has not been made invalid by a directory verifying that the entry has not been made invalid by a directory
change operation performed by another client. change operation performed by another client.
skipping to change at page 107, line 55 skipping to change at page 114, line 4
determine whether there have been changes made to the directory by determine whether there have been changes made to the directory by
other clients. It does this by using the change attribute as other clients. It does this by using the change attribute as
reported before and after the directory operation in the associated reported before and after the directory operation in the associated
change_info4 value returned for the operation. The server is able to change_info4 value returned for the operation. The server is able to
communicate to the client whether the change_info4 data is provided communicate to the client whether the change_info4 data is provided
atomically with respect to the directory operation. If the change atomically with respect to the directory operation. If the change
values are provided atomically, the client is then able to compare values are provided atomically, the client is then able to compare
the pre-operation change value with the change value in the client's the pre-operation change value with the change value in the client's
name cache. If the comparison indicates that the directory was name cache. If the comparison indicates that the directory was
updated by another client, the name cache associated with the updated by another client, the name cache associated with the
modified directory is purged from the client. If the comparison modified directory is purged from the client. If the comparison
indicates no modification, the name cache can be updated on the indicates no modification, the name cache can be updated on the
client to reflect the directory operation and the associated timeout client to reflect the directory operation and the associated timeout
extended. The post-operation change value needs to be saved as the extended. The post-operation change value needs to be saved as the
basis for future change_info4 comparisons. basis for future change_info4 comparisons.
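
The comparison described above might be sketched in Python as follows. The ChangeInfo class mirrors the atomic, before, and after fields of change_info4; DirNameCache and apply_dir_op are hypothetical names chosen for this example, and a real client would also manage the cache timeout. Saving the post-operation value after a purge is a simplification assumed here.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeInfo:
    """Mirrors the fields of change_info4."""
    atomic: bool
    before: int
    after: int


@dataclass
class DirNameCache:
    """Hypothetical per-directory name cache held by the client."""
    change: int                               # change value the cache is based on
    names: dict = field(default_factory=dict)


def apply_dir_op(cache, cinfo, name, filehandle):
    """Account for this client's own directory operation.

    Returns True if the cache was kept and updated, False if purged.
    """
    if cinfo.atomic and cinfo.before == cache.change:
        # No other client changed the directory: record our own change.
        cache.names[name] = filehandle
        kept = True
    else:
        # Another client may have modified the directory, or the values
        # were not reported atomically: discard the cached names.
        cache.names.clear()
        kept = False
    # Save the post-operation value for future comparisons.
    cache.change = cinfo.after
    return kept
```

Note that a non-atomic change_info4 is treated the same way as a mismatch, since the client cannot rule out a concurrent update by another client.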
As demonstrated by the scenario above, name caching requires that the As demonstrated by the scenario above, name caching requires that the
client revalidate name cache data by inspecting the change attribute client revalidate name cache data by inspecting the change attribute
of a directory at the point when the name cache item was cached. of a directory at the point when the name cache item was cached.
This requires that the server update the change attribute for This requires that the server update the change attribute for
directories when the contents of the corresponding directory are directories when the contents of the corresponding directory are
modified. For a client to use the change_info4 information modified. For a client to use the change_info4 information
appropriately and correctly, the server must report the pre and post appropriately and correctly, the server must report the pre and post
operation change attribute values atomically. When the server is operation change attribute values atomically. When the server is
unable to report the before and after values atomically with respect unable to report the before and after values atomically with respect
to the directory operation, the server must indicate that fact in the to the directory operation, the server must indicate that fact in the
change_info4 return value. When the information is not atomically change_info4 return value. When the information is not atomically
reported, the client should not assume that other clients have not reported, the client should not assume that other clients have not
changed the directory. changed the directory.
9.8. Directory Caching 9.9. Directory Caching
The results of READDIR operations may be used to avoid subsequent The results of READDIR operations may be used to avoid subsequent
READDIR operations. Just as in the cases of attribute and name READDIR operations. Just as in the cases of attribute and name
caching, inconsistencies may arise among the various client caches. caching, inconsistencies may arise among the various client caches.
To mitigate the effects of these inconsistencies, and given the To mitigate the effects of these inconsistencies, and given the
context of typical filesystem APIs, the following rules should be context of typical filesystem APIs, the following rules should be
followed: followed:
o Cached READDIR information for a directory which is not obtained o Cached READDIR information for a directory which is not obtained
in a single READDIR operation must always be a consistent in a single READDIR operation must always be a consistent
skipping to change at page 108, line 55 skipping to change at page 115, line 4
question, checking the change attribute of the directory with GETATTR question, checking the change attribute of the directory with GETATTR
is adequate. The lifetime of the cache entry can be extended at is adequate. The lifetime of the cache entry can be extended at
these checkpoints. When a client is modifying the directory, the these checkpoints. When a client is modifying the directory, the
client needs to use the change_info4 data to determine whether there client needs to use the change_info4 data to determine whether there
are other clients modifying the directory. If it is determined that are other clients modifying the directory. If it is determined that
no other client modifications are occurring, the client may update no other client modifications are occurring, the client may update
its directory cache to reflect its own changes. its directory cache to reflect its own changes.
As demonstrated previously, directory caching requires that the As demonstrated previously, directory caching requires that the
client revalidate directory cache data by inspecting the change client revalidate directory cache data by inspecting the change
attribute of a directory at the point when the directory was cached. attribute of a directory at the point when the directory was cached.
This requires that the server update the change attribute for This requires that the server update the change attribute for
directories when the contents of the corresponding directory are directories when the contents of the corresponding directory are
modified. For a client to use the change_info4 information modified. For a client to use the change_info4 information
appropriately and correctly, the server must report the pre and post appropriately and correctly, the server must report the pre and post
operation change attribute values atomically. When the server is operation change attribute values atomically. When the server is
unable to report the before and after values atomically with respect unable to report the before and after values atomically with respect
to the directory operation, the server must indicate that fact in the to the directory operation, the server must indicate that fact in the
change_info4 return value. When the information is not atomically change_info4 return value. When the information is not atomically
reported, the client should not assume that other clients have not reported, the client should not assume that other clients have not
changed the directory. changed the directory.
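The change_info4 check described above can be sketched as follows. This is a minimal illustration, not the specification's algorithm: the three fields (atomic, before, after) follow the protocol's change_info4 result, while the DirCache class, its fields, and the apply_own_create helper are hypothetical names introduced only for this example. The sketch assumes the client remembers the change attribute it saw when it cached the directory.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ChangeInfo4:
    """Mirror of the change_info4 result: atomic flag plus before/after values."""
    atomic: bool
    before: int
    after: int


@dataclass
class DirCache:
    """Hypothetical client-side directory cache (not defined by the protocol)."""
    change: Optional[int] = None  # change attribute seen when the directory was cached
    entries: Dict[str, int] = field(default_factory=dict)  # name -> fileid

    def invalidate(self) -> None:
        self.change = None
        self.entries.clear()


def apply_own_create(cache: DirCache, cinfo: ChangeInfo4, name: str, fileid: int) -> bool:
    """Fold the client's own CREATE into the cache when that is safe.

    Returns True if the cache was updated in place, or False if it was
    invalidated and must be refetched from the server.
    """
    # Safe only if the values were reported atomically and the "before"
    # value matches what this client cached, i.e. nobody else intervened.
    if cache.change is not None and cinfo.atomic and cinfo.before == cache.change:
        cache.entries[name] = fileid
        cache.change = cinfo.after
        return True
    # Non-atomic report or unexpected "before" value: assume other clients
    # may have changed the directory.
    cache.invalidate()
    return False
```

Invalidating on any doubt is the conservative choice here; a real client might instead revalidate the directory with a GETATTR before discarding its cache.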
10. Minor Versioning 10. Minor Versioning
To address the requirement of an NFS protocol that can evolve as the To address the requirement of an NFS protocol that can evolve as the
need arises, the NFS version 4 protocol contains the rules and need arises, the NFS version 4 protocol contains the rules and
framework to allow for future minor changes or versioning. framework to allow for future minor changes or versioning.
The base assumption with respect to minor versioning is that any The base assumption with respect to minor versioning is that any
future accepted minor version must follow the IETF process and be future accepted minor version must follow the IETF process and be
documented in a standards track RFC. Therefore, each minor version documented in a standards track RFC. Therefore, each minor version
skipping to change at page 111, line 5 skipping to change at page 117, line 5
documented attribute. documented attribute.
Since attribute results are specified as an opaque array of Since attribute results are specified as an opaque array of
per-attribute XDR encoded results, the complexity of adding new per-attribute XDR encoded results, the complexity of adding new
attributes in the midst of the current definitions will be too attributes in the midst of the current definitions will be too
burdensome. burdensome.
3 Minor versions must not modify the structure of an existing 3 Minor versions must not modify the structure of an existing
operation's arguments or results. operation's arguments or results.
Again the complexity of handling multiple structure definitions Again the complexity of handling multiple structure definitions
for a single operation is too burdensome. New operations should for a single operation is too burdensome. New operations should
be added instead of modifying existing structures for a minor be added instead of modifying existing structures for a minor
version. version.
This rule does not preclude the following adaptations in a minor This rule does not preclude the following adaptations in a minor
version. version.
o adding bits to flag fields such as new attributes to o adding bits to flag fields such as new attributes to
skipping to change at page 112, line 5 skipping to change at page 118, line 5
the request as an XDR decode error. This approach allows for the request as an XDR decode error. This approach allows for
the obsolescence of an operation while maintaining its structure the obsolescence of an operation while maintaining its structure
so that a future minor version can reintroduce the operation. so that a future minor version can reintroduce the operation.
8.1 Minor versions may declare attributes mandatory to NOT 8.1 Minor versions may declare attributes mandatory to NOT
implement. implement.
8.2 Minor versions may declare flag bits or enumeration values as 8.2 Minor versions may declare flag bits or enumeration values as
mandatory to NOT implement. mandatory to NOT implement.
9 Minor versions may downgrade features from mandatory to 9 Minor versions may downgrade features from mandatory to
recommended, or recommended to optional. recommended, or recommended to optional.
10 Minor versions may upgrade features from optional to recommended 10 Minor versions may upgrade features from optional to recommended
or recommended to mandatory. or recommended to mandatory.
11 A client and server that support minor version X must support 11 A client and server that support minor version X must support
minor versions 0 (zero) through X-1 as well. minor versions 0 (zero) through X-1 as well.
skipping to change at page 113, line 5 skipping to change at page 119, line 5
This rule allows for the introduction of new functionality and This rule allows for the introduction of new functionality and
forces the use of implementation experience before designating a forces the use of implementation experience before designating a
feature as mandatory. feature as mandatory.
13 A client MUST NOT attempt to use a stateid, filehandle, or 13 A client MUST NOT attempt to use a stateid, filehandle, or
similar returned object from the COMPOUND procedure with minor similar returned object from the COMPOUND procedure with minor
version X for another COMPOUND procedure with minor version Y, version X for another COMPOUND procedure with minor version Y,
where X != Y. where X != Y.
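Rules 11 and 13 can be illustrated with a small client-side guard. This is a sketch under stated assumptions: the TaggedHandle wrapper and both helper functions are hypothetical and are not defined by the protocol. The idea is simply that the client records the minor version of the COMPOUND that returned each stateid or filehandle and refuses to reuse it under a different minor version.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedHandle:
    """Hypothetical wrapper: an opaque stateid or filehandle together with
    the minor version of the COMPOUND procedure that returned it."""
    value: bytes
    minor_version: int


def supported_minor_versions(highest: int) -> range:
    """Rule 11: supporting minor version X implies supporting 0 through X."""
    return range(0, highest + 1)


def check_handle_use(handle: TaggedHandle, compound_minor_version: int,
                     highest_supported: int) -> None:
    """Raise ValueError if using the handle in this COMPOUND would break rule 13."""
    if compound_minor_version not in supported_minor_versions(highest_supported):
        raise ValueError("minor version %d is not supported" % compound_minor_version)
    if handle.minor_version != compound_minor_version:
        # Rule 13: an object obtained under minor version X must not be
        # used in a COMPOUND with minor version Y where X != Y.
        raise ValueError("handle from minor version %d used with minor version %d"
                         % (handle.minor_version, compound_minor_version))
```

For example, check_handle_use(TaggedHandle(b"\x01", 0), 0, 1) returns without error, while passing a compound_minor_version of 1 for the same handle raises ValueError.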
11. Internationalization 11. Internationalization
The primary area in which NFS needs to deal with The primary area in which NFS needs to deal with
internationalization, or I18N, is the handling of file names and internationalization, or I18N, is the handling of file names and
other strings used within the protocol. The choice of string other strings used within the protocol. The choice of string
representation must allow reasonable name/string access to clients representation must allow reasonable name/string access to clients
which use various languages. The UTF-8 encoding of the UCS as which use various languages. The UTF-8 encoding of the UCS as
defined by [ISO10646] allows for this type of access and follows the defined by [ISO10646] allows for this type of access and follows the
policy described in "IETF Policy on Character Sets and Languages", policy described in "IETF Policy on Character Sets and Languages",
skipping to change at page 114, line 5 skipping to change at page 120, line 5
could be understood by all clients and servers, and maintaining them could be understood by all clients and servers, and maintaining them
in the face of changes would be considerable. A better solution is in the face of changes would be considerable. A better solution is
desirable. desirable.
If the NFS version 4 protocol used a universal 16 bit or 32 bit If the NFS version 4 protocol used a universal 16 bit or 32 bit
character set (or an encoding of a 16 bit or 32 bit character set character set (or an encoding of a 16 bit or 32 bit character set
into octets), then the server and client need not care if the locale into octets), then the server and client need not care if the locale
of the user accessing the file is different from the locale of the of the user accessing the file is different from the locale of the
user who created the file. The unique 16 bit or 32 bit encoding of user who created the file. The unique 16 bit or 32 bit encoding of
the character allows for determination of what language the character the character allows for determination of what language the character
is from and also how to display that character on the client. The is from and also how to display that character on the client. The
server need not know what locales are used. server need not know what locales are used.
11.2. Overview of Universal Character Set Standards 11.2. Overview of Universal Character Set Standards
The previous section makes a case for using a universal character The previous section makes a case for using a universal character
set. This section makes the case for using UTF-8 as the specific set. This section makes the case for using UTF-8 as the specific
universal character set for the NFS version 4 protocol. universal character set for the NFS version 4 protocol.
skipping to change at page 115, line 5 skipping to change at page 121, line 5
encoding of UCS characters as described below. encoding of UCS characters as described below.
UTF-1 Only historical interest; it has been removed from 10646-1 UTF-1 Only historical interest; it has been removed from 10646-1
UTF-7 Encodes the entire "repertoire" of UCS "characters using UTF-7 Encodes the entire "repertoire" of UCS "characters using
only octets with the higher order bit clear". [RFC2152] only octets with the higher order bit clear". [RFC2152]
describes UTF-7. UTF-7 accomplishes this by reserving one describes UTF-7. UTF-7 accomplishes this by reserving one
of the 7-bit US-ASCII characters as a "shift" character to of the 7-bit US-ASCII characters as a "shift" character to
indicate non-US-ASCII characters. indicate non-US-ASCII characters.
UTF-8 Unlike UTF-7, uses all 8 bits of the octets. US-ASCII UTF-8 Unlike UTF-7, uses all 8 bits of the octets. US-ASCII
characters are encoded unchanged. Any octet with characters are encoded unchanged. Any octet with
the high bit cleared can only mean a US-ASCII character. the high bit cleared can only mean a US-ASCII character.
The high bit set means that a UCS character is being The high bit set means that a UCS character is being
encoded. encoded.
UTF-16 Encodes UCS-4 characters into UCS-2 characters using a UTF-16 Encodes UCS-4 characters into UCS-2 characters using a
reserved range in UCS-2. reserved range in UCS-2.
skipping to change at page 116, line 5 skipping to change at page 122, line 5
0000 0080-0000 07FF 110xxxxx 10xxxxxx 0000 0080-0000 07FF 110xxxxx 10xxxxxx
0000 0800-0000 FFFF 1110xxxx 10xxxxxx 10xxxxxx 0000 0800-0000 FFFF 1110xxxx 10xxxxxx 10xxxxxx
0001 0000-001F FFFF 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx 0001 0000-001F FFFF 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
0020 0000-03FF FFFF 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 0020 0000-03FF FFFF 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
0400 0000-7FFF FFFF 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 0400 0000-7FFF FFFF 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
10xxxxxx 10xxxxxx
See [RFC2279] for precise encoding and decoding rules. Note that, because See [RFC2279] for precise encoding and decoding rules. Note that, because
of UTF-16, the algorithm for converting Unicode/UCS-2 to UTF-8 must account of UTF-16, the algorithm for converting Unicode/UCS-2 to UTF-8 must account
for the reserved range between D800 and DFFF. for the reserved range between D800 and DFFF.
Note that the 16 bit UCS or Unicode characters require no more than 3 Note that the 16 bit UCS or Unicode characters require no more than 3
octets to encode into UTF-8. octets to encode into UTF-8.
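The octet layout in the table above can be sketched as a small encoder. This is an illustrative sketch of the original 1-to-6 octet scheme, not a replacement for the precise rules in [RFC2279]: the function name utf8_encode_rfc2279 is hypothetical, and the single-octet range 0000 0000-0000 007F (0xxxxxxx) is assumed from the standard UTF-8 definition. The sketch rejects the reserved range D800-DFFF noted above.

```python
def utf8_encode_rfc2279(cp: int) -> bytes:
    """Encode one UCS code point (up to 31 bits) using the 1-to-6 octet layout."""
    if cp < 0 or cp > 0x7FFFFFFF:
        raise ValueError("code point out of the 31-bit UCS-4 range")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("D800-DFFF is reserved for UTF-16 and is not encoded")
    if cp < 0x80:
        return bytes([cp])  # US-ASCII passes through unchanged

    # (exclusive upper bound, total octets, lead-octet prefix)
    layouts = (
        (0x800, 2, 0xC0),       # 110xxxxx
        (0x10000, 3, 0xE0),     # 1110xxxx
        (0x200000, 4, 0xF0),    # 11110xxx
        (0x4000000, 5, 0xF8),   # 111110xx
        (0x80000000, 6, 0xFC),  # 1111110x
    )
    for limit, n, prefix in layouts:
        if cp < limit:
            break

    # Peel off six bits at a time for the continuation octets (10xxxxxx),
    # least significant group first, then reverse them.
    tail = []
    for _ in range(n - 1):
        tail.append(0x80 | (cp & 0x3F))
        cp >>= 6
    return bytes([prefix | cp] + tail[::-1])
```

For code points up to 10FFFF outside the reserved range, the result matches Python's built-in chr(cp).encode("utf-8"); every 16-bit code point encodes in at most three octets, as the note above states.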
Interestingly, UTF-8 has room to handle characters larger than 31 Interestingly, UTF-8 has room to handle characters larger than 31
bits, because the leading octet of form: bits, because the leading octet of form:
skipping to change at page 117, line 5 skipping to change at page 123, line 5
11.6. UTF-8 Related Errors 11.6. UTF-8 Related Errors
Where the client sends an invalid UTF-8 string, the server should Where the client sends an invalid UTF-8 string, the server should
return an NFS4ERR_INVAL error. This includes cases in which return an NFS4ERR_INVAL error. This includes cases in which
inappropriate prefixes are detected and where the count includes inappropriate prefixes are detected and where the count includes
trailing bytes that do not constitute a full UCS character. trailing bytes that do not constitute a full UCS character.
Where the client-supplied string is valid UTF-8 but contains Where the client-supplied string is valid UTF-8 but contains
characters that are not supported by the server as a value for that characters that are not supported by the server as a value for that
string (e.g. names containing characters that have more than two string (e.g. names containing characters that have more than two
octets on a filesystem that supports Unicode characters only), the octets on a filesystem that supports Unicode characters only), the
server should return an NFS4ERR_BADCHAR error. server should return an NFS4ERR_BADCHAR error.
Where a UTF-8 string is used as a file name, and the filesystem, Where a UTF-8 string is used as a file name, and the filesystem,
while supporting all of the characters within the name, does not while supporting all of the characters within the name, does not
allow that particular name to be used, the server should return the allow that particular name to be used, the server should return the
error NFS4ERR_BADNAME. This includes situations in which the server error NFS4ERR_BADNAME. This includes situations in which the server
filesystem imposes a normalization constraint on name strings, but filesystem imposes a normalization constraint on name strings, but
will also include such situations as filesystem prohibitions of "." will also include such situations as filesystem prohibitions of "."
and ".." as file names for certain operations, and other such and ".." as file names for certain operations, and other such
constraints. constraints.
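The three error cases in this section can be read as an ordered series of checks. The sketch below is one possible server-side arrangement, not a mandated algorithm: classify_name and the two example policies (bmp_only, not_dot_names) are hypothetical names, and the policies stand in for whatever a particular filesystem supports. It also assumes Python's strict UTF-8 decoder as the validity test, which is somewhat stricter than [RFC2279] (it rejects 5 and 6 octet forms, for instance).

```python
from typing import Callable


def bmp_only(ch: str) -> bool:
    """Example policy: a filesystem limited to 16-bit Unicode characters."""
    return ord(ch) <= 0xFFFF


def not_dot_names(name: str) -> bool:
    """Example policy: a filesystem that prohibits "." and ".." as names."""
    return name not in (".", "..")


def classify_name(raw: bytes,
                  supports_char: Callable[[str], bool] = bmp_only,
                  name_allowed: Callable[[str], bool] = not_dot_names) -> str:
    """Return the status a server might give for a client-supplied name."""
    try:
        # Assumption: strict decoding stands in for the UTF-8 validity check.
        name = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Bad prefixes or trailing bytes that do not form a full character.
        return "NFS4ERR_INVAL"
    if not all(supports_char(ch) for ch in name):
        # Valid UTF-8, but contains characters the server cannot store.
        return "NFS4ERR_BADCHAR"
    if not name_allowed(name):
        # All characters are supported, but this particular name is not allowed.
        return "NFS4ERR_BADNAME"
    return "NFS4_OK"
```

The order matters: validity is checked before character support, and character support before name-level constraints, so each error is reported only when the earlier conditions hold.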
12. Error Definitions 12. Error Definitions
NFS error numbers are assigned to failed operations within a compound NFS error numbers are assigned to failed operations within a compound
request. A compound request contains a number of NFS operations that request. A compound request contains a number of NFS operations that
have their results encoded in sequence in a compound reply. The have their results encoded in sequence in a compound reply. The
results of successful operations will consist of an NFS4_OK status