draft-ietf-nfsv4-rfc3010bis-05.txt   rfc3530.txt 
NFS version 4 Working Group S. Shepler Network Working Group S. Shepler
INTERNET-DRAFT Sun Microsystems, Inc. Request for Comments: 3530 B. Callaghan
Obsoletes: 3010 C. Beame Obsoletes: 3010 D. Robinson
Document: draft-ietf-nfsv4-rfc3010bis-05.txt Hummingbird Ltd. Category: Standards Track R. Thurlow
B. Callaghan
Sun Microsystems, Inc. Sun Microsystems, Inc.
C. Beame
Hummingbird Ltd.
M. Eisler M. Eisler
Network Appliance, Inc.
D. Noveck D. Noveck
Network Appliance, Inc. Network Appliance, Inc.
D. Robinson April 2003
Sun Microsystems, Inc.
R. Thurlow
Sun Microsystems, Inc.
November 2002
NFS version 4 Protocol Network File System (NFS) version 4 Protocol
Status of this Memo Status of this Memo
This document is an Internet-Draft and is in full conformance with This document specifies an Internet standards track protocol for the
all provisions of Section 10 of RFC2026. Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the "Internet
Internet-Drafts are working documents of the Internet Engineering Official Protocol Standards" (STD 1) for the standardization state
Task Force (IETF), its areas, and its working groups. Note that and status of this protocol. Distribution of this memo is unlimited.
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at Copyright Notice
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at Copyright (C) The Internet Society (2003). All Rights Reserved.
http://www.ietf.org/shadow.html.
Abstract Abstract
This document replaces [RFC3010] as the definition of the NFS version The Network File System (NFS) version 4 is a distributed filesystem
4 protocol. protocol which owes heritage to NFS protocol version 2, RFC 1094, and
version 3, RFC 1813. Unlike earlier versions, the NFS version 4
Draft Specification NFS version 4 Protocol November 2002 protocol supports traditional file access while integrating support
for file locking and the mount protocol. In addition, support for
NFS version 4 is a distributed filesystem protocol which owes strong security (and its negotiation), compound operations, client
heritage to NFS protocol versions 2 [RFC1094] and 3 [RFC1813]. caching, and internationalization have been added. Of course,
Unlike earlier versions, the NFS version 4 protocol supports attention has been applied to making NFS version 4 operate well in an
traditional file access while integrating support for file locking Internet environment.
and the mount protocol. In addition, support for strong security
(and its negotiation), compound operations, client caching, and
internationalization have been added. Of course, attention has been
applied to making NFS version 4 operate well in an Internet
environment.
Copyright
Copyright (C) The Internet Society (2000-2002). All Rights Reserved. This document replaces RFC 3010 as the definition of the NFS version
4 protocol.
Key Words Key Words
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119]. document are to be interpreted as described in [RFC2119].
Table of Contents Table of Contents
1. Changes since RFC3010 . . . . . . . . . . . . . . . . . . . . 8 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . 8
1.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . 9 1.1. Changes since RFC 3010 . . . . . . . . . . . . . . . 8
1.2. Inconsistencies of this Document with Section 18 . . . . . 9 1.2. NFS version 4 Goals. . . . . . . . . . . . . . . . . 9
1.3. Overview of NFS version 4 Features . . . . . . . . . . . 10 1.3. Inconsistencies of this Document with Section 18 . . 9
1.3.1. RPC and Security . . . . . . . . . . . . . . . . . . . 10 1.4. Overview of NFS version 4 Features . . . . . . . . . 10
1.3.2. Procedure and Operation Structure . . . . . . . . . . . 10 1.4.1. RPC and Security . . . . . . . . . . . . . . 10
1.3.3. Filesystem Model . . . . . . . . . . . . . . . . . . . 11 1.4.2. Procedure and Operation Structure. . . . . . 10
1.3.3.1. Filehandle Types . . . . . . . . . . . . . . . . . . 11 1.4.3. Filesystem Model . . . . . . . . . . . . . . 11
1.3.3.2. Attribute Types . . . . . . . . . . . . . . . . . . . 12 1.4.3.1. Filehandle Types . . . . . . . . . 11
1.3.3.3. Filesystem Replication and Migration . . . . . . . . 12 1.4.3.2. Attribute Types. . . . . . . . . . 12
1.3.4. OPEN and CLOSE . . . . . . . . . . . . . . . . . . . . 13 1.4.3.3. Filesystem Replication and
1.3.5. File locking . . . . . . . . . . . . . . . . . . . . . 13 Migration. . . . . . . . . . . . . 13
1.3.6. Client Caching and Delegation . . . . . . . . . . . . . 13 1.4.4. OPEN and CLOSE . . . . . . . . . . . . . . . 13
1.4. General Definitions . . . . . . . . . . . . . . . . . . . 14 1.4.5. File locking . . . . . . . . . . . . . . . . 13
1.4.6. Client Caching and Delegation. . . . . . . . 13
1.5. General Definitions. . . . . . . . . . . . . . . . . 14
2. Protocol Data Types . . . . . . . . . . . . . . . . . . . . 16 2. Protocol Data Types . . . . . . . . . . . . . . . . . . . . 16
2.1. Basic Data Types . . . . . . . . . . . . . . . . . . . . 16 2.1. Basic Data Types . . . . . . . . . . . . . . . . . . 16
2.2. Structured Data Types . . . . . . . . . . . . . . . . . . 17 2.2. Structured Data Types. . . . . . . . . . . . . . . . 18
3. RPC and Security Flavor . . . . . . . . . . . . . . . . . . 23 3. RPC and Security Flavor . . . . . . . . . . . . . . . . . . 23
3.1. Ports and Transports . . . . . . . . . . . . . . . . . . 23 3.1. Ports and Transports . . . . . . . . . . . . . . . . 23
3.1.1. Client Retransmission Behavior . . . . . . . . . . . . 24 3.1.1. Client Retransmission Behavior . . . . . . . 24
3.2. Security Flavors . . . . . . . . . . . . . . . . . . . . 24 3.2. Security Flavors . . . . . . . . . . . . . . . . . . 25
3.2.1. Security mechanisms for NFS version 4 . . . . . . . . . 24 3.2.1. Security mechanisms for NFS version 4. . . . 25
3.2.1.1. Kerberos V5 as a security triple . . . . . . . . . . 25 3.2.1.1. Kerberos V5 as a security triple . 25
3.2.1.2. LIPKEY as a security triple . . . . . . . . . . . . . 25 3.2.1.2. LIPKEY as a security triple. . . . 26
3.2.1.3. SPKM-3 as a security triple . . . . . . . . . . . . . 26 3.2.1.3. SPKM-3 as a security triple. . . . 27
3.3. Security Negotiation . . . . . . . . . . . . . . . . . . 27 3.3. Security Negotiation . . . . . . . . . . . . . . . . 27
3.3.1. SECINFO . . . . . . . . . . . . . . . . . . . . . . . . 27 3.3.1. SECINFO. . . . . . . . . . . . . . . . . . . 28
3.3.2. Security Error . . . . . . . . . . . . . . . . . . . . 27 3.3.2. Security Error . . . . . . . . . . . . . . . 28
3.4. Callback RPC Authentication . . . . . . . . . . . . . . . 28 3.4. Callback RPC Authentication. . . . . . . . . . . . . 28
4. Filehandles . . . . . . . . . . . . . . . . . . . . . . . . 30 4. Filehandles . . . . . . . . . . . . . . . . . . . . . . . . 30
4.1. Obtaining the First Filehandle . . . . . . . . . . . . . 30 4.1. Obtaining the First Filehandle . . . . . . . . . . . 30
4.1.1. Root Filehandle . . . . . . . . . . . . . . . . . . . . 30 4.1.1. Root Filehandle. . . . . . . . . . . . . . . 31
4.1.2. Public Filehandle . . . . . . . . . . . . . . . . . . . 30 4.1.2. Public Filehandle. . . . . . . . . . . . . . 31
4.2. Filehandle Types . . . . . . . . . . . . . . . . . . . . 31 4.2. Filehandle Types . . . . . . . . . . . . . . . . . . 31
4.2.1. General Properties of a Filehandle . . . . . . . . . . 31 4.2.1. General Properties of a Filehandle . . . . . 32
4.2.2. Persistent Filehandle . . . . . . . . . . . . . . . . . 32 4.2.2. Persistent Filehandle. . . . . . . . . . . . 32
4.2.3. Volatile Filehandle . . . . . . . . . . . . . . . . . . 32 4.2.3. Volatile Filehandle. . . . . . . . . . . . . 33
4.2.4. One Method of Constructing a Volatile Filehandle . . . 33 4.2.4. One Method of Constructing a
4.3. Client Recovery from Filehandle Expiration . . . . . . . 34 Volatile Filehandle. . . . . . . . . . . . . 34
5. File Attributes . . . . . . . . . . . . . . . . . . . . . . 36 4.3. Client Recovery from Filehandle Expiration . . . . . 35
5.1. Mandatory Attributes . . . . . . . . . . . . . . . . . . 37 5. File Attributes. . . . . . . . . . . . . . . . . . . . . . 35
5.2. Recommended Attributes . . . . . . . . . . . . . . . . . 37 5.1. Mandatory Attributes . . . . . . . . . . . . . . . . 37
5.3. Named Attributes . . . . . . . . . . . . . . . . . . . . 37 5.2. Recommended Attributes . . . . . . . . . . . . . . . 37
5.4. Classification of Attributes . . . . . . . . . . . . . . 38 5.3. Named Attributes . . . . . . . . . . . . . . . . . . 37
5.5. Mandatory Attributes - Definitions . . . . . . . . . . . 40 5.4. Classification of Attributes . . . . . . . . . . . . 38
5.6. Recommended Attributes - Definitions . . . . . . . . . . 42 5.5. Mandatory Attributes - Definitions . . . . . . . . . 39
5.7. Time Access . . . . . . . . . . . . . . . . . . . . . . . 47 5.6. Recommended Attributes - Definitions . . . . . . . . 41
5.8. Interpreting owner and owner_group . . . . . . . . . . . 47 5.7. Time Access. . . . . . . . . . . . . . . . . . . . . 46
5.9. Character Case Attributes . . . . . . . . . . . . . . . . 49 5.8. Interpreting owner and owner_group . . . . . . . . . 47
5.10. Quota Attributes . . . . . . . . . . . . . . . . . . . . 49 5.9. Character Case Attributes. . . . . . . . . . . . . . 49
5.10. Quota Attributes . . . . . . . . . . . . . . . . . . 49
5.11. Access Control Lists . . . . . . . . . . . . . . . . 50
5.11.1. ACE type . . . . . . . . . . . . . . . . . 51
5.11. Access Control Lists . . . . . . . . . . . . . . . . . . 50 5.11.2. ACE Access Mask. . . . . . . . . . . . . . 52
5.11.1. ACE type . . . . . . . . . . . . . . . . . . . . . . . 51 5.11.3. ACE flag . . . . . . . . . . . . . . . . . 54
5.11.2. ACE Access Mask . . . . . . . . . . . . . . . . . . . 52 5.11.4. ACE who . . . . . . . . . . . . . . . . . 55
5.11.3. ACE flag . . . . . . . . . . . . . . . . . . . . . . . 54 5.11.5. Mode Attribute . . . . . . . . . . . . . . 56
5.11.4. ACE who . . . . . . . . . . . . . . . . . . . . . . . 56 5.11.6. Mode and ACL Attribute . . . . . . . . . . 57
5.11.5. Mode Attribute . . . . . . . . . . . . . . . . . . . . 56 5.11.7. mounted_on_fileid. . . . . . . . . . . . . 57
5.11.6. Mode and ACL Attribute . . . . . . . . . . . . . . . . 57 6. Filesystem Migration and Replication . . . . . . . . . . . 58
5.11.7. mounted_on_fileid . . . . . . . . . . . . . . . . . . 57 6.1. Replication. . . . . . . . . . . . . . . . . . . . . 58
6. Filesystem Migration and Replication . . . . . . . . . . . 59 6.2. Migration. . . . . . . . . . . . . . . . . . . . . . 59
6.1. Replication . . . . . . . . . . . . . . . . . . . . . . . 59 6.3. Interpretation of the fs_locations Attribute . . . . 60
6.2. Migration . . . . . . . . . . . . . . . . . . . . . . . . 59 6.4. Filehandle Recovery for Migration or Replication . . 61
6.3. Interpretation of the fs_locations Attribute . . . . . . 60 7. NFS Server Name Space . . . . . . . . . . . . . . . . . . . 61
6.4. Filehandle Recovery for Migration or Replication . . . . 61 7.1. Server Exports . . . . . . . . . . . . . . . . . . . 61
7. NFS Server Name Space . . . . . . . . . . . . . . . . . . . 62 7.2. Browsing Exports . . . . . . . . . . . . . . . . . . 62
7.1. Server Exports . . . . . . . . . . . . . . . . . . . . . 62 7.3. Server Pseudo Filesystem . . . . . . . . . . . . . . 62
7.2. Browsing Exports . . . . . . . . . . . . . . . . . . . . 62 7.4. Multiple Roots . . . . . . . . . . . . . . . . . . . 63
7.3. Server Pseudo Filesystem . . . . . . . . . . . . . . . . 62 7.5. Filehandle Volatility. . . . . . . . . . . . . . . . 63
7.4. Multiple Roots . . . . . . . . . . . . . . . . . . . . . 63 7.6. Exported Root. . . . . . . . . . . . . . . . . . . . 63
7.5. Filehandle Volatility . . . . . . . . . . . . . . . . . . 63 7.7. Mount Point Crossing . . . . . . . . . . . . . . . . 63
7.6. Exported Root . . . . . . . . . . . . . . . . . . . . . . 63 7.8. Security Policy and Name Space Presentation. . . . . 64
7.7. Mount Point Crossing . . . . . . . . . . . . . . . . . . 64 8. File Locking and Share Reservations. . . . . . . . . . . . 65
7.8. Security Policy and Name Space Presentation . . . . . . . 64 8.1. Locking. . . . . . . . . . . . . . . . . . . . . . . 65
8. File Locking and Share Reservations . . . . . . . . . . . . 66 8.1.1. Client ID. . . . . . . . . . . . . . . . . 66
8.1. Locking . . . . . . . . . . . . . . . . . . . . . . . . . 66 8.1.2. Server Release of Clientid . . . . . . . . 69
8.1.1. Client ID . . . . . . . . . . . . . . . . . . . . . . . 66 8.1.3. lock_owner and stateid Definition. . . . . 69
8.1.2. Server Release of Clientid . . . . . . . . . . . . . . 69 8.1.4. Use of the stateid and Locking . . . . . . 71
8.1.3. lock_owner and stateid Definition . . . . . . . . . . . 70 8.1.5. Sequencing of Lock Requests. . . . . . . . 73
8.1.4. Use of the stateid and Locking . . . . . . . . . . . . 71 8.1.6. Recovery from Replayed Requests. . . . . . 74
8.1.5. Sequencing of Lock Requests . . . . . . . . . . . . . . 73 8.1.7. Releasing lock_owner State . . . . . . . . 74
8.1.6. Recovery from Replayed Requests . . . . . . . . . . . . 74 8.1.8. Use of Open Confirmation . . . . . . . . . 75
8.1.7. Releasing lock_owner State . . . . . . . . . . . . . . 75 8.2. Lock Ranges. . . . . . . . . . . . . . . . . . . . . 76
8.1.8. Use of Open Confirmation . . . . . . . . . . . . . . . 75 8.3. Upgrading and Downgrading Locks. . . . . . . . . . . 76
8.2. Lock Ranges . . . . . . . . . . . . . . . . . . . . . . . 76 8.4. Blocking Locks . . . . . . . . . . . . . . . . . . . 77
8.3. Upgrading and Downgrading Locks . . . . . . . . . . . . . 76 8.5. Lease Renewal. . . . . . . . . . . . . . . . . . . . 77
8.4. Blocking Locks . . . . . . . . . . . . . . . . . . . . . 77 8.6. Crash Recovery . . . . . . . . . . . . . . . . . . . 78
8.5. Lease Renewal . . . . . . . . . . . . . . . . . . . . . . 77 8.6.1. Client Failure and Recovery. . . . . . . . 79
8.6. Crash Recovery . . . . . . . . . . . . . . . . . . . . . 78 8.6.2. Server Failure and Recovery. . . . . . . . 79
8.6.1. Client Failure and Recovery . . . . . . . . . . . . . . 79 8.6.3. Network Partitions and Recovery. . . . . . 81
8.6.2. Server Failure and Recovery . . . . . . . . . . . . . . 79 8.7. Recovery from a Lock Request Timeout or Abort . . . 85
8.6.3. Network Partitions and Recovery . . . . . . . . . . . . 81 8.8. Server Revocation of Locks. . . . . . . . . . . . . 85
8.7. Recovery from a Lock Request Timeout or Abort . . . . . . 84 8.9. Share Reservations. . . . . . . . . . . . . . . . . 86
8.8. Server Revocation of Locks . . . . . . . . . . . . . . . 85 8.10. OPEN/CLOSE Operations . . . . . . . . . . . . . . . 87
8.9. Share Reservations . . . . . . . . . . . . . . . . . . . 86 8.10.1. Close and Retention of State
8.10. OPEN/CLOSE Operations . . . . . . . . . . . . . . . . . 86 Information. . . . . . . . . . . . . . . . 88
8.10.1. Close and Retention of State Information . . . . . . . 87 8.11. Open Upgrade and Downgrade. . . . . . . . . . . . . 88
8.11. Open Upgrade and Downgrade . . . . . . . . . . . . . . . 88 8.12. Short and Long Leases . . . . . . . . . . . . . . . 89
8.12. Short and Long Leases . . . . . . . . . . . . . . . . . 88
8.13. Clocks, Propagation Delay, and Calculating Lease 8.13. Clocks, Propagation Delay, and Calculating Lease
Expiration . . . . . . . . . . . . . . . . . . . . . . . 89 Expiration. . . . . . . . . . . . . . . . . . . . . 89
8.14. Migration, Replication and State . . . . . . . . . . . . 89 8.14. Migration, Replication and State. . . . . . . . . . 90
8.14.1. Migration and State . . . . . . . . . . . . . . . . . 90 8.14.1. Migration and State. . . . . . . . . . . . 90
8.14.2. Replication and State . . . . . . . . . . . . . . . . 90 8.14.2. Replication and State. . . . . . . . . . . 91
8.14.3. Notification of Migrated Lease . . . . . . 92
8.14.4. Migration and the Lease_time Attribute . . 92
8.14.3. Notification of Migrated Lease . . . . . . . . . . . . 91
8.14.4. Migration and the Lease_time Attribute . . . . . . . . 91
9. Client-Side Caching . . . . . . . . . . . . . . . . . . . . 93 9. Client-Side Caching . . . . . . . . . . . . . . . . . . . . 93
9.1. Performance Challenges for Client-Side Caching . . . . . 93 9.1. Performance Challenges for Client-Side Caching. . . 93
9.2. Delegation and Callbacks . . . . . . . . . . . . . . . . 94 9.2. Delegation and Callbacks. . . . . . . . . . . . . . 94
9.2.1. Delegation Recovery . . . . . . . . . . . . . . . . . . 95 9.2.1. Delegation Recovery . . . . . . . . . . . . 96
9.3. Data Caching . . . . . . . . . . . . . . . . . . . . . . 97 9.3. Data Caching. . . . . . . . . . . . . . . . . . . . 98
9.3.1. Data Caching and OPENs . . . . . . . . . . . . . . . . 97 9.3.1. Data Caching and OPENs . . . . . . . . . . 98
9.3.2. Data Caching and File Locking . . . . . . . . . . . . . 98 9.3.2. Data Caching and File Locking. . . . . . . 99
9.3.3. Data Caching and Mandatory File Locking . . . . . . . . 100 9.3.3. Data Caching and Mandatory File Locking. . 101
9.3.4. Data Caching and File Identity . . . . . . . . . . . . 100 9.3.4. Data Caching and File Identity . . . . . . 101
9.4. Open Delegation . . . . . . . . . . . . . . . . . . . . . 101 9.4. Open Delegation . . . . . . . . . . . . . . . . . . 102
9.4.1. Open Delegation and Data Caching . . . . . . . . . . . 104 9.4.1. Open Delegation and Data Caching . . . . . 104
9.4.2. Open Delegation and File Locks . . . . . . . . . . . . 105 9.4.2. Open Delegation and File Locks . . . . . . 106
9.4.3. Handling of CB_GETATTR . . . . . . . . . . . . . . . . 105 9.4.3. Handling of CB_GETATTR . . . . . . . . . . 106
9.4.4. Recall of Open Delegation . . . . . . . . . . . . . . . 108 9.4.4. Recall of Open Delegation. . . . . . . . . 109
9.4.5. Clients that Fail to Honor Delegation Recalls . . . . . 110 9.4.5. Clients that Fail to Honor
9.4.6. Delegation Revocation . . . . . . . . . . . . . . . . . 110 Delegation Recalls . . . . . . . . . . . . 111
9.5. Data Caching and Revocation . . . . . . . . . . . . . . . 111 9.4.6. Delegation Revocation. . . . . . . . . . . 112
9.5.1. Revocation Recovery for Write Open Delegation . . . . . 111 9.5. Data Caching and Revocation . . . . . . . . . . . . 112
9.6. Attribute Caching . . . . . . . . . . . . . . . . . . . . 112 9.5.1. Revocation Recovery for Write Open
9.7. Data and Metadata Caching and Memory Mapped Files . . . . 114 Delegation . . . . . . . . . . . . . . . . 113
9.8. Name Caching . . . . . . . . . . . . . . . . . . . . . . 116 9.6. Attribute Caching . . . . . . . . . . . . . . . . . 113
9.9. Directory Caching . . . . . . . . . . . . . . . . . . . . 117 9.7. Data and Metadata Caching and Memory Mapped Files . 115
10. Minor Versioning . . . . . . . . . . . . . . . . . . . . . 119 9.8. Name Caching . . . . . . . . . . . . . . . . . . . 118
9.9. Directory Caching . . . . . . . . . . . . . . . . . 119
10. Minor Versioning . . . . . . . . . . . . . . . . . . . . . 120
11. Internationalization . . . . . . . . . . . . . . . . . . . 122 11. Internationalization . . . . . . . . . . . . . . . . . . . 122
11.1. Stringprep profile for the utf8str_cs type . . . . . . . 123 11.1. Stringprep profile for the utf8str_cs type. . . . . 123
11.1.1. Intended applicability of the nfs4_cs_prep profile . . 123 11.1.1. Intended applicability of the
11.1.2. Character repertoire of nfs4_cs_prep . . . . . . . . . 123 nfs4_cs_prep profile . . . . . . . . . . . 123
11.1.3. Mapping used by nfs4_cs_prep . . . . . . . . . . . . . 123 11.1.2. Character repertoire of nfs4_cs_prep . . . 124
11.1.4. Normalization used by nfs4_cs_prep . . . . . . . . . . 124 11.1.3. Mapping used by nfs4_cs_prep . . . . . . . 124
11.1.5. Prohibited output for nfs4_cs_prep . . . . . . . . . . 124 11.1.4. Normalization used by nfs4_cs_prep . . . . 124
11.1.6. Bidirectional output for nfs4_cs_prep . . . . . . . . 124 11.1.5. Prohibited output for nfs4_cs_prep . . . . 125
11.2. Stringprep profile for the utf8str_cis type . . . . . . 124 11.1.6. Bidirectional output for nfs4_cs_prep. . . 125
11.2.1. Intended applicability of the nfs4_cis_prep profile . 124 11.2. Stringprep profile for the utf8str_cis type . . . . 125
11.2.2. Character repertoire of nfs4_cis_prep . . . . . . . . 124 11.2.1. Intended applicability of the
11.2.3. Mapping used by nfs4_cis_prep . . . . . . . . . . . . 124 nfs4_cis_prep profile. . . . . . . . . . . 125
11.2.4. Normalization used by nfs4_cis_prep . . . . . . . . . 125 11.2.2. Character repertoire of nfs4_cis_prep . . 125
11.2.5. Prohibited output for nfs4_cis_prep . . . . . . . . . 125 11.2.3. Mapping used by nfs4_cis_prep . . . . . . 125
11.2.6. Bidirectional output for nfs4_cis_prep . . . . . . . . 125 11.2.4. Normalization used by nfs4_cis_prep . . . 125
11.3. Stringprep profile for the utf8str_mixed type . . . . . 125 11.2.5. Prohibited output for nfs4_cis_prep . . . 126
11.3.1. Intended applicability of the nfs4_mixed_prep profile 125 11.2.6. Bidirectional output for nfs4_cis_prep . . 126
11.3.2. Character repertoire of nfs4_mixed_prep . . . . . . . 125 11.3. Stringprep profile for the utf8str_mixed type . . . 126
11.3.3. Mapping used by nfs4_cis_prep . . . . . . . . . . . . 125 11.3.1. Intended applicability of the
11.3.4. Normalization used by nfs4_mixed_prep . . . . . . . . 126 nfs4_mixed_prep profile. . . . . . . . . . 126
11.3.5. Prohibited output for nfs4_mixed_prep . . . . . . . . 126 11.3.2. Character repertoire of nfs4_mixed_prep . 126
11.3.6. Bidirectional output for nfs4_mixed_prep . . . . . . . 126 11.3.3. Mapping used by nfs4_cis_prep . . . . . . 126
11.4. UTF-8 Related Errors . . . . . . . . . . . . . . . . . . 126 11.3.4. Normalization used by nfs4_mixed_prep . . 127
12. Error Definitions . . . . . . . . . . . . . . . . . . . . 127 11.3.5. Prohibited output for nfs4_mixed_prep . . 127
13. NFS version 4 Requests . . . . . . . . . . . . . . . . . . 133 11.3.6. Bidirectional output for nfs4_mixed_prep . 127
13.1. Compound Procedure . . . . . . . . . . . . . . . . . . . 133 11.4. UTF-8 Related Errors. . . . . . . . . . . . . . . . 127
13.2. Evaluation of a Compound Request . . . . . . . . . . . . 134 12. Error Definitions . . . . . . . . . . . . . . . . . . . . 128
13. NFS version 4 Requests . . . . . . . . . . . . . . . . . . 134
13.1. Compound Procedure. . . . . . . . . . . . . . . . . 134
13.2. Evaluation of a Compound Request. . . . . . . . . . 135
13.3. Synchronous Modifying Operations . . . . . . . . . . . . 134 13.3. Synchronous Modifying Operations. . . . . . . . . . 136
13.4. Operation Values . . . . . . . . . . . . . . . . . . . . 135 13.4. Operation Values. . . . . . . . . . . . . . . . . . 136
14. NFS version 4 Procedures . . . . . . . . . . . . . . . . . 136 14. NFS version 4 Procedures . . . . . . . . . . . . . . . . . 136
14.1. Procedure 0: NULL - No Operation . . . . . . . . . . . . 136 14.1. Procedure 0: NULL - No Operation. . . . . . . . . . 136
14.2. Procedure 1: COMPOUND - Compound Operations . . . . . . 137 14.2. Procedure 1: COMPOUND - Compound Operations . . . . 137
14.2.1. Operation 3: ACCESS - Check Access Rights . . . . . . 140 14.2.1. Operation 3: ACCESS - Check Access
14.2.2. Operation 4: CLOSE - Close File . . . . . . . . . . . 143 Rights. . . . . . . . . . . . . . . . . . 140
14.2.3. Operation 5: COMMIT - Commit Cached Data . . . . . . . 145 14.2.2. Operation 4: CLOSE - Close File . . . . . 142
14.2.4. Operation 6: CREATE - Create a Non-Regular File Object 148 14.2.3. Operation 5: COMMIT - Commit
14.2.5. Operation 7: DELEGPURGE - Purge Delegations Awaiting Cached Data . . . . . . . . . . . . . . . 144
Recovery . . . . . . . . . . . . . . . . . . . . . . . 151 14.2.4. Operation 6: CREATE - Create a
14.2.6. Operation 8: DELEGRETURN - Return Delegation . . . . . 153 Non-Regular File Object . . . . . . . . . 147
14.2.7. Operation 9: GETATTR - Get Attributes . . . . . . . . 154 14.2.5. Operation 7: DELEGPURGE -
14.2.8. Operation 10: GETFH - Get Current Filehandle . . . . . 156 Purge Delegations Awaiting Recovery . . . 150
14.2.9. Operation 11: LINK - Create Link to a File . . . . . . 158 14.2.6. Operation 8: DELEGRETURN - Return
14.2.10. Operation 12: LOCK - Create Lock . . . . . . . . . . 160 Delegation. . . . . . . . . . . . . . . . 151
14.2.11. Operation 13: LOCKT - Test For Lock . . . . . . . . . 164 14.2.7. Operation 9: GETATTR - Get Attributes . . 152
14.2.12. Operation 14: LOCKU - Unlock File . . . . . . . . . . 166 14.2.8. Operation 10: GETFH - Get Current
14.2.13. Operation 15: LOOKUP - Lookup Filename . . . . . . . 168 Filehandle. . . . . . . . . . . . . . . . 153
14.2.14. Operation 16: LOOKUPP - Lookup Parent Directory . . . 171 14.2.9. Operation 11: LINK - Create Link to a
14.2.15. Operation 17: NVERIFY - Verify Difference in File. . . . . . . . . . . . . . . . . . . 154
Attributes . . . . . . . . . . . . . . . . . . . . . 172 14.2.10. Operation 12: LOCK - Create Lock . . . . 156
14.2.16. Operation 18: OPEN - Open a Regular File . . . . . . 174 14.2.11. Operation 13: LOCKT - Test For Lock . . . 160
14.2.17. Operation 19: OPENATTR - Open Named Attribute 14.2.12. Operation 14: LOCKU - Unlock File . . . . 162
Directory . . . . . . . . . . . . . . . . . . . . . . 184 14.2.13. Operation 15: LOOKUP - Lookup Filename. . 163
14.2.18. Operation 20: OPEN_CONFIRM - Confirm Open . . . . . . 186 14.2.14. Operation 16: LOOKUPP - Lookup
14.2.19. Operation 21: OPEN_DOWNGRADE - Reduce Open File Access 189 Parent Directory. . . . . . . . . . . . . 165
14.2.20. Operation 22: PUTFH - Set Current Filehandle . . . . 191 14.2.15. Operation 17: NVERIFY - Verify
14.2.21. Operation 23: PUTPUBFH - Set Public Filehandle . . . 192 Difference in Attributes . . . . . . . . 166
14.2.22. Operation 24: PUTROOTFH - Set Root Filehandle . . . . 194 14.2.16. Operation 18: OPEN - Open a Regular
14.2.23. Operation 25: READ - Read from File . . . . . . . . . 195 File. . . . . . . . . . . . . . . . . . . 168
14.2.24. Operation 26: READDIR - Read Directory . . . . . . . 198 14.2.17. Operation 19: OPENATTR - Open Named
14.2.25. Operation 27: READLINK - Read Symbolic Link . . . . . 202 Attribute Directory . . . . . . . . . . . 178
14.2.26. Operation 28: REMOVE - Remove Filesystem Object . . . 204 14.2.18. Operation 20: OPEN_CONFIRM -
14.2.27. Operation 29: RENAME - Rename Directory Entry . . . . 207 Confirm Open . . . . . . . . . . . . . . 180
14.2.28. Operation 30: RENEW - Renew a Lease . . . . . . . . . 210 14.2.19. Operation 21: OPEN_DOWNGRADE -
14.2.29. Operation 31: RESTOREFH - Restore Saved Filehandle . 212 Reduce Open File Access . . . . . . . . . 182
14.2.30. Operation 32: SAVEFH - Save Current Filehandle . . . 214 14.2.20. Operation 22: PUTFH - Set
14.2.31. Operation 33: SECINFO - Obtain Available Security . . 215 Current Filehandle. . . . . . . . . . . . 184
14.2.32. Operation 34: SETATTR - Set Attributes . . . . . . . 219 14.2.21. Operation 23: PUTPUBFH -
14.2.33. Operation 35: SETCLIENTID - Negotiate Clientid . . . 222 Set Public Filehandle . . . . . . . . . . 185
14.2.34. Operation 36: SETCLIENTID_CONFIRM - Confirm Clientid 226 14.2.22. Operation 24: PUTROOTFH -
14.2.35. Operation 37: VERIFY - Verify Same Attributes . . . . 230 Set Root Filehandle . . . . . . . . . . . 186
14.2.36. Operation 38: WRITE - Write to File . . . . . . . . . 232 14.2.23. Operation 25: READ - Read from File . . . 187
14.2.37. Operation 39: RELEASE_LOCKOWNER - Release Lockowner 14.2.24. Operation 26: READDIR -
State . . . . . . . . . . . . . . . . . . . . . . . . 237 Read Directory. . . . . . . . . . . . . . 190
14.2.38. Operation 10044: ILLEGAL - Illegal operation . . . . 239 14.2.25. Operation 27: READLINK -
15. NFS version 4 Callback Procedures . . . . . . . . . . . . 240 Read Symbolic Link. . . . . . . . . . . . 193
15.1. Procedure 0: CB_NULL - No Operation . . . . . . . . . . 240 14.2.26. Operation 28: REMOVE -
15.2. Procedure 1: CB_COMPOUND - Compound Operations . . . . . 241 Remove Filesystem Object. . . . . . . . . 195
15.2.1. Operation 3: CB_GETATTR - Get Attributes . . . . . . . 243 14.2.27. Operation 29: RENAME -
15.2.2. Operation 4: CB_RECALL - Recall an Open Delegation . . 245 Rename Directory Entry. . . . . . . . . . 197
14.2.28. Operation 30: RENEW - Renew a Lease . . . 200
14.2.29. Operation 31: RESTOREFH -
Restore Saved Filehandle. . . . . . . . . 201
15.2.3. Operation 10044: CB_ILLEGAL - Illegal Callback 14.2.30. Operation 32: SAVEFH - Save
Operation . . . . . . . . . . . . . . . . . . . . . . 247 Current Filehandle. . . . . . . . . . . . 202
16. Security Considerations . . . . . . . . . . . . . . . . . 248 14.2.31. Operation 33: SECINFO - Obtain
17. IANA Considerations . . . . . . . . . . . . . . . . . . . 250 Available Security. . . . . . . . . . . . 203
17.1. Named Attribute Definition . . . . . . . . . . . . . . . 250 14.2.32. Operation 34: SETATTR - Set Attributes. . 206
17.2. ONC RPC Network Identifiers (netids) . . . . . . . . . . 250 14.2.33. Operation 35: SETCLIENTID -
18. RPC definition file . . . . . . . . . . . . . . . . . . . 252 Negotiate Clientid. . . . . . . . . . . . 209
19. Normative References . . . . . . . . . . . . . . . . . . . 284 14.2.34. Operation 36: SETCLIENTID_CONFIRM -
20. Informative References . . . . . . . . . . . . . . . . . . 285 Confirm Clientid. . . . . . . . . . . . . 213
21. Authors . . . . . . . . . . . . . . . . . . . . . . . . . 289 14.2.35. Operation 37: VERIFY -
21.1. Editor's Address . . . . . . . . . . . . . . . . . . . . 289 Verify Same Attributes. . . . . . . . . . 217
21.2. Authors' Addresses . . . . . . . . . . . . . . . . . . . 289 14.2.36. Operation 38: WRITE - Write to File . . . 218
21.3. Acknowledgements . . . . . . . . . . . . . . . . . . . . 290 14.2.37. Operation 39: RELEASE_LOCKOWNER -
22. Full Copyright Statement . . . . . . . . . . . . . . . . . 291 Release Lockowner State . . . . . . . . . 223
14.2.38. Operation 10044: ILLEGAL -
Illegal operation . . . . . . . . . . . . 224
15. NFS version 4 Callback Procedures . . . . . . . . . . . . 225
15.1. Procedure 0: CB_NULL - No Operation . . . . . . . . 225
15.2. Procedure 1: CB_COMPOUND - Compound
Operations. . . . . . . . . . . . . . . . . . . . . 226
15.2.1. Operation 3: CB_GETATTR - Get
Attributes . . . . . . . . . . . . . . . . 228
15.2.2. Operation 4: CB_RECALL -
Recall an Open Delegation. . . . . . . . . 229
15.2.3. Operation 10044: CB_ILLEGAL -
Illegal Callback Operation . . . . . . . . 230
16. Security Considerations . . . . . . . . . . . . . . . . . 231
17. IANA Considerations . . . . . . . . . . . . . . . . . . . 232
17.1. Named Attribute Definition. . . . . . . . . . . . . 232
17.2. ONC RPC Network Identifiers (netids). . . . . . . . 232
18. RPC definition file . . . . . . . . . . . . . . . . . . . 234
19. Acknowledgements . . . . . . . . . . . . . . . . . . . . . 268
20. Normative References . . . . . . . . . . . . . . . . . . . 268
21. Informative References . . . . . . . . . . . . . . . . . . 270
22. Authors' Information . . . . . . . . . . . . . . . . . . . 273
22.1. Editor's Address. . . . . . . . . . . . . . . . . . 273
22.2. Authors' Addresses. . . . . . . . . . . . . . . . . 274
23. Full Copyright Statement . . . . . . . . . . . . . . . . . 275
Draft Specification NFS version 4 Protocol November 2002 1. Introduction
1. Changes since RFC3010 1.1. Changes since RFC 3010
This definition of the NFS version 4 protocol replaces or obsoletes This definition of the NFS version 4 protocol replaces or obsoletes
the definition present in [RFC3010]. While portions of the two the definition present in [RFC3010]. While portions of the two
documents have remained the same, there have been substantive changes documents have remained the same, there have been substantive changes
in others. The changes made between [RFC3010] and this document in others. The changes made between [RFC3010] and this document
represent implementation experience and further review of the represent implementation experience and further review of the
protocol. While some modifications were made for ease of protocol. While some modifications were made for ease of
implementation or clarification, most updates represent errors or implementation or clarification, most updates represent errors or
situations where the [RFC3010] definition was untenable. situations where the [RFC3010] definition was untenable.
The following list is not all inclusive of all changes but presents The following list is not all inclusive of all changes but presents
some of the most notable changes or additions made: some of the most notable changes or additions made:
o The state model has added an open_owner4 identifier. This was o The state model has added an open_owner4 identifier. This was
done to accommodate POSIX-based clients and the model they use done to accommodate POSIX-based clients and the model they use for
for file locking. For POSIX clients, an open_owner4 would file locking. For POSIX clients, an open_owner4 would correspond
correspond to a file descriptor potentially shared amongst a set to a file descriptor potentially shared amongst a set of processes
of processes and the lock_owner4 identifier would correspond to and the lock_owner4 identifier would correspond to a process that
a process that is locking a file. is locking a file.
o Clarifications and error conditions were added for the handling o Clarifications and error conditions were added for the handling of
of the owner and group attributes. Since these attributes are the owner and group attributes. Since these attributes are string
string based (as opposed to the numeric uid/gid of previous based (as opposed to the numeric uid/gid of previous versions of
versions of NFS), translations may not be available and hence NFS), translations may not be available and hence the changes
the changes made. made.
o Clarifications for the ACL and mode attributes to address o Clarifications for the ACL and mode attributes to address
evaluation and partial support. evaluation and partial support.
o For identifiers that are defined as XDR opaque, limits were set o For identifiers that are defined as XDR opaque, limits were set on
on their size. their size.
o Added the mounted_on_fileid attribute to allow POSIX clients to o Added the mounted_on_fileid attribute to allow POSIX clients to
correctly construct local mounts. correctly construct local mounts.
o Modified the SETCLIENTID/SETCLIENTID_CONFIRM operations to deal o Modified the SETCLIENTID/SETCLIENTID_CONFIRM operations to deal
correctly with confirmation details along with adding the correctly with confirmation details along with adding the ability
ability to specify new client callback information. Also added to specify new client callback information. Also added
clarification of the callback information itself. clarification of the callback information itself.
o Added a new operation RELEASE_LOCKOWNER to enable notifying the o Added a new operation RELEASE_LOCKOWNER to enable notifying the
server that a lock_owner4 will no longer be used by the client. server that a lock_owner4 will no longer be used by the client.
o RENEW operation changes to identify the client correctly and o RENEW operation changes to identify the client correctly and allow
allow for additional error returns. for additional error returns.
o Verify error return possibilities for all operations. o Verify error return possibilities for all operations.
o Remove use of the pathname4 data type from LOOKUP and OPEN in o Remove use of the pathname4 data type from LOOKUP and OPEN in
favor of having the client construct a sequence of LOOKUP favor of having the client construct a sequence of LOOKUP
operations to achieve the same effect. operations to achieve the same effect.
o Clarification of the internationalization issues and adoption of o Clarification of the internationalization issues and adoption of
the new stringprep profile framework. the new stringprep profile framework.
1.1. Introduction 1.2. NFS Version 4 Goals
The NFS version 4 protocol is a further revision of the NFS protocol The NFS version 4 protocol is a further revision of the NFS protocol
defined already by versions 2 [RFC1094] and 3 [RFC1813]. It retains defined already by versions 2 [RFC1094] and 3 [RFC1813]. It retains
the essential characteristics of previous versions: design for easy the essential characteristics of previous versions: design for easy
recovery, independent of transport protocols, operating systems and recovery, independent of transport protocols, operating systems and
filesystems, simplicity, and good performance. The NFS version 4 filesystems, simplicity, and good performance. The NFS version 4
revision has the following goals: revision has the following goals:
o Improved access and good performance on the Internet. o Improved access and good performance on the Internet.
The protocol is designed to transit firewalls easily, perform The protocol is designed to transit firewalls easily, perform well
well where latency is high and bandwidth is low, and scale to where latency is high and bandwidth is low, and scale to very
very large numbers of clients per server. large numbers of clients per server.
o Strong security with negotiation built into the protocol. o Strong security with negotiation built into the protocol.
The protocol builds on the work of the ONCRPC working group in The protocol builds on the work of the ONCRPC working group in
supporting the RPCSEC_GSS protocol. Additionally, the NFS supporting the RPCSEC_GSS protocol. Additionally, the NFS version
version 4 protocol provides a mechanism to allow clients and 4 protocol provides a mechanism to allow clients and servers the
servers the ability to negotiate security and require clients ability to negotiate security and require clients and servers to
and servers to support a minimal set of security schemes. support a minimal set of security schemes.
o Good cross-platform interoperability. o Good cross-platform interoperability.
The protocol features a filesystem model that provides a useful, The protocol features a filesystem model that provides a useful,
common set of features that does not unduly favor one filesystem common set of features that does not unduly favor one filesystem
or operating system over another. or operating system over another.
o Designed for protocol extensions. o Designed for protocol extensions.
The protocol is designed to accept standard extensions that do The protocol is designed to accept standard extensions that do not
not compromise backward compatibility. compromise backward compatibility.
1.2. Inconsistencies of this Document with Section 18 1.3. Inconsistencies of this Document with Section 18
Section 18, RPC Definition File, contains the definitions in XDR Section 18, RPC Definition File, contains the definitions in XDR
description language of the constructs used by the protocol. Prior description language of the constructs used by the protocol. Prior
to Section 18, several of the constructs are reproduced for purposes to Section 18, several of the constructs are reproduced for purposes
of explanation. The reader is warned of the possibility of errors in of explanation. The reader is warned of the possibility of errors in
the reproduced constructs outside of Section 18. For any part of the the reproduced constructs outside of Section 18. For any part of the
document that is inconsistent with Section 18, Section 18 is to be document that is inconsistent with Section 18, Section 18 is to be
considered authoritative. considered authoritative.
1.3. Overview of NFS version 4 Features 1.4. Overview of NFS version 4 Features
To provide a reasonable context for the reader, the major features of To provide a reasonable context for the reader, the major features of
NFS version 4 protocol will be reviewed in brief. This will be done NFS version 4 protocol will be reviewed in brief. This will be done
to provide an appropriate context for both the reader who is familiar to provide an appropriate context for both the reader who is familiar
with the previous versions of the NFS protocol and the reader that is with the previous versions of the NFS protocol and the reader that is
new to the NFS protocols. For the reader new to the NFS protocols, new to the NFS protocols. For the reader new to the NFS protocols,
there is still a fundamental knowledge that is expected. The reader there is still a fundamental knowledge that is expected. The reader
should be familiar with the XDR and RPC protocols as described in should be familiar with the XDR and RPC protocols as described in
[RFC1832] and [RFC1831]. A basic knowledge of filesystems and [RFC1832] and [RFC1831]. A basic knowledge of filesystems and
distributed filesystems is expected as well. distributed filesystems is expected as well.
1.3.1. RPC and Security 1.4.1. RPC and Security
As with previous versions of NFS, the External Data Representation As with previous versions of NFS, the External Data Representation
(XDR) and Remote Procedure Call (RPC) mechanisms used for the NFS (XDR) and Remote Procedure Call (RPC) mechanisms used for the NFS
version 4 protocol are those defined in [RFC1832] and [RFC1831]. To version 4 protocol are those defined in [RFC1832] and [RFC1831]. To
meet end-to-end security requirements, the RPCSEC_GSS framework meet end-to-end security requirements, the RPCSEC_GSS framework
[RFC2203] will be used to extend the basic RPC security. With the [RFC2203] will be used to extend the basic RPC security. With the
use of RPCSEC_GSS, various mechanisms can be provided to offer use of RPCSEC_GSS, various mechanisms can be provided to offer
authentication, integrity, and privacy to the NFS version 4 protocol. authentication, integrity, and privacy to the NFS version 4 protocol.
Kerberos V5 will be used as described in [RFC1964] to provide one Kerberos V5 will be used as described in [RFC1964] to provide one
security framework. The LIPKEY GSS-API mechanism described in security framework. The LIPKEY GSS-API mechanism described in
skipping to change at page 10, line 46 skipping to change at page 10, line 45
version 4 security. version 4 security.
To enable in-band security negotiation, the NFS version 4 protocol To enable in-band security negotiation, the NFS version 4 protocol
has added a new operation which provides the client a method of has added a new operation which provides the client a method of
querying the server about its policies regarding which security querying the server about its policies regarding which security
mechanisms must be used for access to the server's filesystem mechanisms must be used for access to the server's filesystem
resources. With this, the client can securely match the security resources. With this, the client can securely match the security
mechanism that meets the policies specified at both the client and mechanism that meets the policies specified at both the client and
server. server.
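As an illustration only, the matching step described above might be sketched in Python as follows. The function name, the mechanism labels, and the assumption that the server lists its acceptable mechanisms in order of preference are hypothetical and are not taken from this document.

```python
from typing import Iterable, Optional, Set


def choose_mechanism(server_offered: Iterable[str],
                     client_allowed: Set[str]) -> Optional[str]:
    """Return the first server-offered mechanism the client policy allows.

    Assumption: the server's list is ordered by server preference.
    Returns None when no mechanism satisfies both policies.
    """
    for mech in server_offered:
        if mech in client_allowed:
            return mech
    return None


# Hypothetical labels for illustration; a real negotiation exchanges
# RPC security flavors and GSS-API mechanism details, not plain strings.
server_policy = ["krb5-privacy", "krb5-integrity", "krb5-auth"]
client_policy = {"krb5-integrity", "krb5-auth"}
selected = choose_mechanism(server_policy, client_policy)
```

In this sketch the client settles on the strongest mechanism that both sides accept; a client with no acceptable match would have to refuse access rather than fall back silently.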
1.3.2. Procedure and Operation Structure 1.4.2. Procedure and Operation Structure
A significant departure from the previous versions of the NFS A significant departure from the previous versions of the NFS
protocol is the introduction of the COMPOUND procedure. For the NFS protocol is the introduction of the COMPOUND procedure. For the NFS
version 4 protocol, there are two RPC procedures, NULL and COMPOUND. version 4 protocol, there are two RPC procedures, NULL and COMPOUND.
The COMPOUND procedure is defined in terms of operations and these The COMPOUND procedure is defined in terms of operations and these
operations correspond more closely to the traditional NFS procedures. operations correspond more closely to the traditional NFS procedures.
With the use of the COMPOUND procedure, the client is able to build With the use of the COMPOUND procedure, the client is able to build
simple or complex requests. These COMPOUND requests allow for a simple or complex requests. These COMPOUND requests allow for a
reduction in the number of RPCs needed for logical filesystem reduction in the number of RPCs needed for logical filesystem
operations. For example, without previous contact with a server a operations. For example, without previous contact with a server a
client will be able to read data from a file in one request by client will be able to read data from a file in one request by
combining LOOKUP, OPEN, and READ operations in a single COMPOUND RPC. combining LOOKUP, OPEN, and READ operations in a single COMPOUND RPC.
With previous versions of the NFS protocol, this type of single With previous versions of the NFS protocol, this type of single
request was not possible. request was not possible.
The model used for COMPOUND is very simple. There is no logical ORing The model used for COMPOUND is very simple. There is no logical ORing
or ANDing of operations. The operations combined within a COMPOUND or ANDing of operations. The operations combined within a COMPOUND
request are evaluated in order by the server. Once an operation request are evaluated in order by the server. Once an operation
returns a failing result, the evaluation ends and the results of all returns a failing result, the evaluation ends and the results of all
skipping to change at page 11, line 29 skipping to change at page 11, line 30
The NFS version 4 protocol continues to have the client refer to a The NFS version 4 protocol continues to have the client refer to a
file or directory at the server by a "filehandle". The COMPOUND file or directory at the server by a "filehandle". The COMPOUND
procedure has a method of passing a filehandle from one operation to procedure has a method of passing a filehandle from one operation to
another within the sequence of operations. There is a concept of a another within the sequence of operations. There is a concept of a
"current filehandle" and "saved filehandle". Most operations use the "current filehandle" and "saved filehandle". Most operations use the
"current filehandle" as the filesystem object to operate upon. The "current filehandle" as the filesystem object to operate upon. The
"saved filehandle" is used as temporary filehandle storage within a "saved filehandle" is used as temporary filehandle storage within a
COMPOUND procedure as well as an additional operand for certain COMPOUND procedure as well as an additional operand for certain
operations. operations.
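The in-order evaluation of a COMPOUND request and the use of the current and saved filehandles might be sketched as follows. This is a toy model under stated assumptions: the in-memory namespace, the representation of a filehandle as a path tuple, and the function run_compound are hypothetical, and only a handful of operations are modelled. A LOOKUP issued with no current filehandle is reported here simply as "not found", which is a simplification.

```python
NFS4_OK = 0
NFS4ERR_NOENT = 2

# Hypothetical namespace: each directory "filehandle" (a path tuple)
# maps to the set of names it contains.
TREE = {
    (): {"export"},
    ("export",): {"data.txt"},
    ("export", "data.txt"): set(),
}


def run_compound(ops):
    """Evaluate (name, arg) operations in order, stopping at the first failure.

    Returns one (name, status, value) tuple per operation attempted,
    including the failing one; later operations are never evaluated.
    """
    current_fh = None  # the "current filehandle"
    saved_fh = None    # the "saved filehandle"
    results = []
    for name, arg in ops:
        status, value = NFS4_OK, None
        if name == "PUTROOTFH":
            current_fh = ()
        elif name == "LOOKUP":
            if current_fh is not None and arg in TREE.get(current_fh, set()):
                current_fh = current_fh + (arg,)
            else:
                status = NFS4ERR_NOENT
        elif name == "SAVEFH":
            saved_fh = current_fh
        elif name == "RESTOREFH":
            current_fh = saved_fh
        elif name == "GETFH":
            value = current_fh
        else:
            raise ValueError("operation not modelled in this sketch: " + name)
        results.append((name, status, value))
        if status != NFS4_OK:
            break
    return results


ok = run_compound([("PUTROOTFH", None), ("LOOKUP", "export"),
                   ("LOOKUP", "data.txt"), ("GETFH", None)])
bad = run_compound([("PUTROOTFH", None), ("LOOKUP", "missing"),
                    ("GETFH", None)])
```

In the second request the failing LOOKUP ends the evaluation, so the trailing GETFH produces no result at all.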
1.3.3. Filesystem Model 1.4.3. Filesystem Model
The general filesystem model used for the NFS version 4 protocol is The general filesystem model used for the NFS version 4 protocol is
the same as previous versions. The server filesystem is hierarchical the same as previous versions. The server filesystem is hierarchical
with the regular files contained within being treated as opaque byte with the regular files contained within being treated as opaque byte
streams. In a slight departure, file and directory names are encoded streams. In a slight departure, file and directory names are encoded
with UTF-8 to deal with the basics of internationalization. with UTF-8 to deal with the basics of internationalization.
The NFS version 4 protocol does not require a separate protocol to The NFS version 4 protocol does not require a separate protocol to
provide for the initial mapping between path name and filehandle. provide for the initial mapping between path name and filehandle.
Instead of using the older MOUNT protocol for this mapping, the Instead of using the older MOUNT protocol for this mapping, the
server provides a ROOT filehandle that represents the logical root or server provides a ROOT filehandle that represents the logical root or
top of the filesystem tree provided by the server. The server top of the filesystem tree provided by the server. The server
provides multiple filesystems by gluing them together with pseudo provides multiple filesystems by gluing them together with pseudo
filesystems. These pseudo filesystems provide for potential gaps in filesystems. These pseudo filesystems provide for potential gaps in
the path names between real filesystems. the path names between real filesystems.
1.3.3.1. Filehandle Types 1.4.3.1. Filehandle Types
In previous versions of the NFS protocol, the filehandle provided by In previous versions of the NFS protocol, the filehandle provided by
the server was guaranteed to be valid or persistent for the lifetime the server was guaranteed to be valid or persistent for the lifetime
of the filesystem object to which it referred. For some server of the filesystem object to which it referred. For some server
implementations, this persistence requirement has been difficult to implementations, this persistence requirement has been difficult to
meet. For the NFS version 4 protocol, this requirement has been meet. For the NFS version 4 protocol, this requirement has been
relaxed by introducing another type of filehandle, volatile. With relaxed by introducing another type of filehandle, volatile. With
persistent and volatile filehandle types, the server implementation persistent and volatile filehandle types, the server implementation
can match the abilities of the filesystem at the server along with can match the abilities of the filesystem at the server along with
the operating environment. The client will have knowledge of the the operating environment. The client will have knowledge of the
type of filehandle being provided by the server and can be prepared type of filehandle being provided by the server and can be prepared
to deal with the semantics of each. to deal with the semantics of each.
1.3.3.2. Attribute Types 1.4.3.2. Attribute Types
The NFS version 4 protocol introduces three classes of filesystem or The NFS version 4 protocol introduces three classes of filesystem or
file attributes. Like the additional filehandle type, the file attributes. Like the additional filehandle type, the
classification of file attributes has been done to ease server classification of file attributes has been done to ease server
implementations along with extending the overall functionality of the implementations along with extending the overall functionality of the
NFS protocol. This attribute model is structured to be extensible NFS protocol. This attribute model is structured to be extensible
such that new attributes can be introduced in minor revisions of the such that new attributes can be introduced in minor revisions of the
protocol without requiring significant rework. protocol without requiring significant rework.
The three classifications are: mandatory, recommended and named The three classifications are: mandatory, recommended and named
skipping to change at page 12, line 46 skipping to change at page 13, line 5
directory or file and referred to by a string name. Named attributes directory or file and referred to by a string name. Named attributes
are meant to be used by client applications as a method to associate are meant to be used by client applications as a method to associate
application specific data with a regular file or directory. application specific data with a regular file or directory.
One significant addition to the recommended set of file attributes is One significant addition to the recommended set of file attributes is
the Access Control List (ACL) attribute. This attribute provides for the Access Control List (ACL) attribute. This attribute provides for
directory and file access control beyond the model used in previous directory and file access control beyond the model used in previous
versions of the NFS protocol. The ACL definition allows for versions of the NFS protocol. The ACL definition allows for
specification of user and group level access control. specification of user and group level access control.
1.3.3.3. Filesystem Replication and Migration 1.4.3.3. Filesystem Replication and Migration
With the use of a special file attribute, the ability to migrate or With the use of a special file attribute, the ability to migrate or
replicate server filesystems is enabled within the protocol. The replicate server filesystems is enabled within the protocol. The
filesystem locations attribute provides a method for the client to filesystem locations attribute provides a method for the client to
probe the server about the location of a filesystem. In the event of probe the server about the location of a filesystem. In the event of
a migration of a filesystem, the client will receive an error when a migration of a filesystem, the client will receive an error when
operating on the filesystem and it can then query as to the new file operating on the filesystem and it can then query as to the new file
system location. Similar steps are used for replication: the client system location. Similar steps are used for replication: the client
is able to query the server for the multiple available locations of a is able to query the server for the multiple available locations of a
particular filesystem. From this information, the client can use its particular filesystem. From this information, the client can use its
own policies to access the appropriate filesystem location. own policies to access the appropriate filesystem location.
1.3.4. OPEN and CLOSE 1.4.4. OPEN and CLOSE
The NFS version 4 protocol introduces OPEN and CLOSE operations. The The NFS version 4 protocol introduces OPEN and CLOSE operations. The
OPEN operation provides a single point where file lookup, creation, OPEN operation provides a single point where file lookup, creation,
and share semantics can be combined. The CLOSE operation also and share semantics can be combined. The CLOSE operation also
provides for the release of state accumulated by OPEN. provides for the release of state accumulated by OPEN.
1.3.5. File locking 1.4.5. File locking
With the NFS version 4 protocol, the support for byte range file With the NFS version 4 protocol, the support for byte range file
locking is part of the NFS protocol. The file locking support is locking is part of the NFS protocol. The file locking support is
structured so that an RPC callback mechanism is not required. This structured so that an RPC callback mechanism is not required. This
is a departure from the previous versions of the NFS file locking is a departure from the previous versions of the NFS file locking
protocol, Network Lock Manager (NLM). The state associated with file protocol, Network Lock Manager (NLM). The state associated with file
locks is maintained at the server under a lease-based model. The locks is maintained at the server under a lease-based model. The
server defines a single lease period for all state held by an NFS server defines a single lease period for all state held by an NFS
client. If the client does not renew its lease within the defined client. If the client does not renew its lease within the defined
period, all state associated with the client's lease may be released period, all state associated with the client's lease may be released
by the server. The client may renew its lease with use of the RENEW by the server. The client may renew its lease with use of the RENEW
operation or implicitly by use of other operations (primarily READ). operation or implicitly by use of other operations (primarily READ).
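The lease-based model described above might be sketched as follows. The class LeaseTable, its method names, and the 90-second period are hypothetical. Time is passed in explicitly so that the example is deterministic, and the sketch releases expired state eagerly, whereas the text only says that the server may release it.

```python
class LeaseTable:
    """Toy server-side record of when each client last renewed its lease."""

    def __init__(self, lease_period):
        # A single lease period applies to all state held by every client.
        self.lease_period = lease_period
        self.last_renewal = {}

    def renew(self, client_id, now):
        """Record an explicit or implicit renewal at time `now`."""
        self.last_renewal[client_id] = now

    def expired(self, client_id, now):
        """True if the client is unknown or has not renewed within the period."""
        last = self.last_renewal.get(client_id)
        return last is None or now - last > self.lease_period

    def reap(self, now):
        """Drop every expired client and return the ids that were dropped."""
        gone = [c for c in self.last_renewal if self.expired(c, now)]
        for c in gone:
            del self.last_renewal[c]
        return gone


table = LeaseTable(lease_period=90)
table.renew("client-A", now=0)
table.renew("client-B", now=0)
table.renew("client-A", now=60)    # client-A renews; client-B stays silent
released = table.reap(now=100)     # only client-B is past its lease
```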
1.3.6. Client Caching and Delegation 1.4.6. Client Caching and Delegation
The file, attribute, and directory caching for the NFS version 4 The file, attribute, and directory caching for the NFS version 4
protocol is similar to previous versions. Attributes and directory protocol is similar to previous versions. Attributes and directory
information are cached for a duration determined by the client. At information are cached for a duration determined by the client. At
the end of a predefined timeout, the client will query the server to the end of a predefined timeout, the client will query the server to
see if the related filesystem object has been updated. see if the related filesystem object has been updated.
For file data, the client checks its cache validity when the file is For file data, the client checks its cache validity when the file is
opened. A query is sent to the server to determine if the file has opened. A query is sent to the server to determine if the file has
been changed. Based on this information, the client determines if been changed. Based on this information, the client determines if
skipping to change at page 14, line 4 skipping to change at page 14, line 17
The major addition to NFS version 4 in the area of caching is the The major addition to NFS version 4 in the area of caching is the
ability of the server to delegate certain responsibilities to the ability of the server to delegate certain responsibilities to the
client. When the server grants a delegation for a file to a client, client. When the server grants a delegation for a file to a client,
the client is guaranteed certain semantics with respect to the the client is guaranteed certain semantics with respect to the
sharing of that file with other clients. At OPEN, the server may sharing of that file with other clients. At OPEN, the server may
provide the client either a read or write delegation for the file. provide the client either a read or write delegation for the file.
If the client is granted a read delegation, it is assured that no If the client is granted a read delegation, it is assured that no
other client has the ability to write to the file for the duration of other client has the ability to write to the file for the duration of
the delegation. If the client is granted a write delegation, the the delegation. If the client is granted a write delegation, the
client is assured that no other client has read or write access to client is assured that no other client has read or write access to
the file. the file.
Delegations can be recalled by the server. If another client Delegations can be recalled by the server. If another client
requests access to the file in such a way that the access conflicts requests access to the file in such a way that the access conflicts
with the granted delegation, the server is able to notify the initial with the granted delegation, the server is able to notify the initial
client and recall the delegation. This requires that a callback path client and recall the delegation. This requires that a callback path
exist between the server and client. If this callback path does not exist between the server and client. If this callback path does not
exist, then delegations cannot be granted. The essence of a exist, then delegations cannot be granted. The essence of a
delegation is that it allows the client to locally service operations delegation is that it allows the client to locally service operations
such as OPEN, CLOSE, LOCK, LOCKU, READ, WRITE without immediate such as OPEN, CLOSE, LOCK, LOCKU, READ, WRITE without immediate
interaction with the server. interaction with the server.
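The delegation guarantees and the recall condition described above might be sketched as follows. The function names and the string labels for delegation and access types are hypothetical, and the sketch models only the decision of whether a recall is needed, not the callback itself.

```python
from typing import Optional

READ, WRITE = "read", "write"


def may_grant_delegation(callback_path_exists: bool) -> bool:
    """A delegation can only be granted if the server can call the client back."""
    return callback_path_exists


def needs_recall(delegation: Optional[str], holder: str,
                 requester: str, access: str) -> bool:
    """Decide whether a request by `requester` conflicts with a delegation.

    delegation: None, READ, or WRITE, as currently granted to `holder`.
    access:     READ or WRITE, as requested by `requester`.
    """
    if delegation is None or requester == holder:
        return False
    if delegation == WRITE:
        # A write delegation excludes all access by other clients.
        return True
    if delegation == READ:
        # A read delegation only excludes writers.
        return access == WRITE
    raise ValueError("unknown delegation type: " + repr(delegation))


reader_vs_read = needs_recall(READ, "A", "B", READ)     # no conflict
writer_vs_read = needs_recall(READ, "A", "B", WRITE)    # conflict
reader_vs_write = needs_recall(WRITE, "A", "B", READ)   # conflict
```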
1.4. General Definitions 1.5. General Definitions
The following definitions provide an appropriate context for the The following definitions provide an appropriate context for the
reader. reader.
Client The "client" is the entity that accesses the NFS server's Client The "client" is the entity that accesses the NFS server's
resources. The client may be an application which contains resources. The client may be an application which contains
the logic to access the NFS server directly. The client the logic to access the NFS server directly. The client
may also be the traditional operating system client that provides may also be the traditional operating system client that provides
remote filesystem services for a set of applications. remote filesystem services for a set of applications.
skipping to change at page 15, line 5 skipping to change at page 15, line 20
All leases granted by a server have the same fixed All leases granted by a server have the same fixed
interval. Note that the fixed interval was chosen to interval. Note that the fixed interval was chosen to
alleviate the expense a server would have in maintaining alleviate the expense a server would have in maintaining
state about variable length leases across server failures. state about variable length leases across server failures.
Lock The term "lock" is used to refer to both record (byte- Lock The term "lock" is used to refer to both record (byte-
range) locks as well as share reservations unless range) locks as well as share reservations unless
specifically stated otherwise. specifically stated otherwise.
Server The "Server" is the entity responsible for coordinating Server The "Server" is the entity responsible for coordinating
client access to a set of filesystems. client access to a set of filesystems.
Stable Storage Stable Storage
NFS version 4 servers must be able to recover without data NFS version 4 servers must be able to recover without data
loss from multiple power failures (including cascading loss from multiple power failures (including cascading
power failures, that is, several power failures in quick power failures, that is, several power failures in quick
succession), operating system failures, and hardware succession), operating system failures, and hardware
failure of components other than the storage medium itself failure of components other than the storage medium itself
(for example, disk, nonvolatile RAM). (for example, disk, nonvolatile RAM).
Some examples of stable storage that are allowable for an Some examples of stable storage that are allowable for an
NFS server include: NFS server include:
1. Media commit of data, that is, the modified data has 1. Media commit of data, that is, the modified data has
been successfully written to the disk media, been successfully written to the disk media, for
for example, the disk platter. example, the disk platter.
2. An immediate reply disk drive with battery-backed 2. An immediate reply disk drive with battery-backed on-
on-drive intermediate storage or uninterruptible power drive intermediate storage or uninterruptible power
system (UPS). system (UPS).
3. Server commit of data with battery-backed intermediate 3. Server commit of data with battery-backed intermediate
storage and recovery software. storage and recovery software.
4. Cache commit with uninterruptible power system (UPS) 4. Cache commit with uninterruptible power system (UPS) and
and recovery software. recovery software.
Stateid A 128-bit quantity returned by a server that uniquely Stateid A 128-bit quantity returned by a server that uniquely
defines the open and locking state provided by the server defines the open and locking state provided by the server
for a specific open or lock owner for a specific file. for a specific open or lock owner for a specific file.
Stateids composed of all bits 0 or all bits 1 have special Stateids composed of all bits 0 or all bits 1 have special
meaning and are reserved values. meaning and are reserved values.
Verifier A 64-bit quantity generated by the client that the server Verifier A 64-bit quantity generated by the client that the server
can use to determine if the client has restarted and lost can use to determine if the client has restarted and lost
all previous lock state. all previous lock state.
Draft Specification NFS version 4 Protocol November 2002
2. Protocol Data Types 2. Protocol Data Types
The syntax and semantics to describe the data types of the NFS The syntax and semantics to describe the data types of the NFS
version 4 protocol are defined in the XDR [RFC1832] and RPC [RFC1831] version 4 protocol are defined in the XDR [RFC1832] and RPC [RFC1831]
documents. The next sections build upon the XDR data types to define documents. The next sections build upon the XDR data types to define
types and structures specific to this protocol. types and structures specific to this protocol.
2.1. Basic Data Types 2.1. Basic Data Types
Data Type Definition Data Type Definition
skipping to change at page 17, line 5 skipping to change at page 17, line 16
mode4 typedef uint32_t mode4; mode4 typedef uint32_t mode4;
Mode attribute data type Mode attribute data type
nfs_cookie4 typedef uint64_t nfs_cookie4; nfs_cookie4 typedef uint64_t nfs_cookie4;
Opaque cookie value for READDIR Opaque cookie value for READDIR
nfs_fh4 typedef opaque nfs_fh4<NFS4_FHSIZE>; nfs_fh4 typedef opaque nfs_fh4<NFS4_FHSIZE>;
Filehandle definition; NFS4_FHSIZE is defined as 128 Filehandle definition; NFS4_FHSIZE is defined as 128
nfs_ftype4 enum nfs_ftype4; nfs_ftype4 enum nfs_ftype4;
Various defined file types Various defined file types
nfsstat4 enum nfsstat4; nfsstat4 enum nfsstat4;
Return value for operations Return value for operations
offset4 typedef uint64_t offset4; offset4 typedef uint64_t offset4;
Various offset designations (READ, WRITE, Various offset designations (READ, WRITE,
LOCK, COMMIT) LOCK, COMMIT)
skipping to change at page 17, line 36 skipping to change at page 17, line 45
Instead contains an ASN.1 OBJECT IDENTIFIER as used Instead contains an ASN.1 OBJECT IDENTIFIER as used
by GSS-API in the mech_type argument to by GSS-API in the mech_type argument to
GSS_Init_sec_context. See [RFC2743] for details. GSS_Init_sec_context. See [RFC2743] for details.
seqid4 typedef uint32_t seqid4; seqid4 typedef uint32_t seqid4;
Sequence identifier used for file locking Sequence identifier used for file locking
utf8string typedef opaque utf8string<>; utf8string typedef opaque utf8string<>;
UTF-8 encoding for strings UTF-8 encoding for strings
utf8str_cis typedef opaque utf8str_cis<>; utf8str_cis typedef opaque utf8str_cis;
Case-insensitive UTF-8 string Case-insensitive UTF-8 string
utf8str_cs typedef opaque utf8str_cs<>; utf8str_cs typedef opaque utf8str_cs;
Case-sensitive UTF-8 string Case-sensitive UTF-8 string
utf8str_mixed typedef opaque utf8str_mixed;
utf8str_mixed typedef opaque utf8str_mixed<>;
UTF-8 strings with a case sensitive prefix and UTF-8 strings with a case sensitive prefix and
a case insensitive suffix. a case insensitive suffix.
verifier4 typedef opaque verifier4[NFS4_VERIFIER_SIZE]; verifier4 typedef opaque verifier4[NFS4_VERIFIER_SIZE];
Verifier used for various operations (COMMIT, Verifier used for various operations (COMMIT,
CREATE, OPEN, READDIR, SETCLIENTID, CREATE, OPEN, READDIR, SETCLIENTID,
SETCLIENTID_CONFIRM, WRITE) NFS4_VERIFIER_SIZE is SETCLIENTID_CONFIRM, WRITE) NFS4_VERIFIER_SIZE is
defined as 8. defined as 8.
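The difference between the variable-length opaque types (utf8string and its
relatives) and the fixed-length verifier4 shows up on the wire. The following
is a minimal sketch, assuming the standard XDR rules from [RFC1832] (4-byte
big-endian length prefix for variable-length opaque data, zero padding to a
multiple of four bytes). The helper names are hypothetical and are not part
of the protocol.

```python
import struct

NFS4_VERIFIER_SIZE = 8  # value given in the verifier4 definition above


def xdr_pad(data: bytes) -> bytes:
    """Append zero bytes so the length is a multiple of 4 (XDR alignment)."""
    return data + b"\x00" * (-len(data) % 4)


def encode_utf8string(text: str) -> bytes:
    """Variable-length opaque<>: 4-byte big-endian length, then padded bytes."""
    raw = text.encode("utf-8")
    return struct.pack(">I", len(raw)) + xdr_pad(raw)


def encode_verifier4(verifier: bytes) -> bytes:
    """Fixed-length opaque[NFS4_VERIFIER_SIZE]: no length prefix is sent."""
    if len(verifier) != NFS4_VERIFIER_SIZE:
        raise ValueError("verifier4 must be exactly 8 bytes")
    return verifier  # 8 is already a multiple of 4, so no padding is needed
```

Under these assumptions a three-byte string occupies eight bytes on the wire
(four for the length, three for the data, one for padding), while a verifier4
always occupies exactly eight.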
2.2. Structured Data Types 2.2. Structured Data Types
nfstime4 nfstime4
struct nfstime4 { struct nfstime4 {
int64_t seconds; int64_t seconds;
uint32_t nseconds; uint32_t nseconds;
} }
The nfstime4 structure gives the number of seconds and The nfstime4 structure gives the number of seconds and nanoseconds
nanoseconds since midnight or 0 hour January 1, 1970 Coordinated since midnight or 0 hour January 1, 1970 Coordinated Universal Time
Universal Time (UTC). Values greater than zero for the seconds (UTC). Values greater than zero for the seconds field denote dates
field denote dates after the 0 hour January 1, 1970. Values after the 0 hour January 1, 1970. Values less than zero for the
less than zero for the seconds field denote dates before the 0 seconds field denote dates before the 0 hour January 1, 1970. In
hour January 1, 1970. In both cases, the nseconds field is to both cases, the nseconds field is to be added to the seconds field
be added to the seconds field for the final time representation. for the final time representation. For example, if the time to be
For example, if the time to be represented is one-half second represented is one-half second before 0 hour January 1, 1970, the
before 0 hour January 1, 1970, the seconds field would have a seconds field would have a value of negative one (-1) and the
value of negative one (-1) and the nseconds fields would have a nseconds fields would have a value of one-half second (500000000).
value of one-half second (500000000). Values greater than Values greater than 999,999,999 for nseconds are considered invalid.
999,999,999 for nseconds are considered invalid.
This data type is used to pass time and date information. A This data type is used to pass time and date information. A server
server converts to and from its local representation of time converts to and from its local representation of time when processing
when processing time values, preserving as much accuracy as time values, preserving as much accuracy as possible. If the
possible. If the precision of timestamps stored for a filesystem precision of timestamps stored for a filesystem object is less than
object is less than defined, loss of precision can occur. An defined, loss of precision can occur. An adjunct time maintenance
adjunct time maintenance protocol is recommended to reduce protocol is recommended to reduce client and server time skew.
client and server time skew.
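The rule that nseconds is always added to seconds, even for times before the
epoch, can be illustrated with a small conversion sketch. The function names
below are hypothetical. Exact rational arithmetic is used so that the
half-second example from the text is reproduced without floating-point
rounding.

```python
import math
from fractions import Fraction

NSEC_PER_SEC = 1_000_000_000


def to_nfstime4(t: Fraction):
    """Split a time in seconds since the epoch into (seconds, nseconds).

    seconds is rounded toward negative infinity so that nseconds is
    always a non-negative amount that is added to it.
    """
    seconds = math.floor(t)
    nseconds = int((t - seconds) * NSEC_PER_SEC)  # sub-nanosecond part dropped
    return seconds, nseconds


def from_nfstime4(seconds: int, nseconds: int) -> Fraction:
    """Recombine the two fields, rejecting invalid nseconds values."""
    if not 0 <= nseconds <= 999_999_999:
        raise ValueError("nseconds greater than 999,999,999 is invalid")
    return Fraction(seconds) + Fraction(nseconds, NSEC_PER_SEC)
```

For the example in the text, to_nfstime4(Fraction(-1, 2)) yields
(-1, 500000000): one-half second before the epoch is represented as negative
one second plus half a second.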
time_how4 time_how4
enum time_how4 { enum time_how4 {
SET_TO_SERVER_TIME4 = 0, SET_TO_SERVER_TIME4 = 0,
SET_TO_CLIENT_TIME4 = 1 SET_TO_CLIENT_TIME4 = 1
}; };
settime4 settime4
union settime4 switch (time_how4 set_it) { union settime4 switch (time_how4 set_it) {
case SET_TO_CLIENT_TIME4: case SET_TO_CLIENT_TIME4:
nfstime4 time; nfstime4 time;
default: default:
void; void;
}; };
The above definitions are used as the attribute definitions to The above definitions are used as the attribute definitions to set
set time values. If set_it is SET_TO_SERVER_TIME4, then the time values. If set_it is SET_TO_SERVER_TIME4, then the server uses
server uses its local representation of time for the time value. its local representation of time for the time value.
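As an illustration of how the settime4 union behaves, the sketch below
encodes it following the usual XDR conventions from [RFC1832]: the
discriminant is sent as a 4-byte integer, and the nfstime4 arm (an 8-byte
signed seconds field followed by a 4-byte unsigned nseconds field) is present
only for SET_TO_CLIENT_TIME4. The function name is hypothetical.

```python
import struct

SET_TO_SERVER_TIME4 = 0
SET_TO_CLIENT_TIME4 = 1


def encode_settime4(set_it: int, seconds: int = 0, nseconds: int = 0) -> bytes:
    """Encode the settime4 discriminated union in XDR form."""
    body = struct.pack(">i", set_it)  # enum discriminant, 4 bytes
    if set_it == SET_TO_CLIENT_TIME4:
        # nfstime4 arm: int64_t seconds, uint32_t nseconds
        body += struct.pack(">qI", seconds, nseconds)
    # default arm is void: nothing follows the discriminant
    return body
```

With these assumptions, a request for server time is four bytes long, while a
client-supplied time is sixteen bytes long.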
specdata4 specdata4
struct specdata4 { struct specdata4 {
uint32_t specdata1; /* major device number */ uint32_t specdata1; /* major device number */
uint32_t specdata2; /* minor device number */ uint32_t specdata2; /* minor device number */
}; };
This data type represents additional information for the device This data type represents additional information for the device file
file types NF4CHR and NF4BLK. types NF4CHR and NF4BLK.
fsid4 fsid4
struct fsid4 { struct fsid4 {
uint64_t major; uint64_t major;
uint64_t minor; uint64_t minor;
}; };
This type is the filesystem identifier that is used as a This type is the filesystem identifier that is used as a mandatory
mandatory attribute. attribute.
fs_location4 fs_location4
struct fs_location4 { struct fs_location4 {
utf8str_cis server<>; utf8str_cis server<>;
pathname4 rootpath; pathname4 rootpath;
}; };
fs_locations4 fs_locations4
skipping to change at page 19, line 36 skipping to change at page 20, line 4
utf8str_cis server<>; utf8str_cis server<>;
pathname4 rootpath; pathname4 rootpath;
}; };
fs_locations4 fs_locations4
struct fs_locations4 { struct fs_locations4 {
pathname4 fs_root; pathname4 fs_root;
fs_location4 locations<>; fs_location4 locations<>;
}; };
The fs_location4 and fs_locations4 data types are used for the The fs_location4 and fs_locations4 data types are used for the
fs_locations recommended attribute which is used for migration fs_locations recommended attribute which is used for migration and
and replication support. replication support.
fattr4 fattr4
struct fattr4 { struct fattr4 {
bitmap4 attrmask; bitmap4 attrmask;
attrlist4 attr_vals; attrlist4 attr_vals;
}; };
The fattr4 structure is used to represent file and directory The fattr4 structure is used to represent file and directory
attributes. attributes.
The bitmap is a counted array of 32 bit integers used to contain The bitmap is a counted array of 32 bit integers used to contain bit
bit values. The position of the integer in the array that values. The position of the integer in the array that contains bit n
contains bit n can be computed from the expression (n / 32) and can be computed from the expression (n / 32) and its bit within that
its bit within that integer is (n mod 32). integer is (n mod 32).
0 1 0 1
+-----------+-----------+-----------+-- +-----------+-----------+-----------+--
| count | 31 .. 0 | 63 .. 32 | | count | 31 .. 0 | 63 .. 32 |
+-----------+-----------+-----------+-- +-----------+-----------+-----------+--
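The word and bit arithmetic described above can be sketched as follows. The
bitmap is modeled as a Python list of 32-bit integers; the helper names and
the attribute numbers used in the usage note are hypothetical.

```python
def set_bit(bitmap: list, n: int) -> None:
    """Set attribute bit n, growing the array of 32-bit words as needed."""
    word, bit = divmod(n, 32)  # word index is n / 32, bit index is n mod 32
    while len(bitmap) <= word:
        bitmap.append(0)
    bitmap[word] |= 1 << bit


def has_bit(bitmap: list, n: int) -> bool:
    """Return True if attribute bit n is set in the bitmap."""
    word, bit = divmod(n, 32)
    return word < len(bitmap) and bool(bitmap[word] & (1 << bit))
```

Setting bits 0 and 33 in an empty bitmap produces a two-word array: bit 0
lands in word 0, and bit 33 lands in word 1 at bit position 1.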
change_info4 change_info4
struct change_info4 { struct change_info4 {
bool atomic; bool atomic;
changeid4 before; changeid4 before;
changeid4 after; changeid4 after;
}; };
This structure is used with the CREATE, LINK, REMOVE, RENAME This structure is used with the CREATE, LINK, REMOVE, RENAME
operations to let the client know the value of the change operations to let the client know the value of the change attribute
attribute for the directory in which the target filesystem for the directory in which the target filesystem object resides.
object resides.
clientaddr4 clientaddr4
struct clientaddr4 { struct clientaddr4 {
/* see struct rpcb in RFC1833 */ /* see struct rpcb in RFC1833 */
string r_netid<>; /* network id */ string r_netid<>; /* network id */
string r_addr<>; /* universal address */ string r_addr<>; /* universal address */
}; };
The clientaddr4 structure is used as part of the SETCLIENTID The clientaddr4 structure is used as part of the SETCLIENTID
operation to either specify the address of the client that is operation to either specify the address of the client that is using a
using a clientid or as part of the callback registration. The clientid or as part of the callback registration. The
r_netid and r_addr fields are specified in [RFC1833], but they r_netid and r_addr fields are specified in [RFC1833], but they are
are underspecified in [RFC1833] as far as what they should look underspecified in [RFC1833] as far as what they should look like for
like for specific protocols. specific protocols.
For TCP over IPv4 and for UDP over IPv4, the format of r_addr is For TCP over IPv4 and for UDP over IPv4, the format of r_addr is the
the US-ASCII string: US-ASCII string:
h1.h2.h3.h4.p1.p2 h1.h2.h3.h4.p1.p2
The prefix, "h1.h2.h3.h4", is the standard textual form for The prefix, "h1.h2.h3.h4", is the standard textual form for
representing an IPv4 address, which is always four octets long. representing an IPv4 address, which is always four octets long.
Assuming big-endian ordering, h1, h2, h3, and h4, are Assuming big-endian ordering, h1, h2, h3, and h4, are respectively,
respectively, the first through fourth octets each converted to the first through fourth octets each converted to ASCII-decimal.
ASCII-decimal. Assuming big-endian ordering, p1 and p2 are, Assuming big-endian ordering, p1 and p2 are, respectively, the first
respectively, the first and second octets each converted to and second octets each converted to ASCII-decimal. For example, if a
ASCII-decimal. For example, if a host, in big-endian order, has host, in big-endian order, has an address of 0x0A010307 and there is
an address of 0x0A010307 and there is a service listening on, in a service listening on, in big endian order, port 0x020F (decimal
big endian order, port 0x020F (decimal 527), then complete 527), then the complete universal address is "10.1.3.7.2.15".
universal address is "10.1.3.7.2.15".
For TCP over IPv4 the value of r_netid is the string "tcp". For
UDP over IPv4 the value of r_netid is the string "udp". For TCP over IPv4 the value of r_netid is the string "tcp". For UDP
over IPv4 the value of r_netid is the string "udp".
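The IPv4 universal address format can be sketched as below. The function
names are hypothetical. The code uses Python's standard ipaddress module only
to validate and print the dotted-quad host part.

```python
import ipaddress


def uaddr_ipv4(host: str, port: int) -> str:
    """Build the r_addr string h1.h2.h3.h4.p1.p2 for TCP or UDP over IPv4."""
    addr = ipaddress.IPv4Address(host)  # raises ValueError on a bad address
    if not 0 <= port <= 0xFFFF:
        raise ValueError("port must fit in 16 bits")
    p1, p2 = divmod(port, 256)  # high octet, then low octet
    return f"{addr}.{p1}.{p2}"


def parse_uaddr_ipv4(uaddr: str):
    """Split an IPv4 universal address back into (host, port)."""
    parts = uaddr.split(".")
    if len(parts) != 6:
        raise ValueError("expected h1.h2.h3.h4.p1.p2")
    host = str(ipaddress.IPv4Address(".".join(parts[:4])))
    p1, p2 = int(parts[4]), int(parts[5])
    if not (0 <= p1 <= 255 and 0 <= p2 <= 255):
        raise ValueError("port octets must be in 0..255")
    return host, p1 * 256 + p2
```

For the example in the text, address 0x0A010307 is 10.1.3.7 and port 0x020F
(decimal 527) splits into octets 2 and 15, so uaddr_ipv4("10.1.3.7", 527)
gives "10.1.3.7.2.15".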
For TCP over IPv6 and for UDP over IPv6, the format of r_addr is For TCP over IPv6 and for UDP over IPv6, the format of r_addr is the
the US-ASCII string: US-ASCII string:
x1:x2:x3:x4:x5:x6:x7:x8.p1.p2 x1:x2:x3:x4:x5:x6:x7:x8.p1.p2
The suffix "p1.p2" is the service port, and is computed the same The suffix "p1.p2" is the service port, and is computed the same way
way as with universal addresses for TCP and UDP over IPv4. The as with universal addresses for TCP and UDP over IPv4. The prefix,
prefix, "x1:x2:x3:x4:x5:x6:x7:x8", is the standard textual form "x1:x2:x3:x4:x5:x6:x7:x8", is the standard textual form for
for representing an IPv6 address as defined in Section 2.2 of representing an IPv6 address as defined in Section 2.2 of [RFC2373].
[RFC1884]. Additionally, the two alternative forms specified in Additionally, the two alternative forms specified in Section 2.2 of
Section 2.2 of [RFC1884] are also acceptable. [RFC2373] are also acceptable.
For TCP over IPv6 the value of r_netid is the string "tcp6". For TCP over IPv6 the value of r_netid is the string "tcp6". For UDP
For UDP over IPv6 the value of r_netid is the string "udp6". over IPv6 the value of r_netid is the string "udp6".
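A corresponding sketch for IPv6 is shown below. It is an illustration under
stated assumptions: the function names are hypothetical, the full
eight-group form is emitted on output, and the compressed and
embedded-IPv4 textual forms are accepted on input because Python's ipaddress
module parses them.

```python
import ipaddress


def uaddr_ipv6(host: str, port: int) -> str:
    """Build the r_addr string x1:...:x8.p1.p2 for TCP or UDP over IPv6."""
    addr = ipaddress.IPv6Address(host)  # raises ValueError on a bad address
    if not 0 <= port <= 0xFFFF:
        raise ValueError("port must fit in 16 bits")
    p1, p2 = divmod(port, 256)  # same port encoding as for IPv4
    return f"{addr.exploded}.{p1}.{p2}"


def parse_uaddr_ipv6(uaddr: str):
    """Split an IPv6 universal address into (IPv6Address, port).

    The port suffix is taken from the right, so a host part that itself
    contains dots (the embedded-IPv4 form) is left intact.
    """
    host, p1, p2 = uaddr.rsplit(".", 2)
    hi, lo = int(p1), int(p2)
    if not (0 <= hi <= 255 and 0 <= lo <= 255):
        raise ValueError("port octets must be in 0..255")
    return ipaddress.IPv6Address(host), hi * 256 + lo
```

Because the suffix is split from the right, an input such as
"::ffff:10.1.3.7.8.1" is read as host ::ffff:10.1.3.7 with port 2049.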
cb_client4 cb_client4
struct cb_client4 { struct cb_client4 {
unsigned int cb_program; unsigned int cb_program;
clientaddr4 cb_location; clientaddr4 cb_location;
}; };
This structure is used by the client to inform the server of its This structure is used by the client to inform the server of its call
call back address; includes the program number and client back address; includes the program number and client address.
address.
nfs_client_id4 nfs_client_id4
struct nfs_client_id4 { struct nfs_client_id4 {
verifier4 verifier; verifier4 verifier;
opaque id<NFS4_OPAQUE_LIMIT>; opaque id<NFS4_OPAQUE_LIMIT>;
}; };
This structure is part of the arguments to the SETCLIENTID This structure is part of the arguments to the SETCLIENTID operation.
operation. NFS4_OPAQUE_LIMIT is defined as 1024. NFS4_OPAQUE_LIMIT is defined as 1024.
open_owner4 open_owner4
struct open_owner4 { struct open_owner4 {
clientid4 clientid; clientid4 clientid;
opaque owner<NFS4_OPAQUE_LIMIT>; opaque owner<NFS4_OPAQUE_LIMIT>;
}; };
This structure is used to identify the owner of open state. This structure is used to identify the owner of open state.
NFS4_OPAQUE_LIMIT is defined as 1024. NFS4_OPAQUE_LIMIT is defined as 1024.
lock_owner4 lock_owner4
struct lock_owner4 { struct lock_owner4 {
clientid4 clientid; clientid4 clientid;
opaque owner<NFS4_OPAQUE_LIMIT>; opaque owner<NFS4_OPAQUE_LIMIT>;
}; };
This structure is used to identify the owner of file locking This structure is used to identify the owner of file locking state.
state. NFS4_OPAQUE_LIMIT is defined as 1024. NFS4_OPAQUE_LIMIT is defined as 1024.
open_to_lock_owner4 open_to_lock_owner4
struct open_to_lock_owner4 { struct open_to_lock_owner4 {
seqid4 open_seqid; seqid4 open_seqid;
stateid4 open_stateid; stateid4 open_stateid;
seqid4 lock_seqid; seqid4 lock_seqid;
lock_owner4 lock_owner; lock_owner4 lock_owner;
}; };
This structure is used for the first LOCK operation done for an This structure is used for the first LOCK operation done for an
open_owner4. It provides both the open_stateid and lock_owner open_owner4. It provides both the open_stateid and lock_owner such
such that the transition is made from a valid open_stateid that the transition is made from a valid open_stateid sequence to
sequence to that of the new lock_stateid sequence. Using this that of the new lock_stateid sequence. Using this mechanism avoids
mechanism avoids the confirmation of the lock_owner/lock_seqid the confirmation of the lock_owner/lock_seqid pair since it is tied
pair since it is tied to established state in the form of the to established state in the form of the open_stateid/open_seqid.
open_stateid/open_seqid.
stateid4 stateid4
struct stateid4 { struct stateid4 {
uint32_t seqid; uint32_t seqid;
opaque other[12]; opaque other[12];
}; };
This structure is used for the various state sharing mechanisms This structure is used for the various state sharing mechanisms
between the client and server. For the client, this data between the client and server. For the client, this data structure
structure is read-only. The starting value of the seqid field is read-only. The starting value of the seqid field is undefined.
is undefined. The server is required to increment the seqid The server is required to increment the seqid field monotonically at
field monotonically at each transition of the stateid. This is each transition of the stateid. This is important since the client
important since the client will inspect the seqid in OPEN will inspect the seqid in OPEN stateids to determine the order of
stateids to determine the order of OPEN processing done by the OPEN processing done by the server.
server.
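The layout of stateid4 (a 4-byte seqid followed by 12 opaque bytes) accounts
for the 128 bits mentioned in the definition of Stateid. The sketch below
packs the structure using the XDR conventions of [RFC1832] and checks for the
two reserved patterns. The function names are hypothetical, and the other
field is treated as opaque, as it is for a client.

```python
import struct

STATEID_OTHER_SIZE = 12


def encode_stateid4(seqid: int, other: bytes) -> bytes:
    """Pack stateid4: uint32_t seqid, then opaque other[12] (16 bytes total)."""
    if len(other) != STATEID_OTHER_SIZE:
        raise ValueError("other must be exactly 12 bytes")
    return struct.pack(">I", seqid) + other


def is_special_stateid(encoded: bytes) -> bool:
    """True for the reserved stateids of all bits 0 or all bits 1."""
    return encoded in (b"\x00" * 16, b"\xff" * 16)
```

A stateid with seqid 0xFFFFFFFF and an other field of twelve 0xFF bytes
encodes to sixteen 0xFF bytes and is therefore one of the reserved values.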
3. RPC and Security Flavor 3. RPC and Security Flavor
The NFS version 4 protocol is a Remote Procedure Call (RPC) The NFS version 4 protocol is a Remote Procedure Call (RPC)
application that uses RPC version 2 and the corresponding eXternal application that uses RPC version 2 and the corresponding eXternal
Data Representation (XDR) as defined in [RFC1831] and [RFC1832]. The Data Representation (XDR) as defined in [RFC1831] and [RFC1832]. The
RPCSEC_GSS security flavor as defined in [RFC2203] MUST be used as RPCSEC_GSS security flavor as defined in [RFC2203] MUST be used as
the mechanism to deliver stronger security for the NFS version 4 the mechanism to deliver stronger security for the NFS version 4
protocol. protocol.
3.1. Ports and Transports 3.1. Ports and Transports
Historically, NFS version 2 and version 3 servers have resided on Historically, NFS version 2 and version 3 servers have resided on
port 2049. The registered port 2049 [RFC1700] for the NFS protocol port 2049. The registered port 2049 [RFC3232] for the NFS protocol
should be the default configuration. Using the registered port for should be the default configuration. Using the registered port for
NFS services means the NFS client will not need to use the RPC NFS services means the NFS client will not need to use the RPC
binding protocols as described in [RFC1833]; this will allow NFS to binding protocols as described in [RFC1833]; this will allow NFS to
transit firewalls. transit firewalls.
Where an NFS version 4 implementation supports operation over the IP Where an NFS version 4 implementation supports operation over the IP
network protocol, the supported transports between NFS and IP MUST be network protocol, the supported transports between NFS and IP MUST be
among the IETF-approved congestion control transport protocols, which among the IETF-approved congestion control transport protocols, which
include TCP and SCTP. To enhance the possibilities for include TCP and SCTP. To enhance the possibilities for
interoperability, an NFS version 4 implementation MUST support interoperability, an NFS version 4 implementation MUST support
skipping to change at page 23, line 52 skipping to change at page 24, line 17
based. However, this modification of the authentication model does based. However, this modification of the authentication model does
not imply a technical requirement to move the TCP connection not imply a technical requirement to move the TCP connection
management model from whole machine-based to one based on a per user management model from whole machine-based to one based on a per user
model. In particular, NFS over TCP client implementations have model. In particular, NFS over TCP client implementations have
traditionally multiplexed traffic for multiple users over a common traditionally multiplexed traffic for multiple users over a common
TCP connection between an NFS client and server. This has been true, TCP connection between an NFS client and server. This has been true,
regardless whether the NFS client is using AUTH_SYS, AUTH_DH, regardless whether the NFS client is using AUTH_SYS, AUTH_DH,
RPCSEC_GSS or any other flavor. Similarly, NFS over TCP server RPCSEC_GSS or any other flavor. Similarly, NFS over TCP server
implementations have assumed such a model and thus scale the implementations have assumed such a model and thus scale the
implementation of TCP connection management in proportion to the implementation of TCP connection management in proportion to the
number of expected client machines. It is intended that NFS version 4 number of expected client machines. It is intended that NFS version
will not modify this connection management model. NFS version 4 4 will not modify this connection management model. NFS version 4
clients that violate this assumption can expect scaling issues on the clients that violate this assumption can expect scaling issues on the
server and hence reduced service. server and hence reduced service.
Note that for various timers, the client and server should avoid Note that for various timers, the client and server should avoid
inadvertent synchronization of those timers. For further discussion inadvertent synchronization of those timers. For further discussion
of the general issue refer to [Floyd]. of the general issue refer to [Floyd].
3.1.1. Client Retransmission Behavior 3.1.1. Client Retransmission Behavior
When processing a request received over a reliable transport such as When processing a request received over a reliable transport such as
TCP, the NFS version 4 server MUST NOT silently drop the request, TCP, the NFS version 4 server MUST NOT silently drop the request,
except if the transport connection has been broken. Given such a except if the transport connection has been broken. Given such a
contract between NFS version 4 clients and servers, clients MUST NOT contract between NFS version 4 clients and servers, clients MUST NOT
retry a request unless one or both of the following are true: retry a request unless one or both of the following are true:
o The transport connection has been broken o The transport connection has been broken
o The procedure being retried is the NULL procedure o The procedure being retried is the NULL procedure
Since reliable transports, such as TCP, do not always synchronously Since reliable transports, such as TCP, do not always synchronously
inform a peer when the other peer has broken the connection (for inform a peer when the other peer has broken the connection (for
example, when an NFS server reboots), so the NFS version 4 client may example, when an NFS server reboots), the NFS version 4 client may
want to actively "probe" the connection to see if it has been broken. want to actively "probe" the connection to see if it has been broken.
Use of the NULL procedure is one recommended way to do so. So, when Use of the NULL procedure is one recommended way to do so. So, when
a client experiences a remote procedure call timeout (of some a client experiences a remote procedure call timeout (of some
arbitrary implementation specific amount), rather than retrying the arbitrary implementation specific amount), rather than retrying the
remote procedure call, it could instead issue a NULL procedure call remote procedure call, it could instead issue a NULL procedure call
to the server. If the server has died, the transport connection break to the server. If the server has died, the transport connection
will eventually be indicated to the NFS version 4 client. The client break will eventually be indicated to the NFS version 4 client. The
can then reconnect, and then retry the original request. If the NULL client can then reconnect, and then retry the original request. If
procedure call gets a response, the connection has not broken. The the NULL procedure call gets a response, the connection has not
client can decide to wait longer for the original request's response, broken. The client can decide to wait longer for the original
or it can break the transport connection and reconnect before re- request's response, or it can break the transport connection and
sending the original request. reconnect before re-sending the original request.
For callbacks from the server to the client, the same rules apply, For callbacks from the server to the client, the same rules apply,
but the server doing the callback becomes the client, and the client but the server doing the callback becomes the client, and the client
receiving the callback becomes the server. receiving the callback becomes the server.
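The retransmission rules above can be summarized as a small decision sketch.
This is an illustration, not a prescribed algorithm: the names Action,
may_retry, and after_timeout are hypothetical, and the inputs are assumed to
be supplied by the client's transport layer.

```python
from enum import Enum


class Action(Enum):
    RECONNECT_AND_RETRY = "reconnect, then retry the original request"
    WAIT_FOR_ORIGINAL_REPLY = "keep waiting for the original reply"
    BREAK_RECONNECT_AND_RESEND = "break the connection, reconnect, resend"
    AWAIT_BREAK_INDICATION = "wait for the transport to report the break"


def may_retry(connection_broken: bool, is_null_procedure: bool) -> bool:
    """A request may be retried only if one or both conditions hold."""
    return connection_broken or is_null_procedure


def after_timeout(connection_broken: bool, null_probe_answered: bool,
                  prefer_waiting: bool) -> Action:
    """Choose a next step once a request has timed out and a NULL
    procedure call has been issued as a probe."""
    if connection_broken:
        return Action.RECONNECT_AND_RETRY
    if null_probe_answered:
        # The connection is intact; the client may choose either option.
        if prefer_waiting:
            return Action.WAIT_FOR_ORIGINAL_REPLY
        return Action.BREAK_RECONNECT_AND_RESEND
    # No answer yet and no break reported; the break will eventually be
    # indicated if the server has died.
    return Action.AWAIT_BREAK_INDICATION
```

Note that in no branch is the original request resent over the same unbroken
connection, which is the property the rules are meant to guarantee.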
3.2. Security Flavors 3.2. Security Flavors
Traditional RPC implementations have included AUTH_NONE, AUTH_SYS, Traditional RPC implementations have included AUTH_NONE, AUTH_SYS,
AUTH_DH, and AUTH_KRB4 as security flavors. With [RFC2203] an AUTH_DH, and AUTH_KRB4 as security flavors. With [RFC2203] an
additional security flavor of RPCSEC_GSS has been introduced which additional security flavor of RPCSEC_GSS has been introduced which
uses the functionality of GSS-API [RFC2743]. This allows for the use uses the functionality of GSS-API [RFC2743]. This allows for the use
of various security mechanisms by the RPC layer without the of various security mechanisms by the RPC layer without the
additional implementation overhead of adding RPC security flavors. additional implementation overhead of adding RPC security flavors.
For NFS version 4, the RPCSEC_GSS security flavor MUST be used to For NFS version 4, the RPCSEC_GSS security flavor MUST be used to
enable the mandatory security mechanism. Other flavors, such as, enable the mandatory security mechanism. Other flavors, such as,
AUTH_NONE, AUTH_SYS, and AUTH_DH MAY be implemented as well. AUTH_NONE, AUTH_SYS, and AUTH_DH MAY be implemented as well.
3.2.1. Security mechanisms for NFS version 4 3.2.1. Security mechanisms for NFS version 4
The use of RPCSEC_GSS requires selection of: mechanism, quality of The use of RPCSEC_GSS requires selection of: mechanism, quality of
protection, and service (authentication, integrity, privacy). The protection, and service (authentication, integrity, privacy). The
remainder of this document will refer to these three parameters of remainder of this document will refer to these three parameters of
the RPCSEC_GSS security as the security triple. the RPCSEC_GSS security as the security triple.
3.2.1.1. Kerberos V5 as a security triple 3.2.1.1. Kerberos V5 as a security triple
The Kerberos V5 GSS-API mechanism as described in [RFC1964] MUST be The Kerberos V5 GSS-API mechanism as described in [RFC1964] MUST be
implemented and provide the following security triples. implemented and provide the following security triples.
column descriptions: column descriptions:
1 == number of pseudo flavor 1 == number of pseudo flavor
2 == name of pseudo flavor 2 == name of pseudo flavor
3 == mechanism's OID 3 == mechanism's OID
4 == mechanism's algorithm(s) 4 == mechanism's algorithm(s)
5 == RPCSEC_GSS service 5 == RPCSEC_GSS service
1 2 3 4 5 1 2 3 4 5
----------------------------------------------------------------------- --------------------------------------------------------------------
390003 krb5 1.2.840.113554.1.2.2 DES MAC MD5 rpc_gss_svc_none 390003 krb5 1.2.840.113554.1.2.2 DES MAC MD5 rpc_gss_svc_none
390004 krb5i 1.2.840.113554.1.2.2 DES MAC MD5 rpc_gss_svc_integrity 390004 krb5i 1.2.840.113554.1.2.2 DES MAC MD5 rpc_gss_svc_integrity
390005 krb5p 1.2.840.113554.1.2.2 DES MAC MD5 rpc_gss_svc_privacy 390005 krb5p 1.2.840.113554.1.2.2 DES MAC MD5 rpc_gss_svc_privacy
for integrity, for integrity,
and 56 bit DES and 56 bit DES
for privacy. for privacy.
Note that the pseudo flavor is presented here as a mapping aid to the Note that the pseudo flavor is presented here as a mapping aid to the
implementor. Because this NFS protocol includes a method to implementor. Because this NFS protocol includes a method to
negotiate security and it understands the GSS-API mechanism, the negotiate security and it understands the GSS-API mechanism, the
skipping to change at page 26, line 4 skipping to change at page 26, line 26
migrate to the use of AES. migrate to the use of AES.
3.2.1.2. LIPKEY as a security triple 3.2.1.2. LIPKEY as a security triple
The LIPKEY GSS-API mechanism as described in [RFC2847] MUST be The LIPKEY GSS-API mechanism as described in [RFC2847] MUST be
implemented and provide the following security triples. The implemented and provide the following security triples. The
definition of the columns matches the previous subsection "Kerberos definition of the columns matches the previous subsection "Kerberos
V5 as security triple" V5 as security triple"
1 2 3 4 5 1 2 3 4 5
--------------------------------------------------------------------
-----------------------------------------------------------------------
390006 lipkey 1.3.6.1.5.5.9 negotiated rpc_gss_svc_none 390006 lipkey 1.3.6.1.5.5.9 negotiated rpc_gss_svc_none
390007 lipkey-i 1.3.6.1.5.5.9 negotiated rpc_gss_svc_integrity 390007 lipkey-i 1.3.6.1.5.5.9 negotiated rpc_gss_svc_integrity
390008 lipkey-p 1.3.6.1.5.5.9 negotiated rpc_gss_svc_privacy 390008 lipkey-p 1.3.6.1.5.5.9 negotiated rpc_gss_svc_privacy
The mechanism algorithm is listed as "negotiated". This is because The mechanism algorithm is listed as "negotiated". This is because
LIPKEY is layered on SPKM-3 and in SPKM-3 [RFC2847] the LIPKEY is layered on SPKM-3 and in SPKM-3 [RFC2847] the
confidentiality and integrity algorithms are negotiated. Since confidentiality and integrity algorithms are negotiated. Since
SPKM-3 specifies HMAC-MD5 for integrity as MANDATORY, 128 bit SPKM-3 specifies HMAC-MD5 for integrity as MANDATORY, 128 bit
cast5CBC for confidentiality for privacy as MANDATORY, and further cast5CBC for confidentiality for privacy as MANDATORY, and further
specifies that HMAC-MD5 and cast5CBC MUST be listed first before specifies that HMAC-MD5 and cast5CBC MUST be listed first before
skipping to change at page 26, line 42 skipping to change at page 27, line 13
details. details.
3.2.1.3. SPKM-3 as a security triple 3.2.1.3. SPKM-3 as a security triple
The SPKM-3 GSS-API mechanism as described in [RFC2847] MUST be The SPKM-3 GSS-API mechanism as described in [RFC2847] MUST be
implemented and provide the following security triples. The implemented and provide the following security triples. The
definition of the columns matches the previous subsection "Kerberos definition of the columns matches the previous subsection "Kerberos
V5 as security triple". V5 as security triple".
1 2 3 4 5 1 2 3 4 5
----------------------------------------------------------------------- --------------------------------------------------------------------
390009 spkm3 1.3.6.1.5.5.1.3 negotiated rpc_gss_svc_none 390009 spkm3 1.3.6.1.5.5.1.3 negotiated rpc_gss_svc_none
390010 spkm3i 1.3.6.1.5.5.1.3 negotiated rpc_gss_svc_integrity 390010 spkm3i 1.3.6.1.5.5.1.3 negotiated rpc_gss_svc_integrity
390011 spkm3p 1.3.6.1.5.5.1.3 negotiated rpc_gss_svc_privacy 390011 spkm3p 1.3.6.1.5.5.1.3 negotiated rpc_gss_svc_privacy
For a discussion as to why the mechanism algorithm is listed as For a discussion as to why the mechanism algorithm is listed as
"negotiated", see the previous section "LIPKEY as a security triple." "negotiated", see the previous section "LIPKEY as a security triple."
Because SPKM-3 negotiates the algorithms, subsequent calls to SPKM- Because SPKM-3 negotiates the algorithms, subsequent calls to SPKM-
3's GSS_Wrap() and GSS_GetMIC() by RPCSEC_GSS will use a quality of 3's GSS_Wrap() and GSS_GetMIC() by RPCSEC_GSS will use a quality of
protection value of 0 (zero). See section 5.2 of [RFC2025] for an protection value of 0 (zero). See section 5.2 of [RFC2025] for an
explanation. explanation.
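As an illustration only, the SPKM-3 table above can be held as a small lookup structure. The sketch below is in Python. The class, field, and function names are hypothetical and are not part of the protocol. The column meanings are assumed to follow the "Kerberos V5 as security triple" subsection (pseudo flavor number, name, mechanism OID, algorithm(s), RPCSEC_GSS service).

```python
from typing import NamedTuple, Optional


class SecurityTriple(NamedTuple):
    """One row of the table; field names are illustrative assumptions."""
    pseudo_flavor: int   # column 1
    name: str            # column 2
    mechanism_oid: str   # column 3
    algorithms: str      # column 4 ("negotiated" for SPKM-3)
    service: str         # column 5


# Rows copied from the SPKM-3 table in this section.
SPKM3_TRIPLES = [
    SecurityTriple(390009, "spkm3", "1.3.6.1.5.5.1.3", "negotiated", "rpc_gss_svc_none"),
    SecurityTriple(390010, "spkm3i", "1.3.6.1.5.5.1.3", "negotiated", "rpc_gss_svc_integrity"),
    SecurityTriple(390011, "spkm3p", "1.3.6.1.5.5.1.3", "negotiated", "rpc_gss_svc_privacy"),
]

# Because the algorithms are negotiated, GSS_Wrap() and GSS_GetMIC() are
# called with a quality of protection value of 0.
SPKM3_QOP = 0


def lookup_triple(pseudo_flavor: int) -> Optional[SecurityTriple]:
    """Return the table row for a pseudo flavor number, or None if absent."""
    for triple in SPKM3_TRIPLES:
        if triple.pseudo_flavor == pseudo_flavor:
            return triple
    return None
```

A caller could use lookup_triple(390010) to find that the "spkm3i" pseudo flavor maps to the integrity service.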
Even though LIPKEY is layered over SPKM-3, SPKM-3 is specified as a Even though LIPKEY is layered over SPKM-3, SPKM-3 is specified as a
mandatory set of triples to handle the situations where the initiator mandatory set of triples to handle the situations where the initiator
(the client) is anonymous or where the initiator has its own (the client) is anonymous or where the initiator has its own
certificate. If the initiator is anonymous, there will not be a user certificate. If the initiator is anonymous, there will not be a user
name and password to send to the target (the server). If the name and password to send to the target (the server). If the
initiator has its own certificate, then using passwords is initiator has its own certificate, then using passwords is
superfluous. superfluous.
3.3. Security Negotiation 3.3. Security Negotiation
With the NFS version 4 server potentially offering multiple security With the NFS version 4 server potentially offering multiple security
mechanisms, the client needs a method to determine or negotiate which mechanisms, the client needs a method to determine or negotiate which
mechanism is to be used for its communication with the server. The mechanism is to be used for its communication with the server. The
skipping to change at page 27, line 41 skipping to change at page 28, line 18
per filehandle basis, what security triple is to be used for server per filehandle basis, what security triple is to be used for server
access. In general, the client will not have to use the SECINFO access. In general, the client will not have to use the SECINFO
operation except during initial communication with the server or when operation except during initial communication with the server or when
the client crosses policy boundaries at the server. It is possible the client crosses policy boundaries at the server. It is possible
that the server's policies change during the client's interaction that the server's policies change during the client's interaction
therefore forcing the client to negotiate a new security triple. therefore forcing the client to negotiate a new security triple.
3.3.2. Security Error 3.3.2. Security Error
Based on the assumption that each NFS version 4 client and server Based on the assumption that each NFS version 4 client and server
must support a minimum set of security (i.e. LIPKEY, SPKM-3, and must support a minimum set of security (i.e., LIPKEY, SPKM-3, and
Kerberos-V5 all under RPCSEC_GSS), the NFS client will start its Kerberos-V5 all under RPCSEC_GSS), the NFS client will start its
communication with the server with one of the minimal security communication with the server with one of the minimal security
triples. During communication with the server, the client may triples. During communication with the server, the client may
receive an NFS error of NFS4ERR_WRONGSEC. This error allows the receive an NFS error of NFS4ERR_WRONGSEC. This error allows the
server to notify the client that the security triple currently being server to notify the client that the security triple currently being
used is not appropriate for access to the server's filesystem used is not appropriate for access to the server's filesystem
resources. The client is then responsible for determining what resources. The client is then responsible for determining what
security triples are available at the server and choosing one which is security triples are available at the server and choosing one which is
appropriate for the client. See the section for the "SECINFO" appropriate for the client. See the section for the "SECINFO"
operation for further discussion of how the client will respond to operation for further discussion of how the client will respond to
the NFS4ERR_WRONGSEC error and use SECINFO. the NFS4ERR_WRONGSEC error and use SECINFO.
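The client-side reaction to NFS4ERR_WRONGSEC described above might be sketched as follows. This is a minimal Python sketch under stated assumptions: the two callbacks (send_request, query_secinfo) are hypothetical stand-ins for real RPC calls, triples are represented by their pseudo flavor names, and picking the first mutually supported triple in the order returned is a simplification, not a rule taken from this section.

```python
from typing import Callable, Iterable, Optional, Tuple

NFS4ERR_WRONGSEC = "NFS4ERR_WRONGSEC"


def access_with_negotiation(
    send_request: Callable[[str], str],
    query_secinfo: Callable[[], Iterable[str]],
    client_supported: Iterable[str],
    current_triple: str,
) -> Tuple[Optional[str], str]:
    """Try a request; on NFS4ERR_WRONGSEC, pick a new triple and retry once.

    Returns (triple_used, status). triple_used is None when the client and
    server share no acceptable triple.
    """
    status = send_request(current_triple)
    if status != NFS4ERR_WRONGSEC:
        return current_triple, status

    # The server rejected the current triple: ask which triples it accepts
    # (hypothetical wrapper around the SECINFO operation).
    supported = set(client_supported)
    for triple in query_secinfo():
        if triple in supported:
            return triple, send_request(triple)

    return None, NFS4ERR_WRONGSEC
```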
3.4. Callback RPC Authentication 3.4. Callback RPC Authentication
Except as noted elsewhere in this section, the callback RPC Except as noted elsewhere in this section, the callback RPC
(described later) MUST mutually authenticate the NFS server to the (described later) MUST mutually authenticate the NFS server to the
principal that acquired the clientid (also described later), using principal that acquired the clientid (also described later), using
the security flavor the original SETCLIENTID operation used. the security flavor the original SETCLIENTID operation used.
For AUTH_NONE, there are no principals, so this is a non-issue. For AUTH_NONE, there are no principals, so this is a non-issue.
AUTH_SYS has no notions of mutual authentication or a server AUTH_SYS has no notions of mutual authentication or a server
skipping to change at page 29, line 5 skipping to change at page 29, line 36
For Kerberos V5, nfs/hostname would be a server principal in the For Kerberos V5, nfs/hostname would be a server principal in the
Kerberos Key Distribution Center database. This is the same Kerberos Key Distribution Center database. This is the same
principal the client acquired a GSS-API context for when it issued principal the client acquired a GSS-API context for when it issued
the SETCLIENTID operation; therefore, the realm name for the server the SETCLIENTID operation; therefore, the realm name for the server
principal must be the same for the callback as it was for the principal must be the same for the callback as it was for the
SETCLIENTID. SETCLIENTID.
For LIPKEY, this would be the username passed to the target (the NFS For LIPKEY, this would be the username passed to the target (the NFS
version 4 client that receives the callback). version 4 client that receives the callback).
It should be noted that LIPKEY may not work for callbacks, since the It should be noted that LIPKEY may not work for callbacks, since the
LIPKEY client uses a user id/password. If the NFS client receiving LIPKEY client uses a user id/password. If the NFS client receiving
the callback can authenticate the NFS server's user name/password the callback can authenticate the NFS server's user name/password
pair, and if the user that the NFS server is authenticating to has a pair, and if the user that the NFS server is authenticating to has a
public key certificate, then it works. public key certificate, then it works.
In situations where the NFS client uses LIPKEY and uses a per-host In situations where the NFS client uses LIPKEY and uses a per-host
principal for the SETCLIENTID operation, instead of using LIPKEY for principal for the SETCLIENTID operation, instead of using LIPKEY for
SETCLIENTID, it is RECOMMENDED that SPKM-3 with mutual authentication SETCLIENTID, it is RECOMMENDED that SPKM-3 with mutual authentication
be used. This effectively means that the client will use a be used. This effectively means that the client will use a
certificate to authenticate and identify the initiator to the target certificate to authenticate and identify the initiator to the target
on the NFS server. Using SPKM-3 and not LIPKEY has the following on the NFS server. Using SPKM-3 and not LIPKEY has the following
advantages: advantages:
o When the server does a callback, it must authenticate to the o When the server does a callback, it must authenticate to the
principal used in the SETCLIENTID. Even if LIPKEY is used, principal used in the SETCLIENTID. Even if LIPKEY is used,
because LIPKEY is layered over SPKM-3, the NFS client will need because LIPKEY is layered over SPKM-3, the NFS client will need to
to have a certificate that corresponds to the principal used in have a certificate that corresponds to the principal used in the
the SETCLIENTID operation. From an administrative perspective, SETCLIENTID operation. From an administrative perspective, having
having a user name, password, and certificate for both the a user name, password, and certificate for both the client and
client and server is redundant. server is redundant.
o LIPKEY was intended to minimize additional infrastructure o LIPKEY was intended to minimize additional infrastructure
requirements beyond a certificate for the target, and the requirements beyond a certificate for the target, and the
expectation is that existing password infrastructure can be expectation is that existing password infrastructure can be
leveraged for the initiator. In some environments, a per-host leveraged for the initiator. In some environments, a per-host
password does not exist yet. If certificates are used for any password does not exist yet. If certificates are used for any
per-host principals, then additional password infrastructure is per-host principals, then additional password infrastructure is
not needed. not needed.
o In cases when a host is both an NFS client and server, it can o In cases when a host is both an NFS client and server, it can
share the same per-host certificate. share the same per-host certificate.
4. Filehandles 4. Filehandles
The filehandle in the NFS protocol is a per server unique identifier The filehandle in the NFS protocol is a per server unique identifier
for a filesystem object. The contents of the filehandle are opaque for a filesystem object. The contents of the filehandle are opaque
to the client. Therefore, the server is responsible for translating to the client. Therefore, the server is responsible for translating
the filehandle to an internal representation of the filesystem the filehandle to an internal representation of the filesystem
object. object.
4.1. Obtaining the First Filehandle 4.1. Obtaining the First Filehandle
skipping to change at page 31, line 4 skipping to change at page 31, line 22
ROOT of the server's file tree. Once this PUTROOTFH operation is ROOT of the server's file tree. Once this PUTROOTFH operation is
used, the client can then traverse the entirety of the server's file used, the client can then traverse the entirety of the server's file
tree with the LOOKUP operation. A complete discussion of the server tree with the LOOKUP operation. A complete discussion of the server
name space is in the section "NFS Server Name Space". name space is in the section "NFS Server Name Space".
4.1.2. Public Filehandle 4.1.2. Public Filehandle
The second special filehandle is the PUBLIC filehandle. Unlike the The second special filehandle is the PUBLIC filehandle. Unlike the
ROOT filehandle, the PUBLIC filehandle may be bound or represent an ROOT filehandle, the PUBLIC filehandle may be bound or represent an
arbitrary filesystem object at the server. The server is responsible arbitrary filesystem object at the server. The server is responsible
for this binding. It may be that the PUBLIC filehandle and the ROOT for this binding. It may be that the PUBLIC filehandle and the ROOT
filehandle refer to the same filesystem object. However, it is up to filehandle refer to the same filesystem object. However, it is up to
the administrative software at the server and the policies of the the administrative software at the server and the policies of the
server administrator to define the binding of the PUBLIC filehandle server administrator to define the binding of the PUBLIC filehandle
and server filesystem object. The client may not make any and server filesystem object. The client may not make any
assumptions about this binding. The client uses the PUBLIC filehandle assumptions about this binding. The client uses the PUBLIC
via the PUTPUBFH operation. filehandle via the PUTPUBFH operation.
4.2. Filehandle Types 4.2. Filehandle Types
In the NFS version 2 and 3 protocols, there was one type of In the NFS version 2 and 3 protocols, there was one type of
filehandle with a single set of semantics. This type of filehandle filehandle with a single set of semantics. This type of filehandle
is termed "persistent" in NFS Version 4. The semantics of a is termed "persistent" in NFS Version 4. The semantics of a
persistent filehandle remain the same as before. A new type of persistent filehandle remain the same as before. A new type of
filehandle introduced in NFS Version 4 is the "volatile" filehandle, filehandle introduced in NFS Version 4 is the "volatile" filehandle,
which attempts to accommodate certain server environments. which attempts to accommodate certain server environments.
skipping to change at page 32, line 4 skipping to change at page 32, line 26
doing a byte-by-byte comparison. However, the client MUST NOT doing a byte-by-byte comparison. However, the client MUST NOT
otherwise interpret the contents of filehandles. If two filehandles otherwise interpret the contents of filehandles. If two filehandles
from the same server are equal, they MUST refer to the same file. from the same server are equal, they MUST refer to the same file.
Servers SHOULD try to maintain a one-to-one correspondence between Servers SHOULD try to maintain a one-to-one correspondence between
filehandles and files but this is not required. Clients MUST use filehandles and files but this is not required. Clients MUST use
filehandle comparisons only to improve performance, not for correct filehandle comparisons only to improve performance, not for correct
behavior. All clients need to be prepared for situations in which it behavior. All clients need to be prepared for situations in which it
cannot be determined whether two filehandles denote the same object cannot be determined whether two filehandles denote the same object
and in such cases, avoid making invalid assumptions which might cause and in such cases, avoid making invalid assumptions which might cause
incorrect behavior. Further discussion of filehandle and attribute incorrect behavior. Further discussion of filehandle and attribute
comparison in the context of data caching is presented in the section comparison in the context of data caching is presented in the section
"Data Caching and File Identity". "Data Caching and File Identity".
As an example, in the case that two different path names when As an example, in the case that two different path names when
traversed at the server terminate at the same filesystem object, the traversed at the server terminate at the same filesystem object, the
server SHOULD return the same filehandle for each path. This can server SHOULD return the same filehandle for each path. This can
occur if a hard link is used to create two file names which refer to occur if a hard link is used to create two file names which refer to
the same underlying file object and associated data. For example, if the same underlying file object and associated data. For example, if
paths /a/b/c and /a/d/c refer to the same file, the server SHOULD paths /a/b/c and /a/d/c refer to the same file, the server SHOULD
return the same filehandle for both path name traversals. return the same filehandle for both path name traversals.
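The comparison rule above is asymmetric: equal filehandles from the same server identify the same file, but unequal filehandles prove nothing. A minimal Python sketch, with a hypothetical function name, makes the three-valued outcome explicit:

```python
from typing import Optional


def same_object(fh_a: bytes, fh_b: bytes) -> Optional[bool]:
    """Compare two opaque filehandles obtained from the same server.

    Returns True when the handles are byte-for-byte equal, and None
    ("cannot be determined") otherwise. It never returns False, because
    a server is not required to keep a one-to-one correspondence between
    filehandles and files.
    """
    if fh_a == fh_b:
        return True
    return None
```

A client that treats None as "different file" only loses a caching optimization; one that treats it as proof of difference risks incorrect behavior.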
skipping to change at page 32, line 37 skipping to change at page 33, line 7
server must honor the same filehandle as the old NFS server. server must honor the same filehandle as the old NFS server.
The persistent filehandle will become stale or invalid when the The persistent filehandle will become stale or invalid when the
filesystem object is removed. When the server is presented with a filesystem object is removed. When the server is presented with a
persistent filehandle that refers to a deleted object, it MUST return persistent filehandle that refers to a deleted object, it MUST return
an error of NFS4ERR_STALE. A filehandle may become stale when the an error of NFS4ERR_STALE. A filehandle may become stale when the
filesystem containing the object is no longer available. The file filesystem containing the object is no longer available. The file
system may become unavailable if it exists on removable media and the system may become unavailable if it exists on removable media and the
media is no longer available at the server or the filesystem in whole media is no longer available at the server or the filesystem in whole
has been destroyed or the filesystem has simply been removed from the has been destroyed or the filesystem has simply been removed from the
server's name space (i.e. unmounted in a UNIX environment). server's name space (i.e., unmounted in a UNIX environment).
4.2.3. Volatile Filehandle 4.2.3. Volatile Filehandle
A volatile filehandle does not share the same longevity A volatile filehandle does not share the same longevity
characteristics of a persistent filehandle. The server may determine characteristics of a persistent filehandle. The server may determine
that a volatile filehandle is no longer valid at many different that a volatile filehandle is no longer valid at many different
points in time. If the server can definitively determine that a points in time. If the server can definitively determine that a
volatile filehandle refers to an object that has been removed, the volatile filehandle refers to an object that has been removed, the
server should return NFS4ERR_STALE to the client (as is the case for server should return NFS4ERR_STALE to the client (as is the case for
persistent filehandles). In all other cases where the server persistent filehandles). In all other cases where the server
determines that a volatile filehandle can no longer be used, it determines that a volatile filehandle can no longer be used, it
should return an error of NFS4ERR_FHEXPIRED. should return an error of NFS4ERR_FHEXPIRED.
The mandatory attribute "fh_expire_type" is used by the client to The mandatory attribute "fh_expire_type" is used by the client to
determine what type of filehandle the server is providing for a determine what type of filehandle the server is providing for a
particular filesystem. This attribute is a bitmask with the particular filesystem. This attribute is a bitmask with the
following values: following values:
FH4_PERSISTENT FH4_PERSISTENT
The value of FH4_PERSISTENT is used to indicate a persistent The value of FH4_PERSISTENT is used to indicate a
filehandle, which is valid until the object is removed from the persistent filehandle, which is valid until the object is
filesystem. The server will not return NFS4ERR_FHEXPIRED for removed from the filesystem. The server will not return
this filehandle. FH4_PERSISTENT is defined as a value in which NFS4ERR_FHEXPIRED for this filehandle. FH4_PERSISTENT is
none of the bits specified below are set. defined as a value in which none of the bits specified
below are set.
FH4_VOLATILE_ANY FH4_VOLATILE_ANY
The filehandle may expire at any time, except as specifically The filehandle may expire at any time, except as
excluded (i.e. FH4_NOEXPIRE_WITH_OPEN). specifically excluded (i.e., FH4_NOEXPIRE_WITH_OPEN).
FH4_NOEXPIRE_WITH_OPEN FH4_NOEXPIRE_WITH_OPEN
May only be set when FH4_VOLATILE_ANY is set. If this bit is May only be set when FH4_VOLATILE_ANY is set. If this bit
set, then the meaning of FH4_VOLATILE_ANY is qualified to is set, then the meaning of FH4_VOLATILE_ANY is qualified
exclude any expiration of the filehandle when it is open. to exclude any expiration of the filehandle when it is
open.
FH4_VOL_MIGRATION FH4_VOL_MIGRATION
The filehandle will expire as a result of migration. If The filehandle will expire as a result of migration. If
FH4_VOLATILE_ANY is set, FH4_VOL_MIGRATION is redundant. FH4_VOLATILE_ANY is set, FH4_VOL_MIGRATION is redundant.
FH4_VOL_RENAME FH4_VOL_RENAME
The filehandle will expire during rename. This includes a The filehandle will expire during rename. This includes a
rename by the requesting client or a rename by any other client. rename by the requesting client or a rename by any other
If FH4_VOLATILE_ANY is set, FH4_VOL_RENAME is redundant. client. If FH4_VOLATILE_ANY is set, FH4_VOL_RENAME is
redundant.
Servers which provide volatile filehandles that may expire while Servers which provide volatile filehandles that may expire while open
open (i.e. if FH4_VOL_MIGRATION or FH4_VOL_RENAME is set or if (i.e., if FH4_VOL_MIGRATION or FH4_VOL_RENAME is set or if
FH4_VOLATILE_ANY is set and FH4_NOEXPIRE_WITH_OPEN is not set), FH4_VOLATILE_ANY is set and FH4_NOEXPIRE_WITH_OPEN is not set), should
should deny a RENAME or REMOVE that would affect an OPEN file or deny a RENAME or REMOVE that would affect an OPEN file or any of the
any of the components leading to the OPEN file. In addition, components leading to the OPEN file. In addition, the server should
the server should deny all RENAME or REMOVE requests during the deny all RENAME or REMOVE requests during the grace period upon
grace period upon server restart. server restart.
Note that the bits FH4_VOL_MIGRATION and FH4_VOL_RENAME allow Note that the bits FH4_VOL_MIGRATION and FH4_VOL_RENAME allow the
the client to determine that expiration has occurred whenever a client to determine that expiration has occurred whenever a specific
specific event occurs, without an explicit filehandle expiration event occurs, without an explicit filehandle expiration error from
error from the server. FH4_VOLATILE_ANY does not provide this form the server. FH4_VOLATILE_ANY does not provide this form of information.
of information. In situations where the server will expire many, In situations where the server will expire many, but not all
but not all filehandles upon migration (e.g. all but those that filehandles upon migration (e.g., all but those that are open),
are open), FH4_VOLATILE_ANY (in this case with FH4_VOLATILE_ANY (in this case with FH4_NOEXPIRE_WITH_OPEN) is a
FH4_NOEXPIRE_WITH_OPEN) is a better choice since the client may better choice since the client may not assume that all filehandles
not assume that all filehandles will expire when migration will expire when migration occurs, and it is likely that additional
occurs, and it is likely that additional expirations will occur expirations will occur (as a result of file CLOSE) that are separated
(as a result of file CLOSE) that are separated in time from the in time from the migration event itself.
migration event itself.
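The fh_expire_type rules above can be sketched as bitmask tests. The numeric bit values below are taken from the protocol's XDR definition rather than from this section, and the helper function names are hypothetical. This is a minimal Python sketch, not a normative decoder.

```python
# Bit values as defined in the NFS version 4 XDR description.
FH4_PERSISTENT = 0x00000000
FH4_NOEXPIRE_WITH_OPEN = 0x00000001
FH4_VOLATILE_ANY = 0x00000002
FH4_VOL_MIGRATION = 0x00000004
FH4_VOL_RENAME = 0x00000008


def is_persistent(fh_expire_type: int) -> bool:
    """Persistent means none of the volatility bits are set."""
    return fh_expire_type == FH4_PERSISTENT


def is_valid_expire_type(fh_expire_type: int) -> bool:
    """FH4_NOEXPIRE_WITH_OPEN may only be set together with FH4_VOLATILE_ANY."""
    if fh_expire_type & FH4_NOEXPIRE_WITH_OPEN:
        return bool(fh_expire_type & FH4_VOLATILE_ANY)
    return True


def may_expire_while_open(fh_expire_type: int) -> bool:
    """True when a filehandle may expire while the file is open.

    Mirrors the condition in the text: FH4_VOL_MIGRATION or FH4_VOL_RENAME
    is set, or FH4_VOLATILE_ANY is set without FH4_NOEXPIRE_WITH_OPEN.
    """
    if fh_expire_type & (FH4_VOL_MIGRATION | FH4_VOL_RENAME):
        return True
    volatile_any = bool(fh_expire_type & FH4_VOLATILE_ANY)
    protected = bool(fh_expire_type & FH4_NOEXPIRE_WITH_OPEN)
    return volatile_any and not protected
```

A server for which may_expire_while_open() holds is the kind that, per the text, should deny RENAME or REMOVE requests affecting open files.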
4.2.4. One Method of Constructing a Volatile Filehandle 4.2.4. One Method of Constructing a Volatile Filehandle
A volatile filehandle, while opaque to the client, could contain: A volatile filehandle, while opaque to the client, could contain:
[volatile bit = 1 | server boot time | slot | generation number] [volatile bit = 1 | server boot time | slot | generation number]
o slot is an index in the server volatile filehandle table o slot is an index in the server volatile filehandle table
o generation number is the generation number for the table o generation number is the generation number for the table
entry/slot entry/slot
When the client presents a volatile filehandle, the server makes the When the client presents a volatile filehandle, the server makes the
following checks, which assume that the check for the volatile bit following checks, which assume that the check for the volatile bit
has passed. If the server boot time is less than the current server has passed. If the server boot time is less than the current server
boot time, return NFS4ERR_FHEXPIRED. If slot is out of range, return boot time, return NFS4ERR_FHEXPIRED. If slot is out of range, return
NFS4ERR_BADHANDLE. If the generation number does not match, return NFS4ERR_BADHANDLE. If the generation number does not match, return
skipping to change at page 34, line 56 skipping to change at page 35, line 37
However, in the case that the client itself is renaming the file and However, in the case that the client itself is renaming the file and
the file is open, it is possible that the client may be able to the file is open, it is possible that the client may be able to
recover. The client can determine the new path name based on the recover. The client can determine the new path name based on the
processing of the rename request. The client can then regenerate the processing of the rename request. The client can then regenerate the
new filehandle based on the new path name. The client could also use new filehandle based on the new path name. The client could also use
the compound operation mechanism to construct a set of operations the compound operation mechanism to construct a set of operations
like: like:
RENAME A B RENAME A B
LOOKUP B LOOKUP B
GETFH GETFH
Note that the COMPOUND procedure does not provide atomicity. This Note that the COMPOUND procedure does not provide atomicity. This
example only reduces the overhead of recovering from an expired example only reduces the overhead of recovering from an expired
filehandle. filehandle.
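To illustrate why the RENAME, LOOKUP, GETFH sequence saves a round trip without being atomic, the following Python sketch runs a list of operations against a toy in-memory server and stops at the first failure, leaving earlier effects in place. Everything here is a simplifying assumption: the server model, the flat name table, the way a renamed object receives a fresh filehandle, and the function names are all hypothetical.

```python
def make_server(names):
    """Build a toy server; 'names' maps a file name to its filehandle."""
    state = {"names": dict(names), "current_fh": None}

    def execute(op):
        kind, *args = op
        if kind == "RENAME":
            old, new = args
            if old not in state["names"]:
                return "NFS4ERR_NOENT", None
            del state["names"][old]
            # Model a server whose filehandles expire on rename: the
            # object gets a new handle under its new name.
            state["names"][new] = b"fh-" + new.encode()
            return "NFS4_OK", None
        if kind == "LOOKUP":
            (name,) = args
            if name not in state["names"]:
                return "NFS4ERR_NOENT", None
            state["current_fh"] = state["names"][name]
            return "NFS4_OK", None
        if kind == "GETFH":
            if state["current_fh"] is None:
                return "NFS4ERR_NOFILEHANDLE", None
            return "NFS4_OK", state["current_fh"]
        return "NFS4ERR_OP_ILLEGAL", None

    return state, execute


def run_compound(ops, execute):
    """Evaluate operations in order, stopping at the first failure.

    Effects of operations that already succeeded are not rolled back,
    which is the sense in which the sequence is not atomic.
    """
    results = []
    for op in ops:
        status, value = execute(op)
        results.append((status, value))
        if status != "NFS4_OK":
            break
    return results
```

In this model, running RENAME A B, LOOKUP B, GETFH yields the new filehandle in a single request; if a later operation fails, the rename has still taken effect.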
5. File Attributes 5. File Attributes
To meet the requirements of extensibility and increased To meet the requirements of extensibility and increased
interoperability with non-UNIX platforms, attributes must be handled interoperability with non-UNIX platforms, attributes must be handled
in a flexible manner. The NFS version 3 fattr3 structure contains a in a flexible manner. The NFS version 3 fattr3 structure contains a
fixed list of attributes that not all clients and servers are able to fixed list of attributes that not all clients and servers are able to
support or care about. The fattr3 structure can not be extended as support or care about. The fattr3 structure can not be extended as
new needs arise and it provides no way to indicate non-support. With new needs arise and it provides no way to indicate non-support. With
the NFS version 4 protocol, the client is able to query what attributes the NFS version 4 protocol, the client is able to query what attributes
the server supports and construct requests with only those supported the server supports and construct requests with only those supported
skipping to change at page 37, line 4 skipping to change at page 36, line 43
than by an NFS client implementation. NFS implementors are strongly than by an NFS client implementation. NFS implementors are strongly
encouraged to define their new attributes as recommended attributes encouraged to define their new attributes as recommended attributes
by bringing them to the IETF standards-track process. by bringing them to the IETF standards-track process.
The set of attributes which are classified as mandatory is The set of attributes which are classified as mandatory is
deliberately small since servers must do whatever it takes to support deliberately small since servers must do whatever it takes to support
them. A server should support as many of the recommended attributes them. A server should support as many of the recommended attributes
as possible but by their definition, the server is not required to as possible but by their definition, the server is not required to
support all of them. Attributes are deemed mandatory if the data is support all of them. Attributes are deemed mandatory if the data is
both needed by a large number of clients and is not otherwise both needed by a large number of clients and is not otherwise
reasonably computable by the client when support is not provided on reasonably computable by the client when support is not provided on
the server. the server.
Note that the hidden directory returned by OPENATTR is a convenience Note that the hidden directory returned by OPENATTR is a convenience
for protocol processing. The client should not make any assumptions for protocol processing. The client should not make any assumptions
about the server's implementation of named attributes and whether the about the server's implementation of named attributes and whether the
underlying filesystem at the server has a named attribute directory underlying filesystem at the server has a named attribute directory
or not. Therefore, operations such as SETATTR and GETATTR on the or not. Therefore, operations such as SETATTR and GETATTR on the
named attribute directory are undefined. named attribute directory are undefined.
skipping to change at page 38, line 4 skipping to change at page 37, line 43
clients but the client is better positioned to decide whether and how to clients but the client is better positioned to decide whether and how to
fabricate or construct an attribute or whether to do without the fabricate or construct an attribute or whether to do without the
attribute. attribute.
5.3. Named Attributes 5.3. Named Attributes
These attributes are not supported by direct encoding in the NFS These attributes are not supported by direct encoding in the NFS
Version 4 protocol but are accessed by string names rather than Version 4 protocol but are accessed by string names rather than
numbers and correspond to an uninterpreted stream of bytes which are numbers and correspond to an uninterpreted stream of bytes which are
stored with the filesystem object. The name space for these stored with the filesystem object. The name space for these
attributes may be accessed by using the OPENATTR operation. The attributes may be accessed by using the OPENATTR operation. The
OPENATTR operation returns a filehandle for a virtual "attribute OPENATTR operation returns a filehandle for a virtual "attribute
directory" and further perusal of the name space may be done using directory" and further perusal of the name space may be done using
READDIR and LOOKUP operations on this filehandle. Named attributes READDIR and LOOKUP operations on this filehandle. Named attributes
may then be examined or changed by normal READ and WRITE and CREATE may then be examined or changed by normal READ and WRITE and CREATE
operations on the filehandles returned from READDIR and LOOKUP. operations on the filehandles returned from READDIR and LOOKUP.
Named attributes may have attributes. Named attributes may have attributes.
It is recommended that servers support arbitrary named attributes. A It is recommended that servers support arbitrary named attributes. A
client should not depend on the ability to store any named attributes client should not depend on the ability to store any named attributes
skipping to change at page 38, line 49 skipping to change at page 38, line 39
o The per server attribute is: o The per server attribute is:
lease_time lease_time
o The per filesystem attributes are: o The per filesystem attributes are:
supp_attr, fh_expire_type, link_support, symlink_support, supp_attr, fh_expire_type, link_support, symlink_support,
unique_handles, aclsupport, cansettime, case_insensitive, unique_handles, aclsupport, cansettime, case_insensitive,
case_preserving, chown_restricted, files_avail, files_free, case_preserving, chown_restricted, files_avail, files_free,
files_total, fs_locations, homogeneous, maxfilesize, maxname, files_total, fs_locations, homogeneous, maxfilesize, maxname,
maxread, maxwrite, no_trunc, space_avail, space_free, maxread, maxwrite, no_trunc, space_avail, space_free, space_total,
space_total, time_delta time_delta
o The per filesystem object attributes are: o The per filesystem object attributes are:
type, change, size, named_attr, fsid, rdattr_error, filehandle, type, change, size, named_attr, fsid, rdattr_error, filehandle,
ACL, archive, fileid, hidden, maxlink, mimetype, mode, numlinks, ACL, archive, fileid, hidden, maxlink, mimetype, mode, numlinks,
owner, owner_group, rawdev, space_used, system, time_access, owner, owner_group, rawdev, space_used, system, time_access,
time_backup, time_create, time_metadata, time_modify, time_backup, time_create, time_metadata, time_modify,
Draft Specification NFS version 4 Protocol November 2002
mounted_on_fileid mounted_on_fileid
For quota_avail_hard, quota_avail_soft, and quota_used see their For quota_avail_hard, quota_avail_soft, and quota_used see their
definitions below for the appropriate classification. definitions below for the appropriate classification.
5.5. Mandatory Attributes - Definitions 5.5. Mandatory Attributes - Definitions
Name # DataType Access Description Name # DataType Access Description
___________________________________________________________________ ___________________________________________________________________
supp_attr 0 bitmap READ The bit vector which supp_attr 0 bitmap READ The bit vector which
would retrieve all would retrieve all
mandatory and mandatory and
recommended attributes recommended attributes
that are supported for that are supported for
this object. The this object. The
skipping to change at page 41, line 5 skipping to change at page 40, line 5
only if the filesystem only if the filesystem
object can not be object can not be
updated more updated more
frequently than the frequently than the
resolution of resolution of
time_metadata. time_metadata.
size 4 uint64 R/W The size of the object size 4 uint64 R/W The size of the object
in bytes. in bytes.
link_support 5 bool READ True, if the object's link_support 5 bool READ True, if the object's
filesystem supports filesystem supports
hard links. hard links.
symlink_support 6 bool READ True, if the object's symlink_support 6 bool READ True, if the object's
filesystem supports filesystem supports
symbolic links. symbolic links.
named_attr 7 bool READ True, if this object named_attr 7 bool READ True, if this object
has named attributes. has named attributes.
skipping to change at page 42, line 5 skipping to change at page 41, line 5
server in seconds. server in seconds.
rdattr_error 11 enum READ Error returned from rdattr_error 11 enum READ Error returned from
getattr during getattr during
readdir. readdir.
filehandle 19 nfs_fh4 READ The filehandle of this filehandle 19 nfs_fh4 READ The filehandle of this
object (primarily for object (primarily for
readdir requests). readdir requests).
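The attribute numbers in the "#" column double as bit positions in an attribute bitmap such as supp_attr. The following is a minimal Python sketch, assuming the bitmap is carried as a sequence of 32-bit words in which attribute n occupies bit (n mod 32) of word (n div 32). The helper names encode_bitmap and decode_bitmap are hypothetical, and only a handful of attributes from the tables in this section are listed.

```python
# Attribute numbers copied from the "#" column of the tables in this section.
ATTR_NUMBERS = {
    "supp_attr": 0,
    "size": 4,
    "link_support": 5,
    "symlink_support": 6,
    "named_attr": 7,
    "rdattr_error": 11,
    "filehandle": 19,
    "fileid": 20,
    "mounted_on_fileid": 55,
}


def encode_bitmap(names):
    """Return a list of 32-bit words with one bit set per requested attribute."""
    words = []
    for name in names:
        word, bit = divmod(ATTR_NUMBERS[name], 32)
        while len(words) <= word:
            words.append(0)  # grow the bitmap only as far as needed
        words[word] |= 1 << bit
    return words


def decode_bitmap(words):
    """Return attribute names (or 'attr#N' for unknown bits) in ascending order."""
    by_number = {number: name for name, number in ATTR_NUMBERS.items()}
    found = []
    for index, word in enumerate(words):
        for bit in range(32):
            if word & (1 << bit):
                number = index * 32 + bit
                found.append(by_number.get(number, f"attr#{number}"))
    return found
```

Under these assumptions, a request for size and mounted_on_fileid needs two words, because attribute 55 falls in the second word at bit 23.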
5.6. Recommended Attributes - Definitions 5.6. Recommended Attributes - Definitions
Name # Data Type Access Description Name # Data Type Access Description
______________________________________________________________________ _____________________________________________________________________
ACL 12 nfsace4<> R/W The access control ACL 12 nfsace4<> R/W The access control
list for the object. list for the object.
aclsupport 13 uint32 READ Indicates what types aclsupport 13 uint32 READ Indicates what types
of ACLs are supported of ACLs are
on the current supported on the
filesystem. current filesystem.
archive 14 bool R/W True, if this file archive 14 bool R/W True, if this file
has been archived has been archived
since the time of since the time of
last modification last modification
(deprecated in favor (deprecated in favor
of time_backup). of time_backup).
cansettime 15 bool READ True, if the server cansettime 15 bool READ True, if the server
able to change the is able to change
times for a the times for a
filesystem object as filesystem object as
specified in a specified in a
SETATTR operation. SETATTR operation.
case_insensitive 16 bool READ True, if filename case_insensitive 16 bool READ True, if filename
comparisons on this comparisons on this
filesystem are case filesystem are case
insensitive. insensitive.
case_preserving 17 bool READ True, if filename case_preserving 17 bool READ True, if filename
skipping to change at page 43, line 5 skipping to change at page 42, line 7
with a file if the with a file if the
caller is not a caller is not a
privileged user (for privileged user (for
example, "root" in example, "root" in
UNIX operating UNIX operating
environments or in environments or in
Windows 2000 the Windows 2000 the
"Take Ownership" "Take Ownership"
privilege). privilege).
fileid 20 uint64 READ A number uniquely fileid 20 uint64 READ A number uniquely
identifying the file identifying the file
within the within the
filesystem. filesystem.
files_avail 21 uint64 READ File slots available files_avail 21 uint64 READ File slots available
to this user on the to this user on the
filesystem containing filesystem
this object - this containing this
should be the object - this should
smallest relevant be the smallest
limit. relevant limit.
files_free 22 uint64 READ Free file slots on files_free 22 uint64 READ Free file slots on
the filesystem the filesystem
containing this containing this
object - this should object - this should
be the smallest be the smallest
relevant limit. relevant limit.
files_total 23 uint64 READ Total file slots on files_total 23 uint64 READ Total file slots on
the filesystem the filesystem
containing this containing this
object. object.
fs_locations 24 fs_locations READ Locations where this fs_locations 24 fs_locations READ Locations where this
filesystem may be filesystem may be
found. If the server found. If the
returns NFS4ERR_MOVED server returns
NFS4ERR_MOVED
as an error, this as an error, this
attribute MUST be attribute MUST be
supported. supported.
hidden 25 bool R/W True, if the file is hidden 25 bool R/W True, if the file is
considered hidden considered hidden
with respect to the with respect to the
Windows API? Windows API.
homogeneous 26 bool READ True, if this homogeneous 26 bool READ True, if this
object's filesystem object's filesystem
is homogeneous, i.e. is homogeneous,
are per filesystem i.e., are per
filesystem
attributes the same attributes the same
for all filesystem's for all filesystem's
objects. objects?
maxfilesize 27 uint64 READ Maximum supported maxfilesize 27 uint64 READ Maximum supported
file size for the file size for the
filesystem of this filesystem of this
object. object.
maxlink 28 uint32 READ Maximum number of maxlink 28 uint32 READ Maximum number of
links for this links for this
object. object.
maxname 29 uint32 READ Maximum filename size maxname 29 uint32 READ Maximum filename
supported for this size supported for
object. this object.
maxread 30 uint64 READ Maximum read size maxread 30 uint64 READ Maximum read size
supported for this supported for this
object. object.
maxwrite 31 uint64 READ Maximum write size maxwrite 31 uint64 READ Maximum write size
supported for this supported for this
object. This object. This
attribute SHOULD be attribute SHOULD be
supported if the file supported if the
is writable. Lack of file is writable.
this attribute can Lack of this
attribute can
lead to the client lead to the client
either wasting either wasting
bandwidth or not bandwidth or not
receiving the best receiving the best
performance. performance.
mimetype 32 utf8<> R/W MIME body mimetype 32 utf8<> R/W MIME body
type/subtype of this type/subtype of this
object. object.
skipping to change at page 45, line 5 skipping to change at page 44, line 16
to this object. to this object.
owner 36 utf8<> R/W The string name of owner 36 utf8<> R/W The string name of
the owner of this the owner of this
object. object.
owner_group 37 utf8<> R/W The string name of owner_group 37 utf8<> R/W The string name of
the group ownership the group ownership
of this object. of this object.
quota_avail_hard 38 uint64 READ For definition see quota_avail_hard 38 uint64 READ For definition see
"Quota Attributes" "Quota Attributes"
section below. section below.
quota_avail_soft 39 uint64 READ For definition see quota_avail_soft 39 uint64 READ For definition see
"Quota Attributes" "Quota Attributes"
section below. section below.
quota_used 40 uint64 READ For definition see quota_used 40 uint64 READ For definition see
"Quota Attributes" "Quota Attributes"
section below. section below.
rawdev 41 specdata4 READ Raw device rawdev 41 specdata4 READ Raw device
identifier. UNIX identifier. UNIX
device major/minor device major/minor
node information. If node information.
the value of type is If the value of
not NF4BLK or NF4CHR, type is not
NF4BLK or NF4CHR,
the value return the value return
SHOULD NOT be SHOULD NOT be
considered useful. considered useful.
space_avail 42 uint64 READ Disk space in bytes space_avail 42 uint64 READ Disk space in bytes
available to this available to this
user on the user on the
filesystem containing filesystem
this object - this containing this
should be the object - this should
smallest relevant be the smallest
limit. relevant limit.
space_free 43 uint64 READ Free disk space in space_free 43 uint64 READ Free disk space in
bytes on the bytes on the
filesystem containing filesystem
this object - this containing this
should be the object - this should
smallest relevant be the smallest
limit. relevant limit.
space_total 44 uint64 READ Total disk space in space_total 44 uint64 READ Total disk space in
bytes on the bytes on the
filesystem containing filesystem
this object. containing this
object.
space_used 45 uint64 READ Number of filesystem space_used 45 uint64 READ Number of filesystem
bytes allocated to bytes allocated to
this object. this object.
system 46 bool R/W True, if this file is system 46 bool R/W True, if this file
a "system" file with is a "system" file
respect to the with respect to the
Windows API? Windows API.
time_access 47 nfstime4 READ The time of last time_access 47 nfstime4 READ The time of last
access to the object access to the object
by a read that was by a read that was
satisfied by the satisfied by the
server. server.
time_access_set 48 settime4 WRITE Set the time of last time_access_set 48 settime4 WRITE Set the time of last
access to the object. access to the
SETATTR use only. object. SETATTR
use only.
time_backup 49 nfstime4 R/W The time of last time_backup 49 nfstime4 R/W The time of last
backup of the object. backup of the
object.
time_create 50 nfstime4 R/W The time of creation time_create 50 nfstime4 R/W The time of creation
of the object. This of the object. This
attribute does not attribute does not
have any relation to have any relation to
the traditional UNIX the traditional UNIX
file attribute file attribute
"ctime" or "change "ctime" or "change
time". time".
skipping to change at page 46, line 53 skipping to change at page 46, line 20
time_modify 53 nfstime4 READ The time of last time_modify 53 nfstime4 READ The time of last
modification to the modification to the
object. object.
time_modify_set 54 settime4 WRITE Set the time of last time_modify_set 54 settime4 WRITE Set the time of last
modification to the modification to the
object. SETATTR use object. SETATTR use
only. only.
mounted_on_fileid 55 uint64 READ Like fileid, but if mounted_on_fileid 55 uint64 READ Like fileid, but if
the target filehandle the target
is the root of a filehandle is the
filesystem return the root of a filesystem
fileid of the return the fileid of
underlying directory. the underlying
directory.
5.7. Time Access 5.7. Time Access
As defined above, the time_access attribute represents the time of As defined above, the time_access attribute represents the time of
last access to the object by a read that was satisfied by the server. last access to the object by a read that was satisfied by the server.
The notion of what is an "access" depends on server's operating The notion of what is an "access" depends on server's operating
environment and/or the server's filesystem semantics. For example, environment and/or the server's filesystem semantics. For example,
for servers obeying POSIX semantics, time_access would be updated for servers obeying POSIX semantics, time_access would be updated
only by the READLINK, READ, and READDIR operations and not any of the only by the READLINK, READ, and READDIR operations and not any of the
operations that modify the content of the object. Of course, setting operations that modify the content of the object. Of course, setting
skipping to change at page 48, line 4 skipping to change at page 47, line 35
storage, to serve as a means of identifying the users corresponding storage, to serve as a means of identifying the users corresponding
to these security principals. When these local identifiers are to these security principals. When these local identifiers are
translated to the form of the owner attribute, associated with files translated to the form of the owner attribute, associated with files
created by such principals they identify, in a common format, the created by such principals they identify, in a common format, the
users associated with each corresponding set of security principals. users associated with each corresponding set of security principals.
The translation used to interpret owner and group strings is not The translation used to interpret owner and group strings is not
specified as part of the protocol. This allows various solutions to specified as part of the protocol. This allows various solutions to
be employed. For example, a local translation table may be consulted be employed. For example, a local translation table may be consulted
that maps between a numeric id to the user@dns_domain syntax. A name that maps between a numeric id to the user@dns_domain syntax. A name
service may also be used to accomplish the translation. A server may service may also be used to accomplish the translation. A server may
provide a more general service, not limited by any particular provide a more general service, not limited by any particular
translation (which would only translate a limited set of possible translation (which would only translate a limited set of possible
strings) by storing the owner and owner_group attributes in local strings) by storing the owner and owner_group attributes in local
storage without any translation or it may augment a translation storage without any translation or it may augment a translation
method by storing the entire string for attributes for which no method by storing the entire string for attributes for which no
translation is available while using the local representation for translation is available while using the local representation for
those cases in which a translation is available. those cases in which a translation is available.
Servers that do not provide support for all possible values of the Servers that do not provide support for all possible values of the
skipping to change at page 48, line 49 skipping to change at page 48, line 28
server, the attribute value must be constructed without the "@". server, the attribute value must be constructed without the "@".
Therefore, the absence of the @ from the owner or owner_group Therefore, the absence of the @ from the owner or owner_group
attribute signifies that no translation was available at the sender attribute signifies that no translation was available at the sender
and that the receiver of the attribute should not use that string as and that the receiver of the attribute should not use that string as
a basis for translation into its own internal format. Even though a basis for translation into its own internal format. Even though
the attribute value can not be translated, it may still be useful. the attribute value can not be translated, it may still be useful.
In the case of a client, the attribute string may be used for local In the case of a client, the attribute string may be used for local
display of ownership. display of ownership.
To provide a greater degree of compatibility with previous versions To provide a greater degree of compatibility with previous versions
of NFS (i.e. v2 and v3), which identified users and groups by 32-bit of NFS (i.e., v2 and v3), which identified users and groups by 32-bit
unsigned uid's and gid's, owner and group strings that consist of unsigned uid's and gid's, owner and group strings that consist of
decimal numeric values with no leading zeros can be given a special decimal numeric values with no leading zeros can be given a special
interpretation by clients and servers which choose to provide such interpretation by clients and servers which choose to provide such
support. The receiver may treat such a user or group string as support. The receiver may treat such a user or group string as
representing the same user as would be represented by a v2/v3 uid or representing the same user as would be represented by a v2/v3 uid or
gid having the corresponding numeric value. A server is not gid having the corresponding numeric value. A server is not
obligated to accept such a string, but may return an NFS4ERR_BADOWNER obligated to accept such a string, but may return an NFS4ERR_BADOWNER
instead. To avoid this mechanism being used to subvert user and instead. To avoid this mechanism being used to subvert user and
group translation, so that a client might pass all of the owners and group translation, so that a client might pass all of the owners and
groups in numeric form, a server SHOULD return an NFS4ERR_BADOWNER groups in numeric form, a server SHOULD return an NFS4ERR_BADOWNER
error when there is a valid translation for the user or owner error when there is a valid translation for the user or owner
designated in this way. In that case, the client must use the designated in this way. In that case, the client must use the
appropriate name@domain string and not the special form for appropriate name@domain string and not the special form for
compatibility. compatibility.
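The rules above for interpreting an owner or owner_group string can be sketched in Python as follows. This is an illustration only: the function name classify_owner and the three labels it returns are hypothetical, and treating the single digit "0" as a valid numeric form is an assumption, since the text only excludes leading zeros.

```python
def classify_owner(value):
    """Classify an owner/owner_group string by the form described in the text.

    Returns a (kind, payload) tuple where kind is one of:
      "name@domain"  - contains "@", so it can be a basis for translation
      "numeric"      - decimal digits, no leading zeros, fits in 32 bits
      "untranslated" - no "@"; usable for display but not for translation
    """
    if "@" in value:
        return ("name@domain", value)
    # isascii() guards against non-ASCII digits that str.isdigit() accepts.
    if value.isascii() and value.isdigit():
        no_leading_zero = value == "0" or not value.startswith("0")
        if no_leading_zero and int(value) < 2**32:
            return ("numeric", int(value))
    return ("untranslated", value)
```

A server that chooses to support the numeric form would still return NFS4ERR_BADOWNER when a valid name@domain translation exists for that user or group; that policy decision is outside this sketch.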
The owner string "nobody" may be used to designate an anonymous user, The owner string "nobody" may be used to designate an anonymous user,
which will be associated with a file created by a security principal which will be associated with a file created by a security principal
that cannot be mapped through normal means to the owner attribute. that cannot be mapped through normal means to the owner attribute.
skipping to change at page 49, line 36 skipping to change at page 49, line 24
section "Internationalization". section "Internationalization".
5.10. Quota Attributes 5.10. Quota Attributes
For the attributes related to filesystem quotas, the following For the attributes related to filesystem quotas, the following
definitions apply: definitions apply:
quota_avail_soft quota_avail_soft
The value in bytes which represents the amount of additional The value in bytes which represents the amount of additional
disk space that can be allocated to this file or directory disk space that can be allocated to this file or directory
before the user may reasonably be warned. It is understood that before the user may reasonably be warned. It is understood
this space may be consumed by allocations to other files or that this space may be consumed by allocations to other files
directories though there is a rule as to which other files or or directories though there is a rule as to which other files
directories. or directories.
quota_avail_hard quota_avail_hard
The value in bytes which represent the amount of additional disk The value in bytes which represent the amount of additional
space beyond the current allocation that can be allocated to disk space beyond the current allocation that can be allocated
this file or directory before further allocations will be to this file or directory before further allocations will be
refused. It is understood that this space may be consumed by refused. It is understood that this space may be consumed by
allocations to other files or directories. allocations to other files or directories.
quota_used quota_used
The value in bytes which represent the amount of disc space used The value in bytes which represent the amount of disc space
by this file or directory and possibly a number of other similar used by this file or directory and possibly a number of other
files or directories, where the set of "similar" meets at least similar files or directories, where the set of "similar" meets
the criterion that allocating space to any file or directory in at least the criterion that allocating space to any file or
the set will reduce the "quota_avail_hard" of every other file directory in the set will reduce the "quota_avail_hard" of
or directory in the set. every other file or directory in the set.
Note that there may be a number of distinct but overlapping sets Note that there may be a number of distinct but overlapping
of files or directories for which a quota_used value is sets of files or directories for which a quota_used value is
maintained. E.g. "all files with a given owner", "all files with maintained (e.g., "all files with a given owner", "all files
a given group owner". etc. with a given group owner", etc.).
The server is at liberty to choose any of those sets but should The server is at liberty to choose any of those sets but should
do so in a repeatable way. The rule may be configured per- do so in a repeatable way. The rule may be configured per-
filesystem or may be "choose the set with the smallest quota". filesystem or may be "choose the set with the smallest quota".
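As an illustration of the "choose the set with the smallest quota" rule, the Python sketch below selects one quota set for an object and derives the three quota attributes from it. Everything here is an assumption made for the example: the QuotaSet record, its field names, reading "smallest quota" as the smallest hard limit, and breaking ties by name so that the choice is repeatable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuotaSet:
    """Hypothetical record for one set of files sharing a quota."""
    name: str
    hard_limit: int  # bytes
    soft_limit: int  # bytes
    used: int        # bytes


def select_quota_set(sets):
    """Pick the set with the smallest hard limit; ties are broken by name."""
    return min(sets, key=lambda s: (s.hard_limit, s.name))


def quota_attributes(sets):
    """Derive quota_used, quota_avail_soft and quota_avail_hard for an object."""
    chosen = select_quota_set(sets)
    return {
        "quota_used": chosen.used,
        "quota_avail_soft": max(chosen.soft_limit - chosen.used, 0),
        "quota_avail_hard": max(chosen.hard_limit - chosen.used, 0),
    }
```

For a file that belongs to both an owner-based set and a group-based set, this sketch always reports the values of the set with the tighter hard limit, regardless of the order in which the sets are supplied.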
5.11. Access Control Lists 5.11. Access Control Lists
The NFS version 4 ACL attribute is an array of access control entries The NFS version 4 ACL attribute is an array of access control entries
(ACE). Although, the client can read and write the ACL attribute, (ACE). Although, the client can read and write the ACL attribute,
the NFSv4 model is the server does all access control based on the the NFSv4 model is the server does all access control based on the
skipping to change at page 51, line 5 skipping to change at page 50, line 46
the requester are considered. Each ACE is processed until all of the the requester are considered. Each ACE is processed until all of the
bits of the requester's access have been ALLOWED. Once a bit (see bits of the requester's access have been ALLOWED. Once a bit (see
below) has been ALLOWED by an ACCESS_ALLOWED_ACE, it is no longer below) has been ALLOWED by an ACCESS_ALLOWED_ACE, it is no longer
considered in the processing of later ACEs. If an ACCESS_DENIED_ACE considered in the processing of later ACEs. If an ACCESS_DENIED_ACE
is encountered where the requester's access still has unALLOWED bits is encountered where the requester's access still has unALLOWED bits
in common with the "access_mask" of the ACE, the request is denied. in common with the "access_mask" of the ACE, the request is denied.
However, unlike the ALLOWED and DENIED ACE types, the ALARM and AUDIT However, unlike the ALLOWED and DENIED ACE types, the ALARM and AUDIT
ACE types do not affect a requester's access, and instead are for ACE types do not affect a requester's access, and instead are for
triggering events as a result of a requester's access attempt. triggering events as a result of a requester's access attempt.
Therefore, all AUDIT and ALARM ACEs are processed until end of the Therefore, all AUDIT and ALARM ACEs are processed until end of the
ACL. When the ACL is fully processed, if there are bits in ACL. When the ACL is fully processed, if there are bits in
requester's mask that have not been considered whether the server requester's mask that have not been considered whether the server
allows or denies the access is undefined. If there is a mode allows or denies the access is undefined. If there is a mode
attribute on the file, then this cannot happen, since the mode's attribute on the file, then this cannot happen, since the mode's
MODE4_*OTH bits will map to EVERYONE@ ACEs that unambiguously specify MODE4_*OTH bits will map to EVERYONE@ ACEs that unambiguously specify
the requester's access. the requester's access.
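The ordered processing described above can be sketched in Python as follows. This is a simplified illustration, not a reference implementation: the function name evaluate_acl, the tuple layout (type, who, access_mask), and matching "who" by membership in a set of identities are all assumptions. AUDIT and ALARM ACEs are skipped because they do not affect the access decision, and the undefined outcome is represented here by None.

```python
# ACE type constants as listed in this section.
ACE4_ACCESS_ALLOWED_ACE_TYPE = 0x00000000
ACE4_ACCESS_DENIED_ACE_TYPE = 0x00000001
ACE4_SYSTEM_AUDIT_ACE_TYPE = 0x00000002
ACE4_SYSTEM_ALARM_ACE_TYPE = 0x00000003

# Two mask bits from this section, used only for the examples.
ACE4_READ_ACL = 0x00020000
ACE4_WRITE_ACL = 0x00040000


def evaluate_acl(acl, identities, requested_mask):
    """Return True (allowed), False (denied) or None (undefined by the ACL).

    acl            -- ordered list of (ace_type, who, access_mask) tuples
    identities     -- set of "who" strings that match the requester
    requested_mask -- bits of access the requester is asking for
    """
    remaining = requested_mask
    for ace_type, who, access_mask in acl:
        if who not in identities:
            continue  # only ACEs that match the requester are considered
        if ace_type == ACE4_ACCESS_ALLOWED_ACE_TYPE:
            remaining &= ~access_mask  # allowed bits are no longer considered
            if remaining == 0:
                return True
        elif ace_type == ACE4_ACCESS_DENIED_ACE_TYPE:
            if remaining & access_mask:
                return False  # a still-unallowed bit is explicitly denied
        # AUDIT and ALARM ACEs do not change the access decision.
    return True if remaining == 0 else None
```

Because ACEs are processed in order, an ALLOWED entry placed before a DENIED entry for the same bit wins for that bit, and the reverse ordering denies it.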
The NFS version 4 ACL model is quite rich. Some server platforms may The NFS version 4 ACL model is quite rich. Some server platforms may
provide access control functionality that goes beyond the UNIX-style provide access control functionality that goes beyond the UNIX-style
skipping to change at page 52, line 5 skipping to change at page 51, line 47
in acemask4. in acemask4.
ALARM Generate a system ALARM (system ALARM Generate a system ALARM (system
dependent) when any access attempt is dependent) when any access attempt is
made to a file or directory for the made to a file or directory for the
access methods specified in acemask4. access methods specified in acemask4.
A server need not support all of the above ACE types. The bitmask A server need not support all of the above ACE types. The bitmask
constants used to represent the above definitions within the constants used to represent the above definitions within the
aclsupport attribute are as follows: aclsupport attribute are as follows:
const ACL4_SUPPORT_ALLOW_ACL = 0x00000001; const ACL4_SUPPORT_ALLOW_ACL = 0x00000001;
const ACL4_SUPPORT_DENY_ACL = 0x00000002; const ACL4_SUPPORT_DENY_ACL = 0x00000002;
const ACL4_SUPPORT_AUDIT_ACL = 0x00000004; const ACL4_SUPPORT_AUDIT_ACL = 0x00000004;
const ACL4_SUPPORT_ALARM_ACL = 0x00000008; const ACL4_SUPPORT_ALARM_ACL = 0x00000008;
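A client might decode the aclsupport value along the following lines. This Python sketch reuses the four constants above; the function name supported_ace_types and the short labels it returns are hypothetical.

```python
# Bitmask constants as listed above for the aclsupport attribute.
ACL4_SUPPORT_ALLOW_ACL = 0x00000001
ACL4_SUPPORT_DENY_ACL = 0x00000002
ACL4_SUPPORT_AUDIT_ACL = 0x00000004
ACL4_SUPPORT_ALARM_ACL = 0x00000008


def supported_ace_types(aclsupport):
    """Return the labels of the ACE types whose bit is set in aclsupport."""
    table = [
        (ACL4_SUPPORT_ALLOW_ACL, "ALLOW"),
        (ACL4_SUPPORT_DENY_ACL, "DENY"),
        (ACL4_SUPPORT_AUDIT_ACL, "AUDIT"),
        (ACL4_SUPPORT_ALARM_ACL, "ALARM"),
    ]
    return [label for bit, label in table if aclsupport & bit]
```

A server that supports only ALLOW and DENY ACEs would report the value 0x00000003 under this encoding.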
The semantics of the "type" field follow the descriptions provided The semantics of the "type" field follow the descriptions provided
above. above.
The constants used for the type field (acetype4) are as follows: The constants used for the type field (acetype4) are as follows:
const ACE4_ACCESS_ALLOWED_ACE_TYPE = 0x00000000; const ACE4_ACCESS_ALLOWED_ACE_TYPE = 0x00000000;
const ACE4_ACCESS_DENIED_ACE_TYPE = 0x00000001; const ACE4_ACCESS_DENIED_ACE_TYPE = 0x00000001;
const ACE4_SYSTEM_AUDIT_ACE_TYPE = 0x00000002; const ACE4_SYSTEM_AUDIT_ACE_TYPE = 0x00000002;
const ACE4_SYSTEM_ALARM_ACE_TYPE = 0x00000003; const ACE4_SYSTEM_ALARM_ACE_TYPE = 0x00000003;
skipping to change at page 53, line 4 skipping to change at page 52, line 42
_______________________________________________________________ _______________________________________________________________
READ_DATA Permission to read the data of the file READ_DATA Permission to read the data of the file
LIST_DIRECTORY Permission to list the contents of a LIST_DIRECTORY Permission to list the contents of a
directory directory
WRITE_DATA Permission to modify the file's data WRITE_DATA Permission to modify the file's data
ADD_FILE Permission to add a new file to a ADD_FILE Permission to add a new file to a
directory directory
APPEND_DATA Permission to append data to a file APPEND_DATA Permission to append data to a file
ADD_SUBDIRECTORY Permission to create a subdirectory to a ADD_SUBDIRECTORY Permission to create a subdirectory to a
directory directory
READ_NAMED_ATTRS Permission to read the named attributes READ_NAMED_ATTRS Permission to read the named attributes
of a file of a file
WRITE_NAMED_ATTRS Permission to write the named attributes WRITE_NAMED_ATTRS Permission to write the named attributes
of a file of a file
EXECUTE Permission to execute a file EXECUTE Permission to execute a file
DELETE_CHILD Permission to delete a file or directory DELETE_CHILD Permission to delete a file or directory
within a directory within a directory
READ_ATTRIBUTES The ability to read basic attributes READ_ATTRIBUTES The ability to read basic attributes
(non-acls) of a file (non-acls) of a file
WRITE_ATTRIBUTES Permission to change basic attributes WRITE_ATTRIBUTES Permission to change basic attributes
skipping to change at page 53, line 49 skipping to change at page 53, line 36
const ACE4_DELETE = 0x00010000; const ACE4_DELETE = 0x00010000;
const ACE4_READ_ACL = 0x00020000; const ACE4_READ_ACL = 0x00020000;
const ACE4_WRITE_ACL = 0x00040000; const ACE4_WRITE_ACL = 0x00040000;
const ACE4_WRITE_OWNER = 0x00080000; const ACE4_WRITE_OWNER = 0x00080000;
const ACE4_SYNCHRONIZE = 0x00100000; const ACE4_SYNCHRONIZE = 0x00100000;
Server implementations need not provide the granularity of control Server implementations need not provide the granularity of control
that is implied by this list of masks. For example, POSIX-based that is implied by this list of masks. For example, POSIX-based
systems might not distinguish APPEND_DATA (the ability to append to a systems might not distinguish APPEND_DATA (the ability to append to a
file) from WRITE_DATA (the ability to modify existing contents); both file) from WRITE_DATA (the ability to modify existing contents); both
masks would be tied to a single ``write'' permission. When such a masks would be tied to a single "write" permission. When such a
server returns attributes to the client, it would show both server returns attributes to the client, it would show both
APPEND_DATA and WRITE_DATA if and only if the write permission is APPEND_DATA and WRITE_DATA if and only if the write permission is
enabled. enabled.
If a server receives a SETATTR request that it cannot accurately If a server receives a SETATTR request that it cannot accurately
implement, it should err in the direction of more restricted implement, it should err in the direction of more restricted
access. For example, suppose a server cannot distinguish overwriting access. For example, suppose a server cannot distinguish overwriting
data from appending new data, as described in the previous paragraph. data from appending new data, as described in the previous paragraph.
If a client submits an ACE where APPEND_DATA is set but WRITE_DATA is If a client submits an ACE where APPEND_DATA is set but WRITE_DATA is
not (or vice versa), the server should reject the request with not (or vice versa), the server should reject the request with
NFS4ERR_ATTRNOTSUPP. Nonetheless, if the ACE has type DENY, the NFS4ERR_ATTRNOTSUPP. Nonetheless, if the ACE has type DENY, the
server may silently turn on the other bit, so that both APPEND_DATA server may silently turn on the other bit, so that both APPEND_DATA
and WRITE_DATA are denied. and WRITE_DATA are denied.
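The handling described in this paragraph can be sketched in Python for a server that cannot distinguish appending from overwriting. The names normalize_write_bits and AttrNotSupp are hypothetical, with the exception standing in for an NFS4ERR_ATTRNOTSUPP reply. The values of ACE4_WRITE_DATA and ACE4_APPEND_DATA are not shown in this excerpt and are given here as they are understood from the protocol's mask constants, so treat them as an assumption.

```python
ACE4_ACCESS_ALLOWED_ACE_TYPE = 0x00000000
ACE4_ACCESS_DENIED_ACE_TYPE = 0x00000001

# Assumed values; the mask constant list is elided from this excerpt.
ACE4_WRITE_DATA = 0x00000002
ACE4_APPEND_DATA = 0x00000004

WRITE_BITS = ACE4_WRITE_DATA | ACE4_APPEND_DATA


class AttrNotSupp(Exception):
    """Stands in for returning NFS4ERR_ATTRNOTSUPP to the client."""


def normalize_write_bits(ace_type, access_mask):
    """Return the mask the server will store, or raise AttrNotSupp.

    Models a server with a single "write" permission covering both
    WRITE_DATA and APPEND_DATA.
    """
    present = access_mask & WRITE_BITS
    if present in (0, WRITE_BITS):
        return access_mask  # both or neither: representable as-is
    if ace_type == ACE4_ACCESS_DENIED_ACE_TYPE:
        # Erring toward more restricted access: deny both bits.
        return access_mask | WRITE_BITS
    raise AttrNotSupp("cannot grant APPEND_DATA and WRITE_DATA separately")
```

Widening a DENY entry only removes access, which is why the sketch accepts it silently, whereas widening any other entry would grant more than the client asked for.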
5.11.3. ACE flag 5.11.3. ACE flag
The "flag" field contains values based on the following descriptions. The "flag" field contains values based on the following descriptions.
ACE4_FILE_INHERIT_ACE ACE4_FILE_INHERIT_ACE
Can be placed on a directory and indicates that this ACE should be Can be placed on a directory and indicates that this ACE should be
added to each new non-directory file created. added to each new non-directory file created.
ACE4_DIRECTORY_INHERIT_ACE ACE4_DIRECTORY_INHERIT_ACE
Can be placed on a directory and indicates that this ACE should be Can be placed on a directory and indicates that this ACE should be
added to each new directory created. added to each new directory created.
ACE4_INHERIT_ONLY_ACE ACE4_INHERIT_ONLY_ACE
Can be placed on a directory but does not apply to the directory, Can be placed on a directory but does not apply to the directory,
only to newly created files/directories as specified by the above two only to newly created files/directories as specified by the above
flags. two flags.
ACE4_NO_PROPAGATE_INHERIT_ACE ACE4_NO_PROPAGATE_INHERIT_ACE
Can be placed on a directory. Normally when a new directory is Can be placed on a directory. Normally when a new directory is
created and an ACE exists on the parent directory which is marked created and an ACE exists on the parent directory which is marked
ACE4_DIRECTORY_INHERIT_ACE, two ACEs are placed on the new directory. ACE4_DIRECTORY_INHERIT_ACE, two ACEs are placed on the new
One for the directory itself and one which is an inheritable ACE for directory. One for the directory itself and one which is an
newly created directories. This flag tells the server to not place inheritable ACE for newly created directories. This flag tells
an ACE on the newly created directory which is inheritable by the server to not place an ACE on the newly created directory
subdirectories of the created directory. which is inheritable by subdirectories of the created directory.
ACE4_SUCCESSFUL_ACCESS_ACE_FLAG ACE4_SUCCESSFUL_ACCESS_ACE_FLAG
ACE4_FAILED_ACCESS_ACE_FLAG ACE4_FAILED_ACCESS_ACE_FLAG
The ACE4_SUCCESSFUL_ACCESS_ACE_FLAG (SUCCESS) and The ACE4_SUCCESSFUL_ACCESS_ACE_FLAG (SUCCESS) and
ACE4_FAILED_ACCESS_ACE_FLAG (FAILED) flag bits relate only to ACE4_FAILED_ACCESS_ACE_FLAG (FAILED) flag bits relate only to
ACE4_SYSTEM_AUDIT_ACE_TYPE (AUDIT) and ACE4_SYSTEM_ALARM_ACE_TYPE ACE4_SYSTEM_AUDIT_ACE_TYPE (AUDIT) and ACE4_SYSTEM_ALARM_ACE_TYPE
(ALARM) ACE types. If during the processing of the file's ACL, the (ALARM) ACE types. If during the processing of the file's ACL,
server encounters an AUDIT or ALARM ACE that matches the principal the server encounters an AUDIT or ALARM ACE that matches the
principal attempting the OPEN, the server notes that fact, and the
presence, if any, of the SUCCESS and FAILED flags encountered in
the AUDIT or ALARM ACE. Once the server completes the ACL
attempting the OPEN, the server notes that fact, and the presence, if processing, and the share reservation processing, and the OPEN
any, of the SUCCESS and FAILED flags encountered in the AUDIT or call, it then notes if the OPEN succeeded or failed. If the OPEN
ALARM ACE. Once the server completes the ACL processing, and the succeeded, and if the SUCCESS flag was set for a matching AUDIT or
share reservation processing, and the OPEN call, it then notes if the ALARM, then the appropriate AUDIT or ALARM event occurs. If the
OPEN succeeded or failed. If the OPEN succeeded, and if the SUCCESS OPEN failed, and if the FAILED flag was set for the matching AUDIT
flag was set for a matching AUDIT or ALARM, then the appropriate or ALARM, then the appropriate AUDIT or ALARM event occurs.
AUDIT or ALARM event occurs. If the OPEN failed, and if the FAILED Clearly either or both of the SUCCESS or FAILED can be set, but if
flag was set for the matching AUDIT or ALARM, then the appropriate neither is set, the AUDIT or ALARM ACE is not useful.
AUDIT or ALARM event occurs. Clearly either or both of the SUCCESS
or FAILED can be set, but if neither is set, the AUDIT or ALARM ACE
is not useful.
The previously described processing applies to that of the ACCESS The previously described processing applies to that of the ACCESS
operation as well. The difference is that "success" or "failure" operation as well. The difference is that "success" or
does not mean whether ACCESS returns NFS4_OK or not. Success means "failure" does not mean whether ACCESS returns NFS4_OK or not.
whether ACCESS returns all requested and supported bits. Failure Success means whether ACCESS returns all requested and supported
means whether ACCESS failed to return a bit that was requested and bits. Failure means whether ACCESS failed to return a bit that
supported. was requested and supported.
ACE4_IDENTIFIER_GROUP ACE4_IDENTIFIER_GROUP
Indicates that the "who" refers to a GROUP as defined under UNIX. Indicates that the "who" refers to a GROUP as defined under UNIX.
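The SUCCESS and FAILED handling described above can be sketched as follows. This is a minimal illustration, not a server implementation: the function names and the way the outcome is passed in are assumptions made for the example, while the two flag values are the constants defined by the protocol.

```python
# Flag values as defined by the protocol.
ACE4_SUCCESSFUL_ACCESS_ACE_FLAG = 0x00000010
ACE4_FAILED_ACCESS_ACE_FLAG = 0x00000020


def should_fire_event(ace_flags, operation_succeeded):
    """Decide whether a matching AUDIT or ALARM ACE triggers its event.

    Hypothetical helper: the caller has already matched the ACE against
    the principal and finished ACL and share reservation processing.
    """
    if operation_succeeded:
        return bool(ace_flags & ACE4_SUCCESSFUL_ACCESS_ACE_FLAG)
    return bool(ace_flags & ACE4_FAILED_ACCESS_ACE_FLAG)


def access_succeeded(requested, supported, granted):
    """For ACCESS, success means every requested and supported bit came back."""
    wanted = requested & supported
    return (granted & wanted) == wanted
```

An ACE with neither flag set never fires, which is why the text calls such an ACE not useful.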
The bitmask constants used for the flag field are as follows: The bitmask constants used for the flag field are as follows:
const ACE4_FILE_INHERIT_ACE = 0x00000001; const ACE4_FILE_INHERIT_ACE = 0x00000001;
const ACE4_DIRECTORY_INHERIT_ACE = 0x00000002; const ACE4_DIRECTORY_INHERIT_ACE = 0x00000002;
const ACE4_NO_PROPAGATE_INHERIT_ACE = 0x00000004; const ACE4_NO_PROPAGATE_INHERIT_ACE = 0x00000004;
const ACE4_INHERIT_ONLY_ACE = 0x00000008; const ACE4_INHERIT_ONLY_ACE = 0x00000008;
const ACE4_SUCCESSFUL_ACCESS_ACE_FLAG = 0x00000010; const ACE4_SUCCESSFUL_ACCESS_ACE_FLAG = 0x00000010;
const ACE4_FAILED_ACCESS_ACE_FLAG = 0x00000020; const ACE4_FAILED_ACCESS_ACE_FLAG = 0x00000020;
skipping to change at page 56, line 5 skipping to change at page 55, line 42
For example, suppose a client tries to set an ACE with For example, suppose a client tries to set an ACE with
ACE4_FILE_INHERIT_ACE set but not ACE4_DIRECTORY_INHERIT_ACE. If the ACE4_FILE_INHERIT_ACE set but not ACE4_DIRECTORY_INHERIT_ACE. If the
server does not support any form of ACL inheritance, the server server does not support any form of ACL inheritance, the server
should reject the request with NFS4ERR_ATTRNOTSUPP. If the server should reject the request with NFS4ERR_ATTRNOTSUPP. If the server
supports a single "inherit ACE" flag that applies to both files and supports a single "inherit ACE" flag that applies to both files and
directories, the server may reject the request (i.e., requiring the directories, the server may reject the request (i.e., requiring the
client to set both the file and directory inheritance flags). The client to set both the file and directory inheritance flags). The
server may also accept the request and silently turn on the server may also accept the request and silently turn on the
ACE4_DIRECTORY_INHERIT_ACE flag. ACE4_DIRECTORY_INHERIT_ACE flag.
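As a rough sketch of the inheritance flags, the function below computes which flag words would appear on the ACEs of a newly created object for one ACE on the parent directory. The function name and return shape are assumptions for illustration. The sketch is also simplified: it does not propagate a file-only inheritable ACE into a new subdirectory, a case the text above does not address.

```python
# Flag values as defined by the protocol.
ACE4_FILE_INHERIT_ACE = 0x00000001
ACE4_DIRECTORY_INHERIT_ACE = 0x00000002
ACE4_NO_PROPAGATE_INHERIT_ACE = 0x00000004
ACE4_INHERIT_ONLY_ACE = 0x00000008

INHERIT_BITS = (ACE4_FILE_INHERIT_ACE | ACE4_DIRECTORY_INHERIT_ACE |
                ACE4_NO_PROPAGATE_INHERIT_ACE | ACE4_INHERIT_ONLY_ACE)


def inherited_flags(parent_flags, child_is_directory):
    """Return the flag words of the ACEs a new child receives (hypothetical)."""
    # The effective copy applies to the child itself and carries no
    # inheritance bits; other flag bits are preserved.
    effective = parent_flags & ~INHERIT_BITS
    if not child_is_directory:
        return [effective] if parent_flags & ACE4_FILE_INHERIT_ACE else []
    if not parent_flags & ACE4_DIRECTORY_INHERIT_ACE:
        return []
    if parent_flags & ACE4_NO_PROPAGATE_INHERIT_ACE:
        # Only the ACE for the new directory itself; nothing inheritable.
        return [effective]
    # Second ACE: inheritable by objects created inside the new directory.
    return [effective, parent_flags | ACE4_INHERIT_ONLY_ACE]
```

For a parent ACE carrying only ACE4_DIRECTORY_INHERIT_ACE, a new subdirectory receives two ACEs; adding ACE4_NO_PROPAGATE_INHERIT_ACE reduces that to one.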
5.11.4. ACE who 5.11.4. ACE who
There are several special identifiers ("who") which need to be There are several special identifiers ("who") which need to be
understood universally, rather than in the context of a particular understood universally, rather than in the context of a particular
DNS domain. Some of these identifiers cannot be understood when an DNS domain. Some of these identifiers cannot be understood when an
NFS client accesses the server, but have meaning when a local process NFS client accesses the server, but have meaning when a local process
accesses the file. The ability to display and modify these accesses the file. The ability to display and modify these
permissions is permitted over NFS, even if none of the access methods permissions is permitted over NFS, even if none of the access methods
on the server understands the identifiers. on the server understands the identifiers.
skipping to change at page 57, line 5 skipping to change at page 56, line 44
const MODE4_RGRP = 0x020; /* read permission: group */ const MODE4_RGRP = 0x020; /* read permission: group */
const MODE4_WGRP = 0x010; /* write permission: group */ const MODE4_WGRP = 0x010; /* write permission: group */
const MODE4_XGRP = 0x008; /* execute permission: group */ const MODE4_XGRP = 0x008; /* execute permission: group */
const MODE4_ROTH = 0x004; /* read permission: other */ const MODE4_ROTH = 0x004; /* read permission: other */
const MODE4_WOTH = 0x002; /* write permission: other */ const MODE4_WOTH = 0x002; /* write permission: other */
const MODE4_XOTH = 0x001; /* execute permission: other */ const MODE4_XOTH = 0x001; /* execute permission: other */
Bits MODE4_RUSR, MODE4_WUSR, and MODE4_XUSR apply to the principal Bits MODE4_RUSR, MODE4_WUSR, and MODE4_XUSR apply to the principal
identified in the owner attribute. Bits MODE4_RGRP, MODE4_WGRP, and identified in the owner attribute. Bits MODE4_RGRP, MODE4_WGRP, and
MODE4_XGRP apply to the principals identified in the owner_group MODE4_XGRP apply to the principals identified in the owner_group
attribute. Bits MODE4_ROTH, MODE4_WOTH, MODE4_XOTH apply to any attribute. Bits MODE4_ROTH, MODE4_WOTH, MODE4_XOTH apply to any
principal that does not match that in the owner attribute, and does not principal that does not match that in the owner attribute, and does not
have a group matching that of the owner_group attribute. have a group matching that of the owner_group attribute.
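The selection of owner, group, or other bits can be sketched as below. The function and its arguments are hypothetical; the MODE4_*USR values come from the protocol's constant list, which is not shown in this excerpt.

```python
# Mode bit values as defined by the protocol.
MODE4_USR = (0x100, 0x080, 0x040)  # MODE4_RUSR, MODE4_WUSR, MODE4_XUSR
MODE4_GRP = (0x020, 0x010, 0x008)  # MODE4_RGRP, MODE4_WGRP, MODE4_XGRP
MODE4_OTH = (0x004, 0x002, 0x001)  # MODE4_ROTH, MODE4_WOTH, MODE4_XOTH


def applicable_mode_bits(mode, principal, groups, owner, owner_group):
    """Return (read, write, execute) for the class the principal falls in.

    Hypothetical helper: 'groups' is the set of groups the principal
    belongs to; 'owner' and 'owner_group' are the file's attributes.
    """
    if principal == owner:
        bits = MODE4_USR
    elif owner_group in groups:
        bits = MODE4_GRP
    else:
        bits = MODE4_OTH
    return tuple(bool(mode & bit) for bit in bits)
```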
The remaining bits are not defined by this protocol and MUST NOT be The remaining bits are not defined by this protocol and MUST NOT be
used. The minor version mechanism must be used to define further bit used. The minor version mechanism must be used to define further bit
usage. usage.
Note that in UNIX, if a file has the MODE4_SGID bit set and no Note that in UNIX, if a file has the MODE4_SGID bit set and no
skipping to change at page 57, line 29 skipping to change at page 57, line 18
5.11.6. Mode and ACL Attribute 5.11.6. Mode and ACL Attribute
The server that supports both mode and ACL must take care to The server that supports both mode and ACL must take care to
synchronize the MODE4_*USR, MODE4_*GRP, and MODE4_*OTH bits with the synchronize the MODE4_*USR, MODE4_*GRP, and MODE4_*OTH bits with the
ACEs which have respective who fields of "OWNER@", "GROUP@", and ACEs which have respective who fields of "OWNER@", "GROUP@", and
"EVERYONE@" so that the client can see semantically equivalent access "EVERYONE@" so that the client can see semantically equivalent access
permissions exist whether the client asks for owner, owner_group and permissions exist whether the client asks for owner, owner_group and
mode attributes, or for just the ACL. mode attributes, or for just the ACL.
Because the mode attribute includes bits (e.g. MODE4_SVTX) that have Because the mode attribute includes bits (e.g., MODE4_SVTX) that have
nothing to do with ACL semantics, it is permitted for clients to nothing to do with ACL semantics, it is permitted for clients to
specify both the ACL attribute and mode in the same SETATTR specify both the ACL attribute and mode in the same SETATTR
operation. However, because there is no prescribed order for operation. However, because there is no prescribed order for
processing the attributes in a SETATTR, the client must ensure that processing the attributes in a SETATTR, the client must ensure that
the ACL attribute, if specified without mode, would produce the desired the ACL attribute, if specified without mode, would produce the desired
mode bits, and conversely, the mode attribute if specified without mode bits, and conversely, the mode attribute if specified without
ACL, would produce the desired "OWNER@", "GROUP@", and "EVERYONE@" ACL, would produce the desired "OWNER@", "GROUP@", and "EVERYONE@"
ACEs. ACEs.
5.11.7. mounted_on_fileid 5.11.7. mounted_on_fileid
skipping to change at page 57, line 56 skipping to change at page 57, line 45
with a component name and a fileid. The fileid of the mount point's with a component name and a fileid. The fileid of the mount point's
directory entry will be different from the fileid that the stat() directory entry will be different from the fileid that the stat()
system call returns. The stat() system call is returning the fileid system call returns. The stat() system call is returning the fileid
of the root of the mounted filesystem, whereas readdir() is returning of the root of the mounted filesystem, whereas readdir() is returning
the fileid stat() would have returned before any filesystems were the fileid stat() would have returned before any filesystems were
mounted on the mount point. mounted on the mount point.
Unlike NFS version 3, NFS version 4 allows a client's LOOKUP request Unlike NFS version 3, NFS version 4 allows a client's LOOKUP request
to cross other filesystems. The client detects the filesystem to cross other filesystems. The client detects the filesystem
crossing whenever the filehandle argument of LOOKUP has an fsid crossing whenever the filehandle argument of LOOKUP has an fsid
attribute different from that of the filehandle returned by LOOKUP. A attribute different from that of the filehandle returned by LOOKUP.
UNIX-based client will consider this a "mount point crossing". UNIX A UNIX-based client will consider this a "mount point crossing".
UNIX has a legacy scheme for allowing a process to determine its
current working directory. This relies on readdir() of a mount
point's parent and stat() of the mount point returning fileids as
has a legacy scheme for allowing a process to determine its current previously described. The mounted_on_fileid attribute corresponds to
working directory. This relies on readdir() of a mount point's parent the fileid that readdir() would have returned as described
and stat() of the mount point returning fileids as previously previously.
described. The mounted_on_fileid attribute corresponds to the fileid
that readdir() would have returned as described previously.
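A client-side sketch of these two ideas follows. The Fsid tuple, the attribute dictionary, and both function names are assumptions made for the example; they are not protocol data structures.

```python
from collections import namedtuple

# Hypothetical client-side representation of the fsid attribute.
Fsid = namedtuple("Fsid", ["major", "minor"])


def crosses_filesystem(parent_fsid, child_fsid):
    """A LOOKUP crossed filesystems if the fsid of the result differs."""
    return parent_fsid != child_fsid


def readdir_fileid(attrs):
    """Fileid a UNIX-like client would report from readdir() for an entry.

    Prefers mounted_on_fileid when the server supplied it; otherwise
    falls back to fileid, which is correct when nothing is mounted there.
    """
    return attrs.get("mounted_on_fileid", attrs["fileid"])
```

In this sketch stat() would keep reporting attrs["fileid"], so the two calls differ exactly at a mount point.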
While the NFS version 4 client could simply fabricate a fileid While the NFS version 4 client could simply fabricate a fileid
corresponding to what mounted_on_fileid provides (and if the server corresponding to what mounted_on_fileid provides (and if the server
does not support mounted_on_fileid, the client has no choice), there does not support mounted_on_fileid, the client has no choice), there
is a risk that the client will generate a fileid that conflicts with is a risk that the client will generate a fileid that conflicts with
one that is already assigned to another object in the filesystem. one that is already assigned to another object in the filesystem.
Instead, if the server can provide the mounted_on_fileid, the Instead, if the server can provide the mounted_on_fileid, the
potential for client operational problems in this area is eliminated. potential for client operational problems in this area is eliminated.
If the server detects that there is no mount point at the target If the server detects that there is no mount point at the target
skipping to change at page 58, line 33 skipping to change at page 58, line 25
the same as that of the fileid attribute. the same as that of the fileid attribute.
The mounted_on_fileid attribute is RECOMMENDED, so the server SHOULD The mounted_on_fileid attribute is RECOMMENDED, so the server SHOULD
provide it if possible, and for a UNIX-based server, this is provide it if possible, and for a UNIX-based server, this is
straightforward. Usually, mounted_on_fileid will be requested during straightforward. Usually, mounted_on_fileid will be requested during
a READDIR operation, in which case it is trivial (at least for UNIX- a READDIR operation, in which case it is trivial (at least for UNIX-
based servers) to return mounted_on_fileid since it is equal to the based servers) to return mounted_on_fileid since it is equal to the
fileid of a directory entry returned by readdir(). If fileid of a directory entry returned by readdir(). If
mounted_on_fileid is requested in a GETATTR operation, the server mounted_on_fileid is requested in a GETATTR operation, the server
should obey an invariant that has it returning a value that is equal should obey an invariant that has it returning a value that is equal
to the file object's entry in the object's parent directory, i.e. to the file object's entry in the object's parent directory, i.e.,
what readdir() would have returned. Some operating environments what readdir() would have returned. Some operating environments
allow a series of two or more filesystems to be mounted onto a single allow a series of two or more filesystems to be mounted onto a single
mount point. In this case, for the server to obey the aforementioned mount point. In this case, for the server to obey the aforementioned
invariant, it will need to find the base mount point, and not the invariant, it will need to find the base mount point, and not the
intermediate mount points. intermediate mount points.
6. Filesystem Migration and Replication 6. Filesystem Migration and Replication
With the use of the recommended attribute "fs_locations", the NFS With the use of the recommended attribute "fs_locations", the NFS
version 4 server has a method of providing filesystem migration or version 4 server has a method of providing filesystem migration or
replication services. For the purposes of migration and replication, replication services. For the purposes of migration and replication,
a filesystem will be defined as all files that share a given fsid a filesystem will be defined as all files that share a given fsid
(both major and minor values are the same). (both major and minor values are the same).
The fs_locations attribute provides a list of filesystem locations. The fs_locations attribute provides a list of filesystem locations.
These locations are specified by providing the server name (either These locations are specified by providing the server name (either
skipping to change at page 60, line 4 skipping to change at page 59, line 33
event between client and server is specified here. event between client and server is specified here.
Once the servers participating in the migration have completed the Once the servers participating in the migration have completed the
move of the filesystem, the error NFS4ERR_MOVED will be returned for move of the filesystem, the error NFS4ERR_MOVED will be returned for
subsequent requests received by the original server. The subsequent requests received by the original server. The
NFS4ERR_MOVED error is returned for all operations except PUTFH and NFS4ERR_MOVED error is returned for all operations except PUTFH and
GETATTR. Upon receiving the NFS4ERR_MOVED error, the client will GETATTR. Upon receiving the NFS4ERR_MOVED error, the client will
obtain the value of the fs_locations attribute. The client will then obtain the value of the fs_locations attribute. The client will then
use the contents of the attribute to redirect its requests to the use the contents of the attribute to redirect its requests to the
specified server. To facilitate the use of GETATTR, operations such specified server. To facilitate the use of GETATTR, operations such
as PUTFH must also be accepted by the server for the migrated file as PUTFH must also be accepted by the server for the migrated file
system's filehandles. Note that if the server returns NFS4ERR_MOVED, system's filehandles. Note that if the server returns NFS4ERR_MOVED,
the server MUST support the fs_locations attribute. the server MUST support the fs_locations attribute.
If the client requests more attributes than just fs_locations, the If the client requests more attributes than just fs_locations, the
server may return fs_locations only. This is to be expected since server may return fs_locations only. This is to be expected since
the server has migrated the filesystem and may not have a method of the server has migrated the filesystem and may not have a method of
obtaining additional attribute data. obtaining additional attribute data.
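The client behavior on NFS4ERR_MOVED might look like the sketch below. The callables send and get_fs_locations are hypothetical stand-ins for the client's RPC layer, and the choice of the first listed location and the hop limit are assumptions of this example rather than protocol requirements.

```python
NFS4_OK = 0
NFS4ERR_MOVED = 10019  # error value defined by the protocol


def send_with_redirect(send, get_fs_locations, server, request, max_hops=3):
    """Send a request, following fs_locations when the filesystem has moved.

    send(server, request) -> (status, reply)          (hypothetical)
    get_fs_locations(server, request) -> [server...]  (hypothetical; this
    would be a PUTFH plus GETATTR of fs_locations on the old server)
    """
    for _ in range(max_hops + 1):
        status, reply = send(server, request)
        if status != NFS4ERR_MOVED:
            return server, status, reply
        locations = get_fs_locations(server, request)
        if not locations:
            raise RuntimeError("NFS4ERR_MOVED but fs_locations is empty")
        server = locations[0]  # simplest policy: take the first entry
    raise RuntimeError("too many migration redirects")
```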
The server implementor needs to be careful in developing a migration The server implementor needs to be careful in developing a migration
skipping to change at page 61, line 5 skipping to change at page 60, line 39
The fs_locations struct and attribute then contain an array of The fs_locations struct and attribute then contain an array of
locations. Since the name space of each server may be constructed locations. Since the name space of each server may be constructed
differently, the "fs_root" field is provided. The path represented differently, the "fs_root" field is provided. The path represented
by fs_root represents the location of the filesystem in the server's by fs_root represents the location of the filesystem in the server's
name space. Therefore, the fs_root path is only associated with the name space. Therefore, the fs_root path is only associated with the
server from which the fs_locations attribute was obtained. The server from which the fs_locations attribute was obtained. The
fs_root path is meant to aid the client in locating the filesystem at fs_root path is meant to aid the client in locating the filesystem at
the various servers listed. the various servers listed.
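One way a client could use fs_root is to strip the old server's fs_root from a saved path and prepend the new server's rootpath. The sketch below assumes paths are held as "/"-separated strings and that the client knows both prefixes; the function name is hypothetical.

```python
def translate_path(path, old_fs_root, new_fs_root):
    """Map a path under old_fs_root to the equivalent path under new_fs_root."""
    old = [c for c in old_fs_root.split("/") if c]
    new = [c for c in new_fs_root.split("/") if c]
    parts = [c for c in path.split("/") if c]
    if parts[:len(old)] != old:
        raise ValueError("path is not inside the filesystem rooted at fs_root")
    # Keep only the components below the filesystem root.
    return "/" + "/".join(new + parts[len(old):])
```

With a filesystem rooted at "/a/b/c" on one server and "/x/y/z" on another, translate_path("/a/b/c/d", "/a/b/c", "/x/y/z") yields "/x/y/z/d".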
As an example, there is a replicated filesystem located at two As an example, there is a replicated filesystem located at two
servers (servA and servB). At servA the filesystem is located at servers (servA and servB). At servA the filesystem is located at
path "/a/b/c". At servB the filesystem is located at path "/x/y/z". path "/a/b/c". At servB the filesystem is located at path "/x/y/z".
In this example the client accesses the filesystem first at servA In this example the client accesses the filesystem first at servA
with a multi-component lookup path of "/a/b/c/d". Since the client with a multi-component lookup path of "/a/b/c/d". Since the client
used a multi-component lookup to obtain the filehandle at "/a/b/c/d", used a multi-component lookup to obtain the filehandle at "/a/b/c/d",
it is unaware that the filesystem's root is located in servA's name it is unaware that the filesystem's root is located in servA's name
space at "/a/b/c". When the client switches to servB, it will need space at "/a/b/c". When the client switches to servB, it will need
to determine that the directory it first referenced at servA is now to determine that the directory it first referenced at servA is now
represented by the path "/x/y/z/d" on servB. To facilitate this, the represented by the path "/x/y/z/d" on servB. To facilitate this, the
skipping to change at page 62, line 5 skipping to change at page 61, line 35
of the fh_expire_type attribute, whether volatile filehandles will of the fh_expire_type attribute, whether volatile filehandles will
expire at the migration or replication event. If the bit expire at the migration or replication event. If the bit
FH4_VOL_MIGRATION is set in the fh_expire_type attribute, the client FH4_VOL_MIGRATION is set in the fh_expire_type attribute, the client
must treat the volatile filehandle as if the server had returned the must treat the volatile filehandle as if the server had returned the
NFS4ERR_FHEXPIRED error. At the migration or replication event in NFS4ERR_FHEXPIRED error. At the migration or replication event in
the presence of the FH4_VOL_MIGRATION bit, the client will not the presence of the FH4_VOL_MIGRATION bit, the client will not
present the original or old volatile filehandle to the new server. present the original or old volatile filehandle to the new server.
The client will start its communication with the new server by The client will start its communication with the new server by
recovering its filehandles using the saved file names. recovering its filehandles using the saved file names.
7. NFS Server Name Space 7. NFS Server Name Space
7.1. Server Exports 7.1. Server Exports
On a UNIX server the name space describes all the files reachable by On a UNIX server the name space describes all the files reachable by
pathnames under the root directory or "/". On a Windows NT server pathnames under the root directory or "/". On a Windows NT server
the name space constitutes all the files on disks named by mapped the name space constitutes all the files on disks named by mapped
disk letters. NFS server administrators rarely make the entire disk letters. NFS server administrators rarely make the entire
server's filesystem name space available to NFS clients. More often server's filesystem name space available to NFS clients. More often
portions of the name space are made available via an "export" portions of the name space are made available via an "export"
skipping to change at page 63, line 4 skipping to change at page 62, line 37
filesystem to another. There is a drawback to this representation of filesystem to another. There is a drawback to this representation of
the server's name space on the client: it is static. If the server the server's name space on the client: it is static. If the server
administrator adds a new export the client will be unaware of it. administrator adds a new export the client will be unaware of it.
7.3. Server Pseudo Filesystem 7.3. Server Pseudo Filesystem
NFS version 4 servers avoid this name space inconsistency by NFS version 4 servers avoid this name space inconsistency by
presenting all the exports within the framework of a single server presenting all the exports within the framework of a single server
name space. An NFS version 4 client uses LOOKUP and READDIR name space. An NFS version 4 client uses LOOKUP and READDIR
operations to browse seamlessly from one export to another. Portions operations to browse seamlessly from one export to another. Portions
of the server name space that are not exported are bridged via a of the server name space that are not exported are bridged via a
"pseudo filesystem" that provides a view of exported directories "pseudo filesystem" that provides a view of exported directories
only. A pseudo filesystem has a unique fsid and behaves like a only. A pseudo filesystem has a unique fsid and behaves like a
normal, read-only filesystem. normal, read-only filesystem.
Based on the construction of the server's name space, it is possible Based on the construction of the server's name space, it is possible
that multiple pseudo filesystems may exist. For example, that multiple pseudo filesystems may exist. For example,
/a pseudo filesystem /a pseudo filesystem
/a/b real filesystem /a/b real filesystem
skipping to change at page 63, line 45 skipping to change at page 63, line 27
representation of filesystem(s) available from the server. representation of filesystem(s) available from the server.
Therefore, the pseudo filesystem is most likely constructed Therefore, the pseudo filesystem is most likely constructed
dynamically when the server is first instantiated. It is expected dynamically when the server is first instantiated. It is expected
that the pseudo filesystem may not have an on disk counterpart from that the pseudo filesystem may not have an on disk counterpart from
which persistent filehandles could be constructed. Even though it is which persistent filehandles could be constructed. Even though it is
preferable that the server provide persistent filehandles for the preferable that the server provide persistent filehandles for the
pseudo filesystem, the NFS client should expect that pseudo file pseudo filesystem, the NFS client should expect that pseudo file
system filehandles are volatile. This can be confirmed by checking system filehandles are volatile. This can be confirmed by checking
the associated "fh_expire_type" attribute for those filehandles in the associated "fh_expire_type" attribute for those filehandles in
question. If the filehandles are volatile, the NFS client must be question. If the filehandles are volatile, the NFS client must be
prepared to recover a filehandle value (e.g. with a multi-component prepared to recover a filehandle value (e.g., with a multi-component
LOOKUP) when receiving an error of NFS4ERR_FHEXPIRED. LOOKUP) when receiving an error of NFS4ERR_FHEXPIRED.
7.6. Exported Root 7.6. Exported Root
If the server's root filesystem is exported, one might conclude that If the server's root filesystem is exported, one might conclude that
a pseudo-filesystem is not needed. This would be wrong. Assume the a pseudo-filesystem is not needed. This would be wrong. Assume the
following filesystems on a server: following filesystems on a server:
/ disk1 (exported) / disk1 (exported)
/a disk2 (not exported) /a disk2 (not exported)
/a/b disk3 (exported) /a/b disk3 (exported)
Because disk2 is not exported, disk3 cannot be reached with simple Because disk2 is not exported, disk3 cannot be reached with simple
LOOKUPs. The server must bridge the gap with a pseudo-filesystem. LOOKUPs. The server must bridge the gap with a pseudo-filesystem.
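A sketch of how a server might work out which directories the pseudo-filesystem has to supply is shown below. The mount table format (mount point mapped to an "exported" boolean) and both function names are assumptions for illustration; a real server would consult its own export configuration.

```python
import posixpath


def containing_mount(path, mounts):
    """Return the mount point of the filesystem that holds 'path'."""
    while path not in mounts:
        if path == "/":
            raise ValueError("mount table must include '/'")
        path = posixpath.dirname(path)
    return path


def pseudo_nodes(mounts):
    """List directories that must be bridged by the pseudo-filesystem.

    'mounts' maps mount point -> True if that filesystem is exported.
    An ancestor of an export needs a pseudo node when the filesystem
    it lives on is not itself exported.
    """
    needed = set()
    for point, exported in mounts.items():
        if not exported:
            continue
        ancestor = posixpath.dirname(point)
        while True:
            if not mounts[containing_mount(ancestor, mounts)]:
                needed.add(ancestor)
            if ancestor == "/":
                break
            ancestor = posixpath.dirname(ancestor)
    return sorted(needed)
```

For the three-disk layout above, pseudo_nodes({"/": True, "/a": False, "/a/b": True}) returns ["/a"], the single gap between the two exports.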
7.7. Mount Point Crossing 7.7. Mount Point Crossing
The server filesystem environment may be constructed in such a way The server filesystem environment may be constructed in such a way
that one filesystem contains a directory which is 'covered' or that one filesystem contains a directory which is 'covered' or
mounted upon by a second filesystem. For example: mounted upon by a second filesystem. For example:
skipping to change at page 65, line 4 skipping to change at page 64, line 40
to authenticate itself. If, based on its policies, the server to authenticate itself. If, based on its policies, the server
chooses to limit the contents of the pseudo filesystem, the server chooses to limit the contents of the pseudo filesystem, the server
may effectively hide filesystems from a client that may otherwise may effectively hide filesystems from a client that may otherwise
have legitimate access. have legitimate access.
As suggested practice, the server should apply the security policy of As suggested practice, the server should apply the security policy of
a shared resource in the server's namespace to the components of the a shared resource in the server's namespace to the components of the
resource's ancestors. For example: resource's ancestors. For example:
/ /
/a/b /a/b
/a/b/c /a/b/c
The /a/b/c directory is a real filesystem and is the shared resource. The /a/b/c directory is a real filesystem and is the shared resource.
The security policy for /a/b/c is Kerberos with integrity. The The security policy for /a/b/c is Kerberos with integrity. The
server should apply the same security policy to /, /a, and /a/b. server should apply the same security policy to /, /a, and /a/b.
This allows for the extension of the protection of the server's This allows for the extension of the protection of the server's
namespace to the ancestors of the real shared resource. namespace to the ancestors of the real shared resource.
For the case of the use of multiple, disjoint security mechanisms in For the case of the use of multiple, disjoint security mechanisms in
the server's resources, the security for a particular object in the the server's resources, the security for a particular object in the
server's namespace should be the union of all security mechanisms of server's namespace should be the union of all security mechanisms of
all direct descendants. all direct descendants.
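The suggested practice can be approximated by propagating each shared resource's mechanisms up to every ancestor and taking the union where branches meet. The sketch below is one possible reading; the policy table, the flavor strings such as "krb5i", and the function name are illustrative assumptions.

```python
import posixpath


def ancestor_security(policies):
    """Compute the security mechanisms to apply to each ancestor directory.

    'policies' maps the path of a shared resource to a set of mechanism
    names (arbitrary labels in this sketch). Each ancestor receives the
    union of the mechanisms of the resources beneath it.
    """
    result = {}
    for path, flavors in policies.items():
        ancestor = posixpath.dirname(path)
        while True:
            result.setdefault(ancestor, set()).update(flavors)
            if ancestor == "/":
                break
            ancestor = posixpath.dirname(ancestor)
    return result
```

With the single resource /a/b/c protected by Kerberos with integrity, the directories /, /a, and /a/b all end up with that same mechanism.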
8. File Locking and Share Reservations 8. File Locking and Share Reservations
Integrating locking into the NFS protocol necessarily causes it to be Integrating locking into the NFS protocol necessarily causes it to be
stateful. With the inclusion of share reservations the protocol stateful. With the inclusion of share reservations the protocol
becomes substantially more dependent on state than the traditional becomes substantially more dependent on state than the traditional
combination of NFS and NLM [XNFS]. There are three components to combination of NFS and NLM [XNFS]. There are three components to
making this state manageable: making this state manageable:
o Clear division between client and server o Clear division between client and server
skipping to change at page 67, line 5 skipping to change at page 66, line 13
owner. owner.
The following sections describe the transition from the heavy weight The following sections describe the transition from the heavy weight
information to the eventual stateid used for most client and server information to the eventual stateid used for most client and server
locking and lease interactions. locking and lease interactions.
8.1.1. Client ID 8.1.1. Client ID
For each LOCK request, the client must identify itself to the server. For each LOCK request, the client must identify itself to the server.
This is done in such a way as to allow for correct lock This is done in such a way as to allow for correct lock
identification and crash recovery. A sequence of a SETCLIENTID identification and crash recovery. A sequence of a SETCLIENTID
operation followed by a SETCLIENTID_CONFIRM operation is required to operation followed by a SETCLIENTID_CONFIRM operation is required to
establish the identification onto the server. Establishment of establish the identification onto the server. Establishment of
identification by a new incarnation of the client also has the effect identification by a new incarnation of the client also has the effect
of immediately breaking any leased state that a previous incarnation of immediately breaking any leased state that a previous incarnation
of the client might have had on the server, as opposed to forcing the of the client might have had on the server, as opposed to forcing the
new client incarnation to wait for the leases to expire. Breaking new client incarnation to wait for the leases to expire. Breaking
the lease state amounts to the server removing all lock, share the lease state amounts to the server removing all lock, share
reservation, and, where the server is not supporting the reservation, and, where the server is not supporting the
skipping to change at page 67, line 29 skipping to change at page 66, line 35
state recovery, see the section "Delegation Recovery". state recovery, see the section "Delegation Recovery".
Client identification is encapsulated in the following structure: Client identification is encapsulated in the following structure:
struct nfs_client_id4 { struct nfs_client_id4 {
verifier4 verifier; verifier4 verifier;
opaque id<NFS4_OPAQUE_LIMIT>; opaque id<NFS4_OPAQUE_LIMIT>;
}; };
The first field, verifier is a client incarnation verifier that is The first field, verifier is a client incarnation verifier that is
used to detect client reboots. Only if the verifier is different from used to detect client reboots. Only if the verifier is different
that the server has previously recorded the client (as identified by from that which the server has previously recorded the client (as
the second field of the structure, id) does the server start the identified by the second field of the structure, id) does the server
process of canceling the client's leased state. start the process of canceling the client's leased state.
The second field, id is a variable length string that uniquely The second field, id is a variable length string that uniquely
defines the client. defines the client.
There are several considerations for how the client generates the id There are several considerations for how the client generates the id
string: string:
o The string should be unique so that multiple clients do not o The string should be unique so that multiple clients do not
present the same string. The consequences of two clients present the same string. The consequences of two clients
presenting the same string range from one client getting an presenting the same string range from one client getting an error
error to one client having its leased state abruptly and to one client having its leased state abruptly and unexpectedly
unexpectedly canceled. canceled.
o The string should be selected so the subsequent incarnations o The string should be selected so the subsequent incarnations
(e.g. reboots) of the same client cause the client to present (e.g., reboots) of the same client cause the client to present the
the same string. The implementor is cautioned from an approach same string. The implementor is cautioned against an approach
that requires the string to be recorded in a local file because that requires the string to be recorded in a local file because
this precludes the use of the implementation in an environment this precludes the use of the implementation in an environment
where there is no local disk and all file access is from an NFS where there is no local disk and all file access is from an NFS
version 4 server. version 4 server.
o The string should be different for each server network address o The string should be different for each server network address
that the client accesses, rather than common to all server that the client accesses, rather than common to all server network
network addresses. The reason is that it may not be possible for addresses. The reason is that it may not be possible for the
the client to tell if same server is listening on multiple client to tell if the same server is listening on multiple network
network addresses. If the client issues SETCLIENTID with the addresses. If the client issues SETCLIENTID with the same id
string to each network address of such a server, the server will
think it is the same client, and each successive SETCLIENTID will
cause the server to begin the process of removing the client's
same id string to each network address of such a server, the previous leased state.
server will think it is the same client, and each successive
SETCLIENTID will cause the server to begin the process of
removing the client's previous leased state.
o The algorithm for generating the string should not assume that o The algorithm for generating the string should not assume that the
the client's network address won't change. This includes client's network address won't change. This includes changes
changes between client incarnations and even changes while the between client incarnations and even changes while the client is
client is still running in its current incarnation. This still running in its current incarnation. This means that if
means that if the client includes just the client's and server's the client includes just the client's and server's network address
network address in the id string, there is a real risk, after in the id string, there is a real risk, after the client gives up
the client gives up the network address, that another client, the network address, that another client, using a similar
using a similar algorithm for generating the id string, will algorithm for generating the id string, will generate a
generate a conflicting id string. conflicting id string.
Given the above considerations, an example of a well generated id Given the above considerations, an example of a well generated id
string is one that includes: string is one that includes:
o The server's network address. o The server's network address.
o The client's network address. o The client's network address.
o For a user level NFS version 4 client, it should contain o For a user level NFS version 4 client, it should contain
additional information to distinguish the client from other user additional information to distinguish the client from other user
skipping to change at page 69, line 4 skipping to change at page 68, line 21
- A true random number. However since this number ought to be - A true random number. However since this number ought to be
the same between client incarnations, this shares the same the same between client incarnations, this shares the same
problem as that of using the timestamp of the software problem as that of using the timestamp of the software
installation. installation.
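The considerations above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name, the "/" separator, and the `extra` parameter (standing in for whatever additional distinguishing information the client chooses) are hypothetical, and the protocol itself treats the id as an opaque byte string.

```python
NFS4_OPAQUE_LIMIT = 1024  # upper bound on the opaque id field


def make_client_id_string(server_addr: str, client_addr: str, extra: str) -> bytes:
    """Assemble an id string for nfs_client_id4 (hypothetical helper).

    server_addr: makes the string differ per server network address.
    client_addr: the client's own network address.
    extra: additional information that tends to be unique and stays the
           same across client incarnations (an assumption of this sketch).
    """
    raw = "/".join((server_addr, client_addr, extra)).encode("utf-8")
    if len(raw) > NFS4_OPAQUE_LIMIT:
        raise ValueError("id string exceeds NFS4_OPAQUE_LIMIT")
    return raw
```

Because the result depends only on its inputs, a rebooted client that supplies the same inputs presents the same string, while two server addresses yield two different strings.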
As a security measure, the server MUST NOT cancel a client's leased As a security measure, the server MUST NOT cancel a client's leased
state if the principal that established the state for a given id string is state if the principal that established the state for a given id string is
not the same as the principal issuing the SETCLIENTID. not the same as the principal issuing the SETCLIENTID.
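A minimal server-side sketch of the verifier comparison and the principal check might look like the Python below. It is a simplification, not the specified algorithm: the SETCLIENTID_CONFIRM step and lease expiry are not modelled, and the class names and the return labels other than NFS4ERR_CLID_INUSE are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ClientRecord:
    verifier: bytes
    principal: str
    leased_state: list = field(default_factory=list)


class Server:
    def __init__(self):
        self.clients = {}  # id string -> ClientRecord

    def setclientid(self, id_string: bytes, verifier: bytes, principal: str) -> str:
        rec = self.clients.get(id_string)
        if rec is None:
            self.clients[id_string] = ClientRecord(verifier, principal)
            return "NEW"
        if rec.principal != principal:
            if rec.leased_state:
                # State owned by another principal is never cancelled.
                return "NFS4ERR_CLID_INUSE"
            # No state is held, so the new principal may take over the id.
            self.clients[id_string] = ClientRecord(verifier, principal)
            return "NEW"
        if rec.verifier != verifier:
            # Same id, same principal, new verifier: the client rebooted.
            rec.leased_state.clear()
            rec.verifier = verifier
            return "REBOOT_DETECTED"
        # Same incarnation: leased state is left untouched.
        return "SAME_INCARNATION"
```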
Note that SETCLIENTID and SETCLIENTID_CONFIRM have a secondary purpose Note that SETCLIENTID and SETCLIENTID_CONFIRM have a secondary purpose
of establishing the information the server needs to make callbacks to of establishing the information the server needs to make callbacks to
the client for the purpose of supporting delegations. It is permitted to the client for the purpose of supporting delegations. It is permitted to
change this information via SETCLIENTID and SETCLIENTID_CONFIRM change this information via SETCLIENTID and SETCLIENTID_CONFIRM
within the same incarnation of the client without removing the within the same incarnation of the client without removing the
client's leased state. client's leased state.
Once a SETCLIENTID and SETCLIENTID_CONFIRM sequence has successfully Once a SETCLIENTID and SETCLIENTID_CONFIRM sequence has successfully
completed, the client uses the short hand client identifier, of type completed, the client uses the short hand client identifier, of type
clientid4, instead of the longer and less compact nfs_client_id4 clientid4, instead of the longer and less compact nfs_client_id4
structure. This short hand client identifier (a clientid) is structure. This shorthand client identifier (a clientid) is assigned
assigned by the server and should be chosen so that it will not by the server and should be chosen so that it will not conflict with
conflict with a clientid previously assigned by the server. This a clientid previously assigned by the server. This applies across
applies across server restarts or reboots. When a clientid is server restarts or reboots. When a clientid is presented to a server
presented to a server and that clientid is not recognized, as would and that clientid is not recognized, as would happen after a server
happen after a server reboot, the server will reject the request with reboot, the server will reject the request with the error
the error NFS4ERR_STALE_CLIENTID. When this happens, the client must NFS4ERR_STALE_CLIENTID. When this happens, the client must obtain a
obtain a new clientid by use of the SETCLIENTID operation and then new clientid by use of the SETCLIENTID operation and then proceed to
proceed to any other necessary recovery for the server reboot case any other necessary recovery for the server reboot case (See the
(See the section "Server Failure and Recovery"). section "Server Failure and Recovery").
The client must also employ the SETCLIENTID operation when it The client must also employ the SETCLIENTID operation when it
receives an NFS4ERR_STALE_STATEID error using a stateid derived from receives an NFS4ERR_STALE_STATEID error using a stateid derived from
its current clientid, since this also indicates a server reboot which its current clientid, since this also indicates a server reboot which
has invalidated the existing clientid (see the next section has invalidated the existing clientid (see the next section
"lock_owner and stateid Definition" for details). "lock_owner and stateid Definition" for details).
See the detailed descriptions of SETCLIENTID and SETCLIENTID_CONFIRM See the detailed descriptions of SETCLIENTID and SETCLIENTID_CONFIRM
for a complete specification of the operations. for a complete specification of the operations.
skipping to change at page 70, line 4 skipping to change at page 69, line 27
restarted. Typically a server would not release a clientid unless restarted. Typically a server would not release a clientid unless
there had been no activity from that client for many minutes. there had been no activity from that client for many minutes.
Note that if the id string in a SETCLIENTID request is properly Note that if the id string in a SETCLIENTID request is properly
constructed, and if the client takes care to use the same principal constructed, and if the client takes care to use the same principal
for each successive use of SETCLIENTID, then, barring an active for each successive use of SETCLIENTID, then, barring an active
denial of service attack, NFS4ERR_CLID_INUSE should never be denial of service attack, NFS4ERR_CLID_INUSE should never be
returned. returned.
However, client bugs, server bugs, or perhaps a deliberate change of However, client bugs, server bugs, or perhaps a deliberate change of
the principal owner of the id string (such as the case of a client the principal owner of the id string (such as the case of a client
that changes security flavors, and under the new flavor, there is no that changes security flavors, and under the new flavor, there is no
mapping to the previous owner) will in rare cases result in mapping to the previous owner) will in rare cases result in
NFS4ERR_CLID_INUSE. NFS4ERR_CLID_INUSE.
In that event, when the server gets a SETCLIENTID for a client id In that event, when the server gets a SETCLIENTID for a client id
that currently has no state, or it has state, but the lease has that currently has no state, or it has state, but the lease has
expired, rather than returning NFS4ERR_CLID_INUSE, the server MUST expired, rather than returning NFS4ERR_CLID_INUSE, the server MUST
allow the SETCLIENTID, and confirm the new clientid if followed by allow the SETCLIENTID, and confirm the new clientid if followed by
the appropriate SETCLIENTID_CONFIRM. the appropriate SETCLIENTID_CONFIRM.
skipping to change at page 70, line 48 skipping to change at page 70, line 20
as long as it is able to recognize invalid and out-of-date stateids. as long as it is able to recognize invalid and out-of-date stateids.
This requirement includes those stateids generated by earlier This requirement includes those stateids generated by earlier
instances of the server. From this, the client can be properly instances of the server. From this, the client can be properly
notified of a server restart. This notification will occur when the notified of a server restart. This notification will occur when the
client presents a stateid to the server from a previous client presents a stateid to the server from a previous
instantiation. instantiation.
The server must be able to distinguish the following situations and The server must be able to distinguish the following situations and
return the error as specified: return the error as specified:
o The stateid was generated by an earlier server instance (i.e. o The stateid was generated by an earlier server instance (i.e.,
before a server reboot). The error NFS4ERR_STALE_STATEID should before a server reboot). The error NFS4ERR_STALE_STATEID should
be returned. be returned.
o The stateid was generated by the current server instance but the o The stateid was generated by the current server instance but the
stateid no longer designates the current locking state for the stateid no longer designates the current locking state for the
lockowner-file pair in question (i.e. one or more locking lockowner-file pair in question (i.e., one or more locking
operations have occurred). The error NFS4ERR_OLD_STATEID should operations have occurred). The error NFS4ERR_OLD_STATEID should be
be returned. returned.
This error condition will only occur when the client issues a This error condition will only occur when the client issues a
locking request which changes a stateid while an I/O request locking request which changes a stateid while an I/O request that
that uses that stateid is outstanding. uses that stateid is outstanding.
o The stateid was generated by the current server instance but the o The stateid was generated by the current server instance but the
stateid does not designate a locking state for any active stateid does not designate a locking state for any active
lockowner-file pair. The error NFS4ERR_BAD_STATEID should be lockowner-file pair. The error NFS4ERR_BAD_STATEID should be
returned. returned.
This error condition will occur when there has been a logic This error condition will occur when there has been a logic error
error on the part of the client or server. This should not on the part of the client or server. This should not happen.
happen.
One mechanism that may be used to satisfy these requirements is for One mechanism that may be used to satisfy these requirements is for
the server to, the server to,
o divide the "other" field of each stateid into two fields: o divide the "other" field of each stateid into two fields:
- A server verifier which uniquely designates a particular - A server verifier which uniquely designates a particular server
server instantiation. instantiation.
- An index into a table of locking-state structures. - An index into a table of locking-state structures.
o utilize the "seqid" field of each stateid, such that seqid is o utilize the "seqid" field of each stateid, such that seqid is
monotonically incremented for each stateid that is associated monotonically incremented for each stateid that is associated with
with the same index into the locking-state table. the same index into the locking-state table.
By matching the incoming stateid and its field values with the state By matching the incoming stateid and its field values with the state
held at the server, the server is able to easily determine if a held at the server, the server is able to easily determine if a
stateid is valid for its current instantiation and state. If the stateid is valid for its current instantiation and state. If the
stateid is not valid, the appropriate error can be supplied to the stateid is not valid, the appropriate error can be supplied to the
client. client.
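The mechanism just described can be illustrated with a short Python sketch. The field layout, the table representation, and the treatment of a seqid newer than the server's record as NFS4ERR_BAD_STATEID are assumptions of this example; seqid wraparound is ignored for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stateid:
    seqid: int
    verifier: int  # part of "other": identifies the server instantiation
    index: int     # part of "other": slot in the locking-state table


def classify_stateid(sid: Stateid, server_verifier: int, table: dict) -> str:
    """table maps index -> current seqid for each active lockowner-file pair."""
    if sid.verifier != server_verifier:
        # Generated by an earlier server instance.
        return "NFS4ERR_STALE_STATEID"
    current = table.get(sid.index)
    if current is None:
        # No active locking state for this index.
        return "NFS4ERR_BAD_STATEID"
    if sid.seqid < current:
        # Locking operations have occurred since this stateid was issued.
        return "NFS4ERR_OLD_STATEID"
    if sid.seqid > current:
        # A seqid the server never issued (assumed to be a logic error).
        return "NFS4ERR_BAD_STATEID"
    return "NFS4_OK"
```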
8.1.4. Use of the stateid and Locking 8.1.4. Use of the stateid and Locking
All READ, WRITE and SETATTR operations contain a stateid. For the All READ, WRITE and SETATTR operations contain a stateid. For the
purposes of this section, SETATTR operations which change the size purposes of this section, SETATTR operations which change the size
attribute of a file are treated as if they are writing the area attribute of a file are treated as if they are writing the area
between the old and new size (i.e. the range truncated or added to between the old and new size (i.e., the range truncated or added to
the file by means of the SETATTR), even where SETATTR is not the file by means of the SETATTR), even where SETATTR is not
explicitly mentioned in the text. explicitly mentioned in the text.
If the lock_owner performs a READ or WRITE in a situation in which it If the lock_owner performs a READ or WRITE in a situation in which it
has established a lock or share reservation on the server (any OPEN has established a lock or share reservation on the server (any OPEN
constitutes a share reservation) the stateid (previously returned by constitutes a share reservation) the stateid (previously returned by
the server) must be used to indicate what locks, including both the server) must be used to indicate what locks, including both
record locks and share reservations, are held by the lockowner. If record locks and share reservations, are held by the lockowner. If
no state is established by the client, either record lock or share no state is established by the client, either record lock or share
reservation, a stateid of all bits 0 is used. Regardless of whether a reservation, a stateid of all bits 0 is used. Regardless of whether a
stateid of all bits 0, or a stateid returned by the server is used, stateid of all bits 0, or a stateid returned by the server is used,
if there is a conflicting share reservation or mandatory record lock if there is a conflicting share reservation or mandatory record lock
held on the file, the server MUST refuse to service the READ or WRITE held on the file, the server MUST refuse to service the READ or WRITE
operation. operation.
Share reservations are established by OPEN operations and by their Share reservations are established by OPEN operations and by their
nature are mandatory in that when the OPEN denies READ or WRITE nature are mandatory in that when the OPEN denies READ or WRITE
operations, that denial results in such operations being rejected operations, that denial results in such operations being rejected
with error NFS4ERR_LOCKED. Record locks may be implemented by the with error NFS4ERR_LOCKED. Record locks may be implemented by the
skipping to change at page 72, line 29 skipping to change at page 72, line 5
file being accessed (for example, some UNIX-based servers support a file being accessed (for example, some UNIX-based servers support a
"mandatory lock bit" on the mode attribute such that if set, record "mandatory lock bit" on the mode attribute such that if set, record
locks are required on the file before I/O is possible). When record locks are required on the file before I/O is possible). When record
locks are advisory, they only prevent the granting of conflicting locks are advisory, they only prevent the granting of conflicting
lock requests and have no effect on READs or WRITEs. Mandatory lock requests and have no effect on READs or WRITEs. Mandatory
record locks, however, prevent conflicting I/O operations. When they record locks, however, prevent conflicting I/O operations. When they
are attempted, they are rejected with NFS4ERR_LOCKED. When the are attempted, they are rejected with NFS4ERR_LOCKED. When the
client gets NFS4ERR_LOCKED on a file it knows it has the proper share client gets NFS4ERR_LOCKED on a file it knows it has the proper share
reservation for, it will need to issue a LOCK request on the region reservation for, it will need to issue a LOCK request on the region
of the file that includes the region the I/O was to be performed on, of the file that includes the region the I/O was to be performed on,
with an appropriate locktype (i.e. READ*_LT for a READ operation, with an appropriate locktype (i.e., READ*_LT for a READ operation,
WRITE*_LT for a WRITE operation). WRITE*_LT for a WRITE operation).
With NFS version 3, there was no notion of a stateid so there was no With NFS version 3, there was no notion of a stateid so there was no
way to tell if the application process of the client sending the READ way to tell if the application process of the client sending the READ
or WRITE operation had also acquired the appropriate record lock on or WRITE operation had also acquired the appropriate record lock on
the file. Thus there was no way to implement mandatory locking. With the file. Thus there was no way to implement mandatory locking.
the stateid construct, this barrier has been removed. With the stateid construct, this barrier has been removed.
Note that for UNIX environments that support mandatory file locking, Note that for UNIX environments that support mandatory file locking,
the distinction between advisory and mandatory locking is subtle. In the distinction between advisory and mandatory locking is subtle. In
fact, advisory and mandatory record locks are exactly the same in so fact, advisory and mandatory record locks are exactly the same in so
far as the APIs and requirements on implementation. If the mandatory far as the APIs and requirements on implementation. If the mandatory
lock attribute is set on the file, the server checks to see if the lock attribute is set on the file, the server checks to see if the
lockowner has an appropriate shared (read) or exclusive (write) lockowner has an appropriate shared (read) or exclusive (write)
record lock on the region it wishes to read or write to. If there is record lock on the region it wishes to read or write to. If there is
no appropriate lock, the server checks if there is a conflicting lock no appropriate lock, the server checks if there is a conflicting lock
(which can be done by attempting to acquire the conflicting lock on (which can be done by attempting to acquire the conflicting lock on
skipping to change at page 73, line 5 skipping to change at page 72, line 35
NFS4ERR_LOCKED. NFS4ERR_LOCKED.
For Windows environments, there are no advisory record locks, so the For Windows environments, there are no advisory record locks, so the
server always checks for record locks during I/O requests. server always checks for record locks during I/O requests.
Thus, the NFS version 4 LOCK operation does not need to distinguish Thus, the NFS version 4 LOCK operation does not need to distinguish
between advisory and mandatory record locks. It is the NFS version 4 between advisory and mandatory record locks. It is the NFS version 4
server's processing of the READ and WRITE operations that introduces server's processing of the READ and WRITE operations that introduces
the distinction. the distinction.
Every stateid other than the special stateid values noted in this Every stateid other than the special stateid values noted in this
section, whether returned by an OPEN-type operation (i.e. OPEN, section, whether returned by an OPEN-type operation (i.e., OPEN,
OPEN_DOWNGRADE), or by a LOCK-type operation (i.e. LOCK or LOCKU), OPEN_DOWNGRADE), or by a LOCK-type operation (i.e., LOCK or LOCKU),
defines an access mode for the file (i.e. READ, WRITE, or READ-WRITE) defines an access mode for the file (i.e., READ, WRITE, or READ-
as established by the original OPEN which began the stateid sequence, WRITE) as established by the original OPEN which began the stateid
and as modified by subsequent OPENs and OPEN_DOWNGRADEs within that sequence, and as modified by subsequent OPENs and OPEN_DOWNGRADEs
stateid sequence. When a READ, WRITE, or SETATTR which specifies the within that stateid sequence. When a READ, WRITE, or SETATTR which
size attribute, is done, the operation is subject to checking against specifies the size attribute, is done, the operation is subject to
the access mode to verify that the operation is appropriate given the checking against the access mode to verify that the operation is
OPEN with which the operation is associated. appropriate given the OPEN with which the operation is associated.
In the case of WRITE-type operations (i.e. WRITEs and SETATTRs which In the case of WRITE-type operations (i.e., WRITEs and SETATTRs which
set size), the server must verify that the access mode allows writing set size), the server must verify that the access mode allows writing
and return an NFS4ERR_OPENMODE error if it does not. In the case of and return an NFS4ERR_OPENMODE error if it does not. In the case of
READ, the server may perform the corresponding check on the access READ, the server may perform the corresponding check on the access
mode, or it may choose to allow READ on opens for WRITE only, to mode, or it may choose to allow READ on opens for WRITE only, to
accommodate clients whose write implementation may unavoidably do accommodate clients whose write implementation may unavoidably do
reads (e.g. due to buffer cache constraints). However, even if READs reads (e.g., due to buffer cache constraints). However, even if
are allowed in these circumstances, the server MUST still check for READs are allowed in these circumstances, the server MUST still check
locks that conflict with the READ (e.g. another open specifying denial for locks that conflict with the READ (e.g., another open specifying
of READs). Note that a server which does enforce the access mode denial of READs). Note that a server which does enforce the access
check on READs need not explicitly check for conflicting share mode check on READs need not explicitly check for conflicting share
reservations since the existence of OPEN for read access guarantees reservations since the existence of OPEN for read access guarantees
that no conflicting share reservation can exist. that no conflicting share reservation can exist.
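The access mode check can be sketched as below. The function name, the operation labels, and the `enforce_read_check` switch (modelling the server's choice to allow READ on write-only opens) are hypothetical, and returning NFS4ERR_OPENMODE for a rejected READ is an assumption drawn from "the corresponding check". Conflicting-lock checks are deliberately left out.

```python
ACCESS_READ = 0x1   # mirrors OPEN4_SHARE_ACCESS_READ
ACCESS_WRITE = 0x2  # mirrors OPEN4_SHARE_ACCESS_WRITE


def check_open_mode(op: str, access: int, enforce_read_check: bool = True) -> str:
    """Check an operation against the access mode of the associated OPEN."""
    if op in ("WRITE", "SETATTR_SIZE"):
        # WRITE-type operations always require write access.
        return "NFS4_OK" if access & ACCESS_WRITE else "NFS4ERR_OPENMODE"
    if op == "READ":
        # The server may, but need not, enforce read access on READ.
        if enforce_read_check and not access & ACCESS_READ:
            return "NFS4ERR_OPENMODE"
        return "NFS4_OK"
    raise ValueError("unsupported operation: " + op)
```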
A stateid of all bits 1 (one) MAY allow READ operations to bypass A stateid of all bits 1 (one) MAY allow READ operations to bypass
locking checks at the server. However, WRITE operations with a locking checks at the server. However, WRITE operations with a
stateid with bits all 1 (one) MUST NOT bypass locking checks and are stateid with bits all 1 (one) MUST NOT bypass locking checks and are
treated exactly the same as if a stateid of all bits 0 were used. treated exactly the same as if a stateid of all bits 0 were used.
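As a hedged illustration of the special stateid rule, the following Python sketch decides whether locking checks may be skipped. It assumes a stateid is held as 16 raw bytes (a 4-byte seqid followed by a 12-byte "other" field); the function and parameter names are hypothetical.

```python
STATEID_SIZE = 16  # 4-byte seqid followed by a 12-byte "other" field
ALL_ZEROS = bytes(STATEID_SIZE)
ALL_ONES = b"\xff" * STATEID_SIZE


def may_bypass_lock_checks(op: str, stateid: bytes,
                           server_allows_bypass: bool = True) -> bool:
    """Return True only when a READ presents the all-ones stateid and the
    server chooses to honor the bypass. A WRITE with the all-ones stateid
    is handled like the all-zeros stateid, so it is never exempt."""
    return op == "READ" and stateid == ALL_ONES and server_allows_bypass
```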
A lock may not be granted while a READ or WRITE operation using one A lock may not be granted while a READ or WRITE operation using one
of the special stateids is being performed and the range of the lock of the special stateids is being performed and the range of the lock
skipping to change at page 74, line 4 skipping to change at page 73, line 38
Locking is different than most NFS operations as it requires "at- Locking is different than most NFS operations as it requires "at-
most-one" semantics that are not provided by ONCRPC. ONCRPC over a most-one" semantics that are not provided by ONCRPC. ONCRPC over a
reliable transport is not sufficient because a sequence of locking reliable transport is not sufficient because a sequence of locking
requests may span multiple TCP connections. In the face of requests may span multiple TCP connections. In the face of
retransmission or reordering, lock or unlock requests must have a retransmission or reordering, lock or unlock requests must have a
well defined and consistent behavior. To accomplish this, each lock well defined and consistent behavior. To accomplish this, each lock
request contains a sequence number that is a consecutively increasing request contains a sequence number that is a consecutively increasing
integer. Different lock_owners have different sequences. The server integer. Different lock_owners have different sequences. The server
maintains the last sequence number (L) received and the response that maintains the last sequence number (L) received and the response that
was returned. The first request issued for any given lock_owner is was returned. The first request issued for any given lock_owner is
issued with a sequence number of zero. issued with a sequence number of zero.
Note that for requests that contain a sequence number, for each Note that for requests that contain a sequence number, for each
lock_owner, there should be no more than one outstanding request. lock_owner, there should be no more than one outstanding request.
If a request (r) with a previous sequence number (r < L) is received, If a request (r) with a previous sequence number (r < L) is received,
it is rejected with the return of error NFS4ERR_BAD_SEQID. Given a it is rejected with the return of error NFS4ERR_BAD_SEQID. Given a
properly-functioning client, the response to (r) must have been properly-functioning client, the response to (r) must have been
received before the last request (L) was sent. If a duplicate of received before the last request (L) was sent. If a duplicate of
last request (r == L) is received, the stored response is returned. last request (r == L) is received, the stored response is returned.
If a request beyond the next sequence (r == L + 2) is received, it is If a request beyond the next sequence (r == L + 2) is received, it is
rejected with the return of error NFS4ERR_BAD_SEQID. Sequence rejected with the return of error NFS4ERR_BAD_SEQID. Sequence
history is reinitialized whenever the SETCLIENTID/SETCLIENTID_CONFIRM history is reinitialized whenever the SETCLIENTID/SETCLIENTID_CONFIRM
sequence changes the client verifier. sequence changes the client verifier.
Since the sequence number is represented with an unsigned 32-bit Since the sequence number is represented with an unsigned 32-bit
integer, the arithmetic involved with the sequence number is mod integer, the arithmetic involved with the sequence number is mod
2^32. For an example of modulo arithetic involving sequence numbers 2^32. For an example of modulo arithmetic involving sequence numbers
see [RFC793]. see [RFC793].
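The sequence number rules above can be sketched as follows. This is a minimal illustration, not part of the specification: the names LockOwnerSeq and handle_request are hypothetical, and the sketch assumes the server enforces a first sequence number of zero for a lock_owner it has not seen before. Comparing through the difference mod 2^32 treats every value other than L and L + 1 as out of sequence, which covers both "previous" and "beyond the next" requests across wraparound.

```python
NFS4ERR_BAD_SEQID = "NFS4ERR_BAD_SEQID"
SEQ_MOD = 2 ** 32  # sequence numbers are unsigned 32-bit integers


class LockOwnerSeq:
    """Hypothetical per-lock_owner record: last seqid (L) and its reply."""

    def __init__(self):
        self.last_seqid = None  # no request received yet
        self.last_reply = None


def handle_request(state, seqid, execute):
    """Classify request seqid r against L using arithmetic mod 2^32."""
    if state.last_seqid is None:
        # Assumption: the first request for a lock_owner must carry zero.
        if seqid != 0:
            return NFS4ERR_BAD_SEQID
    else:
        delta = (seqid - state.last_seqid) % SEQ_MOD
        if delta == 0:
            # Duplicate of the last request: return the stored response.
            return state.last_reply
        if delta != 1:
            # Either an old request or one beyond the next sequence.
            return NFS4ERR_BAD_SEQID
    reply = execute()  # perform the operation exactly once
    state.last_seqid = seqid
    state.last_reply = reply
    return reply
```

Keeping the reply next to the sequence number, rather than in a shared least recently used cache, reflects the requirement that the last response stay available for as long as the lock_owner state exists.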
It is critical the server maintain the last response sent to the It is critical the server maintain the last response sent to the
client to provide a more reliable cache of duplicate non-idempotent client to provide a more reliable cache of duplicate non-idempotent
requests than that of the traditional cache described in [Juszczak]. requests than that of the traditional cache described in [Juszczak].
The traditional duplicate request cache uses a least recently used The traditional duplicate request cache uses a least recently used
algorithm for removing unneeded requests. However, the last lock algorithm for removing unneeded requests. However, the last lock
request and response on a given lock_owner must be cached as long as request and response on a given lock_owner must be cached as long as
the lock state exists on the server. the lock state exists on the server.
skipping to change at page 75, line 5 skipping to change at page 74, line 41
the methods described above, there are no risks of a Byzantine router the methods described above, there are no risks of a Byzantine router
re-sending old requests. The server need only maintain the re-sending old requests. The server need only maintain the
(lock_owner, sequence number) state as long as there are open files (lock_owner, sequence number) state as long as there are open files
or closed files with locks outstanding. or closed files with locks outstanding.
LOCK, LOCKU, OPEN, OPEN_DOWNGRADE, and CLOSE each contain a sequence LOCK, LOCKU, OPEN, OPEN_DOWNGRADE, and CLOSE each contain a sequence
number and therefore the risk of the replay of these operations number and therefore the risk of the replay of these operations
resulting in undesired effects is non-existent while the server resulting in undesired effects is non-existent while the server
maintains the lock_owner state. maintains the lock_owner state.
8.1.7. Releasing lock_owner State 8.1.7. Releasing lock_owner State
When a particular lock_owner no longer holds open or file locking When a particular lock_owner no longer holds open or file locking
state at the server, the server may choose to release the sequence state at the server, the server may choose to release the sequence
number state associated with the lock_owner. The server may make number state associated with the lock_owner. The server may make
this choice based on lease expiration, for the reclamation of server this choice based on lease expiration, for the reclamation of server
memory, or other implementation specific details. In any event, the memory, or other implementation specific details. In any event, the
server is able to do this safely only when the lock_owner no longer server is able to do this safely only when the lock_owner no longer
is being utilized by the client. The server may choose to hold the is being utilized by the client. The server may choose to hold the
lock_owner state in the event that retransmitted requests are lock_owner state in the event that retransmitted requests are
skipping to change at page 75, line 54 skipping to change at page 75, line 39
they would be prevented from acting in a timely fashion on they would be prevented from acting in a timely fashion on
information received, because that information would be provisional, information received, because that information would be provisional,
subject to deletion upon non-confirmation. Fortunately, these are subject to deletion upon non-confirmation. Fortunately, these are
situations in which the server can avoid the need for confirmation situations in which the server can avoid the need for confirmation
when responding to open requests. The two constraints are: when responding to open requests. The two constraints are:
o The server must not bestow a delegation for any open which would o The server must not bestow a delegation for any open which would
require confirmation. require confirmation.
o The server MUST NOT require confirmation on a reclaim-type open o The server MUST NOT require confirmation on a reclaim-type open
(i.e. one specifying claim type CLAIM_PREVIOUS or (i.e., one specifying claim type CLAIM_PREVIOUS or
CLAIM_DELEGATE_PREV). CLAIM_DELEGATE_PREV).
These constraints are related in that reclaim-type opens are the only These constraints are related in that reclaim-type opens are the only
ones in which the server may be required to send a delegation. For ones in which the server may be required to send a delegation. For
CLAIM_NULL, sending the delegation is optional while for CLAIM_NULL, sending the delegation is optional while for
CLAIM_DELEGATE_CUR, no delegation is sent. CLAIM_DELEGATE_CUR, no delegation is sent.
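The two constraints and the per-claim-type delegation behavior can be sketched as a single decision function. The function name plan_open and its parameters are hypothetical. In particular, owner_known stands in for whatever test the server uses to decide that an open needs confirmation; that test is an assumption of this sketch and is not defined in the text shown here.

```python
RECLAIM_CLAIMS = {"CLAIM_PREVIOUS", "CLAIM_DELEGATE_PREV"}


def plan_open(claim_type, owner_known, server_wants_delegation):
    """Return (require_confirmation, grant_delegation) for an OPEN."""
    if claim_type in RECLAIM_CLAIMS:
        # The server MUST NOT require confirmation on a reclaim-type open.
        require_confirmation = False
    else:
        # Assumption: confirmation is only needed for an unknown owner.
        require_confirmation = not owner_known

    if claim_type == "CLAIM_DELEGATE_CUR":
        # No delegation is sent for this claim type.
        grant_delegation = False
    else:
        # Never bestow a delegation on an open that requires confirmation.
        grant_delegation = server_wants_delegation and not require_confirmation
    return require_confirmation, grant_delegation
```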
Delegations being sent with an open requiring confirmation are Delegations being sent with an open requiring confirmation are
troublesome because recovering from non-confirmation adds undue troublesome because recovering from non-confirmation adds undue
complexity to the protocol while requiring confirmation on reclaim- complexity to the protocol while requiring confirmation on reclaim-
type opens poses difficulties in that the inability to resolve the type opens poses difficulties in that the inability to resolve
status of the reclaim until lease expiration may make it difficult to the status of the reclaim until lease expiration may make it
have timely determination of the set of locks being reclaimed (since difficult to have timely determination of the set of locks being
the grace period may expire). reclaimed (since the grace period may expire).
Requiring open confirmation on reclaim-type opens is avoidable Requiring open confirmation on reclaim-type opens is avoidable
because of the nature of the environments in which such opens are because of the nature of the environments in which such opens are
done. For CLAIM_PREVIOUS opens, this is immediately after server done. For CLAIM_PREVIOUS opens, this is immediately after server
reboot, so there should be no time for lockowners to be created, reboot, so there should be no time for lockowners to be created,
found to be unused, and recycled. For CLAIM_DELEGATE_PREV opens, we found to be unused, and recycled. For CLAIM_DELEGATE_PREV opens, we
are dealing with a client reboot situation. A server which supports are dealing with a client reboot situation. A server which supports
delegation can be sure that no lockowners for that client have been delegation can be sure that no lockowners for that client have been
recycled since client initialization and thus can ensure that recycled since client initialization and thus can ensure that
confirmation will not be required. confirmation will not be required.
skipping to change at page 77, line 4 skipping to change at page 76, line 45
the recovery of file locking state in the event of server failure. the recovery of file locking state in the event of server failure.
As discussed in the section "Server Failure and Recovery" below, the As discussed in the section "Server Failure and Recovery" below, the
server may employ certain optimizations during recovery that work server may employ certain optimizations during recovery that work
effectively only when the client's behavior during lock recovery is effectively only when the client's behavior during lock recovery is
similar to the client's locking behavior prior to server failure. similar to the client's locking behavior prior to server failure.
8.3. Upgrading and Downgrading Locks 8.3. Upgrading and Downgrading Locks
If a client has a write lock on a record, it can request an atomic If a client has a write lock on a record, it can request an atomic
downgrade of the lock to a read lock via the LOCK request, by setting downgrade of the lock to a read lock via the LOCK request, by setting
the type to READ_LT. If the server supports atomic downgrade, the the type to READ_LT. If the server supports atomic downgrade, the
request will succeed. If not, it will return NFS4ERR_LOCK_NOTSUPP. request will succeed. If not, it will return NFS4ERR_LOCK_NOTSUPP.
The client should be prepared to receive this error, and if The client should be prepared to receive this error, and if
appropriate, report the error to the requesting application. appropriate, report the error to the requesting application.
If a client has a read lock on a record, it can request an atomic If a client has a read lock on a record, it can request an atomic
upgrade of the lock to a write lock via the LOCK request by setting upgrade of the lock to a write lock via the LOCK request by setting
the type to WRITE_LT or WRITEW_LT. If the server does not support the type to WRITE_LT or WRITEW_LT. If the server does not support
atomic upgrade, it will return NFS4ERR_LOCK_NOTSUPP. If the upgrade atomic upgrade, it will return NFS4ERR_LOCK_NOTSUPP. If the upgrade
can be achieved without an existing conflict, the request will can be achieved without an existing conflict, the request will
skipping to change at page 78, line 4 skipping to change at page 77, line 50
released, allowing a successful return. In this way, clients can released, allowing a successful return. In this way, clients can
avoid the burden of needlessly frequent polling for blocking locks. avoid the burden of needlessly frequent polling for blocking locks.
The server should take care in the length of delay in the event the The server should take care in the length of delay in the event the
client retransmits the request. client retransmits the request.
8.5. Lease Renewal 8.5. Lease Renewal
The purpose of a lease is to allow a server to remove stale locks The purpose of a lease is to allow a server to remove stale locks
that are held by a client that has crashed or is otherwise that are held by a client that has crashed or is otherwise
unreachable. It is not a mechanism for cache consistency and lease unreachable. It is not a mechanism for cache consistency and lease
renewals may not be denied if the lease interval has not expired. renewals may not be denied if the lease interval has not expired.
The following events cause implicit renewal of all of the leases for The following events cause implicit renewal of all of the leases for
a given client (i.e. all those sharing a given clientid). Each of a given client (i.e., all those sharing a given clientid). Each of
these is a positive indication that the client is still active and these is a positive indication that the client is still active and
that the associated state held at the server, for the client, is that the associated state held at the server, for the client, is
still valid. still valid.
o An OPEN with a valid clientid. o An OPEN with a valid clientid.
o Any operation made with a valid stateid (CLOSE, DELEGPURGE, o Any operation made with a valid stateid (CLOSE, DELEGPURGE,
DELEGRETURN, LOCK, LOCKU, OPEN, OPEN_CONFIRM, OPEN_DOWNGRADE, DELEGRETURN, LOCK, LOCKU, OPEN, OPEN_CONFIRM, OPEN_DOWNGRADE,
READ, RENEW, SETATTR, WRITE). This does not include the special READ, RENEW, SETATTR, WRITE). This does not include the special
stateids of all bits 0 or all bits 1. stateids of all bits 0 or all bits 1.
Note that if the client had restarted or rebooted, the Note that if the client had restarted or rebooted, the client
client would not be making these requests without issuing would not be making these requests without issuing the
the SETCLIENTID/SETCLIENTID_CONFIRM sequence. The use of SETCLIENTID/SETCLIENTID_CONFIRM sequence. The use of the
the SETCLIENTID/SETCLIENTID_CONFIRM sequence (one that SETCLIENTID/SETCLIENTID_CONFIRM sequence (one that changes the
changes the client verifier) notifies the server to drop client verifier) notifies the server to drop the locking state
the locking state associated with the client. associated with the client. SETCLIENTID/SETCLIENTID_CONFIRM never
SETCLIENTID/SETCLIENTID_CONFIRM never renews a lease. renews a lease.
If the server has rebooted, the stateids If the server has rebooted, the stateids (NFS4ERR_STALE_STATEID
(NFS4ERR_STALE_STATEID error) or the clientid error) or the clientid (NFS4ERR_STALE_CLIENTID error) will not be
(NFS4ERR_STALE_CLIENTID error) will not be valid hence valid hence preventing spurious renewals.
preventing spurious renewals.
This approach allows for low overhead lease renewal which scales This approach allows for low overhead lease renewal which scales
well. In the typical case no extra RPC calls are required for lease well. In the typical case no extra RPC calls are required for lease
renewal and in the worst case one RPC is required every lease period renewal and in the worst case one RPC is required every lease period
(i.e. a RENEW operation). The number of locks held by the client is (i.e., a RENEW operation). The number of locks held by the client is
not a factor since all state for the client is involved with the not a factor since all state for the client is involved with the
lease renewal action. lease renewal action.
Since all operations that create a new lease also renew existing Since all operations that create a new lease also renew existing
leases, the server must maintain a common lease expiration time for leases, the server must maintain a common lease expiration time for
all valid leases for a given client. This lease time can then be all valid leases for a given client. This lease time can then be
easily updated upon implicit lease renewal actions. easily updated upon implicit lease renewal actions.
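A minimal sketch of a common per-client lease expiration time follows. The class LeaseTable, its method names, and the representation of a stateid as a byte string are assumptions made for illustration only. The clock is injectable so that the behavior can be checked without waiting.

```python
import time


def is_special_stateid(stateid):
    """True for the special stateids of all bits 0 or all bits 1."""
    return stateid in (b"\x00" * len(stateid), b"\xff" * len(stateid))


class LeaseTable:
    """Hypothetical server-side table: one expiration time per clientid."""

    def __init__(self, lease_time, clock=time.monotonic):
        self.lease_time = lease_time
        self.clock = clock
        self.expiry = {}  # clientid -> common lease expiration time

    def create(self, clientid):
        """Start the lease when the client first establishes state."""
        self.expiry[clientid] = self.clock() + self.lease_time

    def touch(self, clientid, stateid=None):
        """Implicit renewal; stateid=None models an OPEN with a clientid."""
        if clientid not in self.expiry:
            return False  # unknown or stale clientid: no renewal
        if stateid is not None and is_special_stateid(stateid):
            return False  # special stateids do not renew the lease
        self.expiry[clientid] = self.clock() + self.lease_time
        return True

    def is_expired(self, clientid):
        return self.clock() >= self.expiry[clientid]
```

Because every lease for a client shares one expiration time, a renewal costs one dictionary update regardless of how many locks the client holds.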
8.6. Crash Recovery 8.6. Crash Recovery
The important requirement in crash recovery is that both the client The important requirement in crash recovery is that both the client
and the server know when the other has failed. Additionally, it is and the server know when the other has failed. Additionally, it is
required that a client sees a consistent view of data across server required that a client sees a consistent view of data across server
restarts or reboots. All READ and WRITE operations that may have restarts or reboots. All READ and WRITE operations that may have
been queued within the client or network buffers must wait until the been queued within the client or network buffers must wait until the
client has successfully recovered the locks protecting the READ and client has successfully recovered the locks protecting the READ and
WRITE operations. WRITE operations.
8.6.1. Client Failure and Recovery 8.6.1. Client Failure and Recovery
In the event that a client fails, the server may recover the client's In the event that a client fails, the server may recover the client's
locks when the associated leases have expired. Conflicting locks locks when the associated leases have expired. Conflicting locks
from another client may only be granted after this lease expiration. from another client may only be granted after this lease expiration.
If the client is able to restart or reinitialize within the lease If the client is able to restart or reinitialize within the lease
period the client may be forced to wait the remainder of the lease period the client may be forced to wait the remainder of the lease
period before obtaining new locks. period before obtaining new locks.
To minimize client delay upon restart, lock requests are associated To minimize client delay upon restart, lock requests are associated
skipping to change at page 80, line 4 skipping to change at page 80, line 9
A client can determine that server failure (and thus loss of locking A client can determine that server failure (and thus loss of locking
state) has occurred, when it receives one of two errors. The state) has occurred, when it receives one of two errors. The
NFS4ERR_STALE_STATEID error indicates a stateid invalidated by a NFS4ERR_STALE_STATEID error indicates a stateid invalidated by a
reboot or restart. The NFS4ERR_STALE_CLIENTID error indicates a reboot or restart. The NFS4ERR_STALE_CLIENTID error indicates a
clientid invalidated by reboot or restart. When either of these is clientid invalidated by reboot or restart. When either of these is
received, the client must establish a new clientid (See the section received, the client must establish a new clientid (See the section
"Client ID") and re-establish the locking state as discussed below. "Client ID") and re-establish the locking state as discussed below.
The period of special handling of locking and READs and WRITEs, equal The period of special handling of locking and READs and WRITEs, equal
in duration to the lease period, is referred to as the "grace in duration to the lease period, is referred to as the "grace
period". During the grace period, clients recover locks and the period". During the grace period, clients recover locks and the
associated state by reclaim-type locking requests (i.e. LOCK requests associated state by reclaim-type locking requests (i.e., LOCK
with reclaim set to true and OPEN operations with a claim type of requests with reclaim set to true and OPEN operations with a claim
CLAIM_PREVIOUS). During the grace period, the server must reject type of CLAIM_PREVIOUS). During the grace period, the server must
READ and WRITE operations and non-reclaim locking requests (i.e. reject READ and WRITE operations and non-reclaim locking requests
other LOCK and OPEN operations) with an error of NFS4ERR_GRACE. (i.e., other LOCK and OPEN operations) with an error of
NFS4ERR_GRACE.
If the server can reliably determine that granting a non-reclaim If the server can reliably determine that granting a non-reclaim
request will not conflict with reclamation of locks by other clients, request will not conflict with reclamation of locks by other clients,
the NFS4ERR_GRACE error does not have to be returned and the non- the NFS4ERR_GRACE error does not have to be returned and the non-
reclaim client request can be serviced. For the server to be able to reclaim client request can be serviced. For the server to be able to
service READ and WRITE operations during the grace period, it must service READ and WRITE operations during the grace period, it must
again be able to guarantee that no possible conflict could arise again be able to guarantee that no possible conflict could arise
between an impending reclaim locking request and the READ or WRITE between an impending reclaim locking request and the READ or WRITE
operation. If the server is unable to offer that guarantee, the operation. If the server is unable to offer that guarantee, the
NFS4ERR_GRACE error must be returned to the client. NFS4ERR_GRACE error must be returned to the client.
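The server-side decision during the grace period can be sketched as below. The function names and the provably_conflict_free flag are hypothetical; how a server would establish that guarantee is implementation specific and not modeled. The sketch covers only requests that arrive while the grace period is in effect.

```python
NFS4_OK = "NFS4_OK"
NFS4ERR_GRACE = "NFS4ERR_GRACE"


def in_grace_period(now, restart_time, grace_seconds):
    """True while the server is still within its grace period."""
    return restart_time <= now < restart_time + grace_seconds


def grace_status(is_reclaim, provably_conflict_free=False):
    """Decide a request received during the grace period.

    is_reclaim: LOCK with reclaim set to true, or OPEN with CLAIM_PREVIOUS.
    provably_conflict_free: the server can guarantee that the request
    cannot conflict with a reclaim still to come (assumed input).
    """
    if is_reclaim or provably_conflict_free:
        return NFS4_OK
    # READ, WRITE, and non-reclaim LOCK or OPEN are turned away.
    return NFS4ERR_GRACE
```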
skipping to change at page 81, line 4 skipping to change at page 81, line 15
Clients should be prepared for the return of NFS4ERR_GRACE errors for Clients should be prepared for the return of NFS4ERR_GRACE errors for
non-reclaim lock and I/O requests. In this case the client should non-reclaim lock and I/O requests. In this case the client should
employ a retry mechanism for the request. A delay (on the order of employ a retry mechanism for the request. A delay (on the order of
several seconds) between retries should be used to avoid overwhelming several seconds) between retries should be used to avoid overwhelming
the server. Further discussion of the general issue is included in the server. Further discussion of the general issue is included in
[Floyd]. The client must account for the server that is able to [Floyd]. The client must account for the server that is able to
perform I/O and non-reclaim locking requests within the grace period perform I/O and non-reclaim locking requests within the grace period
as well as those that can not do so. as well as those that can not do so.
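A client-side retry loop of the kind described above might look like the following sketch. The function name retry_on_grace, the five second default delay, and the retry limit are illustrative assumptions; the text only calls for a delay on the order of several seconds. The send argument stands for whatever routine issues the request and returns its status.

```python
import time

NFS4ERR_GRACE = "NFS4ERR_GRACE"


def retry_on_grace(send, delay=5.0, max_tries=20, sleep=time.sleep):
    """Reissue a non-reclaim request while the server answers NFS4ERR_GRACE."""
    status = NFS4ERR_GRACE
    for attempt in range(max_tries):
        status = send()
        if status != NFS4ERR_GRACE:
            # Covers servers that can service the request within the
            # grace period as well as the normal post-grace case.
            return status
        if attempt + 1 < max_tries:
            sleep(delay)  # pause to avoid overwhelming the server
    return status
```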
A reclaim-type locking request outside the server's grace period can A reclaim-type locking request outside the server's grace period can
only succeed if the server can guarantee that no conflicting lock or only succeed if the server can guarantee that no conflicting lock or
I/O request has been granted since reboot or restart. I/O request has been granted since reboot or restart.
A server may, upon restart, establish a new value for the lease A server may, upon restart, establish a new value for the lease
period. Therefore, clients should, once a new clientid is period. Therefore, clients should, once a new clientid is
established, refetch the lease_time attribute and use it as the basis established, refetch the lease_time attribute and use it as the basis
for lease renewal for the lease associated with that server. However, for lease renewal for the lease associated with that server.
the server must establish, for this restart event, a grace period at However, the server must establish, for this restart event, a grace
least as long as the lease period for the previous server period at least as long as the lease period for the previous server
instantiation. This allows the client state obtained during the instantiation. This allows the client state obtained during the
previous server instance to be reliably re-established. previous server instance to be reliably re-established.
8.6.3. Network Partitions and Recovery 8.6.3. Network Partitions and Recovery
If the duration of a network partition is greater than the lease If the duration of a network partition is greater than the lease
period provided by the server, the server will have not received a period provided by the server, the server will have not received a
lease renewal from the client. If this occurs, the server may free lease renewal from the client. If this occurs, the server may free
all locks held for the client. As a result, all stateids held by the all locks held for the client. As a result, all stateids held by the
client will become invalid or stale. Once the client is able to client will become invalid or stale. Once the client is able to
skipping to change at page 81, line 47 skipping to change at page 82, line 9
When a network partition is combined with a server reboot, there are When a network partition is combined with a server reboot, there are
edge conditions that place requirements on the server in order to edge conditions that place requirements on the server in order to
avoid silent data corruption following the server reboot. Two of avoid silent data corruption following the server reboot. Two of
these edge conditions are known, and are discussed below. these edge conditions are known, and are discussed below.
The first edge condition has the following scenario: The first edge condition has the following scenario:
1. Client A acquires a lock. 1. Client A acquires a lock.
2. Client A and server experience mutual network partition, 2. Client A and server experience mutual network partition, such
such that client A is unable to renew its lease. that client A is unable to renew its lease.
3. Client A's lease expires, so server releases lock. 3. Client A's lease expires, so server releases lock.
4. Client B acquires a lock that would have conflicted with 4. Client B acquires a lock that would have conflicted with that
that of Client A. of Client A.
5. Client B releases the lock 5. Client B releases the lock
6. Server reboots 6. Server reboots
7. Network partition between client A and server heals. 7. Network partition between client A and server heals.
8. Client A issues a RENEW operation, and gets back a 8. Client A issues a RENEW operation, and gets back a
NFS4ERR_STALE_CLIENTID. NFS4ERR_STALE_CLIENTID.
9. Client A reclaims its lock within the server's grace period. 9. Client A reclaims its lock within the server's grace period.
Thus, at the final step, the server has erroneously granted client Thus, at the final step, the server has erroneously granted client
A's lock reclaim. If client B modified the object the lock was A's lock reclaim. If client B modified the object the lock was
protecting, client A will experience object corruption. protecting, client A will experience object corruption.
The second known edge condition follows: The second known edge condition follows:
1. Client A acquires a lock. 1. Client A acquires a lock.
2. Server reboots. 2. Server reboots.
3. Client A and server experience mutual network partition, 3. Client A and server experience mutual network partition, such
such that client A is unable to reclaim its lock within the that client A is unable to reclaim its lock within the grace
grace period. period.
4. Server's reclaim grace period ends. Client A has no locks 4. Server's reclaim grace period ends. Client A has no locks
recorded on server. recorded on server.
5. Client B acquires a lock that would have conflicted with 5. Client B acquires a lock that would have conflicted with that
that of Client A. of Client A.
6. Client B releases the lock 6. Client B releases the lock.
7. Server reboots a second time 7. Server reboots a second time.
8. Network partition between client A and server heals. 8. Network partition between client A and server heals.
9. Client A issues a RENEW operation, and gets back a 9. Client A issues a RENEW operation, and gets back a
NFS4ERR_STALE_CLIENTID. NFS4ERR_STALE_CLIENTID.
10. Client A reclaims its lock within the server's grace period. 10. Client A reclaims its lock within the server's grace period.
As with the first edge condition, the final step of the scenario of As with the first edge condition, the final step of the scenario of
the second edge condition has the server erroneously granting client the second edge condition has the server erroneously granting client
A's lock reclaim. A's lock reclaim.
Solving the first and second edge conditions requires that the server Solving the first and second edge conditions requires that the server
either assume after it reboots that an edge condition occurs, and thus either assume after it reboots that an edge condition occurs, and thus
return NFS4ERR_NO_GRACE for all reclaim attempts, or that the server return NFS4ERR_NO_GRACE for all reclaim attempts, or that the server
record some information in stable storage. The amount of information record some information in stable storage. The amount of information
the server records in stable storage is in inverse proportion to how the server records in stable storage is in inverse proportion to how
harsh the server wants to be whenever the edge conditions occur. The harsh the server wants to be whenever the edge conditions occur. The
server that is completely tolerant of all edge conditions will record server that is completely tolerant of all edge conditions will record
in stable storage every lock that is acquired, removing the lock in stable storage every lock that is acquired, removing the lock
record from stable storage only when the lock is unlocked by the record from stable storage only when the lock is unlocked by the
client and the lock's lockowner advances the sequence number such client and the lock's lockowner advances the sequence number such
that the lock release is not the last stateful event for the that the lock release is not the last stateful event for the
lockowner's sequence. For the two aforementioned edge conditions, the lockowner's sequence. For the two aforementioned edge conditions,
harshest a server can be, and still support a grace period for the harshest a server can be, and still support a grace period for
reclaims, requires that the server record in stable storage reclaims, requires that the server record in stable storage
some minimal information. For example, a server some minimal information. For example, a server
implementation could, for each client, save in stable storage a implementation could, for each client, save in stable storage a
record containing: record containing:
o the client's id string o the client's id string
o a boolean that indicates if the client's lease expired or if o a boolean that indicates if the client's lease expired or if there
there was administrative intervention (see the section, was administrative intervention (see the section, Server
Server Revocation of Locks) to revoke a record lock, share Revocation of Locks) to revoke a record lock, share reservation,
reservation, or delegation or delegation
o a timestamp that is updated the first time after a server o a timestamp that is updated the first time after a server boot or
boot or reboot the client acquires record locking, share reboot the client acquires record locking, share reservation, or
reservation, or delegation state on the server. The delegation state on the server. The timestamp need not be updated
timestamp need not be updated on subsequent lock requests on subsequent lock requests until the server reboots.
until the server reboots.
The server implementation would also record in the stable storage the The server implementation would also record in the stable storage the
timestamps from the two most recent server reboots. timestamps from the two most recent server reboots.
Assuming the above record keeping, for the first edge condition, Assuming the above record keeping, for the first edge condition,
after the server reboots, the record that client A's lease expired after the server reboots, the record that client A's lease expired
means that another client could have acquired a conflicting record means that another client could have acquired a conflicting record
lock, share reservation, or delegation. Hence the server must reject lock, share reservation, or delegation. Hence the server must reject
a reclaim from client A with the error NFS4ERR_NO_GRACE. a reclaim from client A with the error NFS4ERR_NO_GRACE.
For the second edge condition, after the server reboots for a second For the second edge condition, after the server reboots for a second
time, the record that the client had an unexpired record lock, share time, the record that the client had an unexpired record lock, share
reservation, or delegation established before the server's previous reservation, or delegation established before the server's previous
incarnation means that the server must reject a reclaim from client A incarnation means that the server must reject a reclaim from client A
with the error NFS4ERR_NO_GRACE. with the error NFS4ERR_NO_GRACE.
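The record keeping and the two rejections just described can be sketched as follows. ClientRecord, may_reclaim, and the field names are hypothetical, and timestamps are plain numbers for simplicity. The sketch also assumes that a client with no record at all is refused, which the text does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class ClientRecord:
    """Hypothetical per-client record kept in stable storage."""
    id_string: str
    lease_expired_or_revoked: bool  # lease expiry or administrative revoke
    first_state_time: float  # first state acquisition after a server boot


def may_reclaim(record, previous_boot_time):
    """Return True if a reclaim may proceed, False for NFS4ERR_NO_GRACE.

    previous_boot_time is the older of the two most recent server reboot
    timestamps, i.e. the start of the previous server incarnation.
    """
    if record is None:
        return False  # assumption: unknown clients cannot reclaim
    if record.lease_expired_or_revoked:
        # First edge condition: another client could have acquired
        # conflicting state after the lease expired.
        return False
    if record.first_state_time < previous_boot_time:
        # Second edge condition: the state dates from before the previous
        # incarnation and was never re-established during it.
        return False
    return True
```

False positives are acceptable here: refusing a reclaim that would have been safe costs the client a recovery step, while granting an unsafe one risks silent corruption.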
Regardless of the level and approach to record keeping, the server Regardless of the level and approach to record keeping, the server
MUST implement one of the following strategies (which apply to MUST implement one of the following strategies (which apply to
reclaims of share reservations, record locks, and delegations): reclaims of share reservations, record locks, and delegations):
1. Reject all reclaims with NFS4ERR_NO_GRACE. This is 1. Reject all reclaims with NFS4ERR_NO_GRACE. This is superharsh,
superharsh, but necessary if the server does not want to but necessary if the server does not want to record lock state
record lock state in stable storage. in stable storage.
2. Record sufficient state in stable storage such that all 2. Record sufficient state in stable storage such that all known
known edge conditions involving server reboot, including the edge conditions involving server reboot, including the two
two noted in this section, are detected. False positives are noted in this section, are detected. False positives are
acceptable. Note that at this time, it is not known if there acceptable. Note that at this time, it is not known if there
are other edge conditions. are other edge conditions.
In the event, after a server reboot, the server determines In the event, after a server reboot, the server determines that
that there is unrecoverable damage or corruption to the there is unrecoverable damage or corruption to the stable
stable storage, then for all clients and/or locks affected, storage, then for all clients and/or locks affected, the server
the server MUST return NFS4ERR_NO_GRACE. MUST return NFS4ERR_NO_GRACE.
A mandate for the client's handling of the NFS4ERR_NO_GRACE error is A mandate for the client's handling of the NFS4ERR_NO_GRACE error is
outside the scope of this specification, since the strategies for outside the scope of this specification, since the strategies for
such handling are very dependent on the client's operating such handling are very dependent on the client's operating
environment. However, one potential approach is described below. environment. However, one potential approach is described below.
When the client receives NFS4ERR_NO_GRACE, it could examine the When the client receives NFS4ERR_NO_GRACE, it could examine the
change attribute of the objects the client is trying to reclaim state change attribute of the objects the client is trying to reclaim state
for, and use that to determine whether to re-establish the state via for, and use that to determine whether to re-establish the state via
normal OPEN or LOCK requests. This is acceptable provided the normal OPEN or LOCK requests. This is acceptable provided the
skipping to change at page 84, line 37 skipping to change at page 85, line 9
client should do for dealing with unreclaimed delegations on client client should do for dealing with unreclaimed delegations on client
state. state.
For further discussion of revocation of locks see the section "Server For further discussion of revocation of locks see the section "Server
Revocation of Locks". Revocation of Locks".
8.7. Recovery from a Lock Request Timeout or Abort 8.7. Recovery from a Lock Request Timeout or Abort
In the event a lock request times out, a client may decide to not In the event a lock request times out, a client may decide to not
retry the request. The client may also abort the request when the retry the request. The client may also abort the request when the
process for which it was issued is terminated (e.g. in UNIX due to a process for which it was issued is terminated (e.g., in UNIX due to a
signal). It is possible though that the server received the request signal). It is possible though that the server received the request
and acted upon it. This would change the state on the server without and acted upon it. This would change the state on the server without
the client being aware of the change. It is paramount that the the client being aware of the change. It is paramount that the
client re-synchronize state with the server before it attempts any other client re-synchronize state with the server before it attempts any other
operation that takes a seqid and/or a stateid with the same operation that takes a seqid and/or a stateid with the same
lock_owner. This is straightforward to do without a special re- lock_owner. This is straightforward to do without a special re-
synchronize operation. synchronize operation.
Since the server maintains the last lock request and response Since the server maintains the last lock request and response
received on the lock_owner, for each lock_owner, the client should received on the lock_owner, for each lock_owner, the client should
cache the last lock request it sent such that the lock request did cache the last lock request it sent such that the lock request did
not receive a response. From this, the next time the client does a not receive a response. From this, the next time the client does a
lock operation for the lock_owner, it can send the cached request, if lock operation for the lock_owner, it can send the cached request, if
there is one, and if the request was one that established state (e.g. there is one, and if the request was one that established state
a LOCK or OPEN operation), the server will return the cached result (e.g., a LOCK or OPEN operation), the server will return the cached
or, if it never saw the request, perform it. The client can follow up result or, if it never saw the request, perform it. The client can
with a request to remove the state (e.g. a LOCKU or CLOSE operation). follow up with a request to remove the state (e.g., a LOCKU or CLOSE
With this approach, the sequencing and stateid information on the operation). With this approach, the sequencing and stateid
client and server for the given lock_owner will re-synchronize and in information on the client and server for the given lock_owner will
turn the lock state will re-synchronize. re-synchronize and in turn the lock state will re-synchronize.
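
The re-synchronization approach described in this section can be sketched on the client side as follows. This is only an illustration under stated assumptions: the class name LockOwnerChannel and the `send` callable (which takes a request, returns a reply, and raises TimeoutError on a timeout) are hypothetical and are not defined by the protocol.

```python
class LockOwnerChannel:
    """Per-lock_owner request path that remembers an unanswered request."""

    def __init__(self, send):
        # `send` is a hypothetical transport callable: request -> reply.
        # It is assumed to raise TimeoutError when no response arrives.
        self._send = send
        self._unanswered = None

    def resync(self):
        """Resend the cached request, if any, so both sides agree again."""
        if self._unanswered is None:
            return None
        # The server either replays its cached result or, if it never saw
        # the request, performs it now.
        reply = self._send(self._unanswered)
        self._unanswered = None
        return reply

    def call(self, request):
        """Issue a lock operation for this lock_owner."""
        self.resync()
        # Cache the request until a response is received; if send() raises,
        # the request stays cached for the next resync().
        self._unanswered = request
        reply = self._send(request)
        self._unanswered = None
        return reply
```

A caller that no longer wants the state established by the replayed request could follow resync() with a request that removes it (for example, a LOCKU or CLOSE).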
8.8. Server Revocation of Locks 8.8. Server Revocation of Locks
At any point, the server can revoke locks held by a client and the At any point, the server can revoke locks held by a client and the
client must be prepared for this event. When the client detects that client must be prepared for this event. When the client detects that
its locks have been or may have been revoked, the client is its locks have been or may have been revoked, the client is
responsible for validating the state information between itself and responsible for validating the state information between itself and
the server. Validating locking state for the client means that it the server. Validating locking state for the client means that it
must verify or reclaim state for each lock currently held. must verify or reclaim state for each lock currently held.
skipping to change at page 86, line 5 skipping to change at page 86, line 35
which the server may grant conflicting locks after the lease period which the server may grant conflicting locks after the lease period
has expired for a client. When it is possible that the lease period has expired for a client. When it is possible that the lease period
has expired, the client must validate each lock currently held to has expired, the client must validate each lock currently held to
ensure that a conflicting lock has not been granted. The client may ensure that a conflicting lock has not been granted. The client may
accomplish this task by issuing an I/O request, either a pending I/O accomplish this task by issuing an I/O request, either a pending I/O
or a zero-length read, specifying the stateid associated with the or a zero-length read, specifying the stateid associated with the
lock in question. If the response to the request is success, the lock in question. If the response to the request is success, the
client has validated all of the locks governed by that stateid and client has validated all of the locks governed by that stateid and
re-established the appropriate state between itself and the server. re-established the appropriate state between itself and the server.
If the I/O request is not successful, then one or more of the locks If the I/O request is not successful, then one or more of the locks
associated with the stateid was revoked by the server and the client associated with the stateid was revoked by the server and the client
must notify the owner. must notify the owner.
8.9. Share Reservations 8.9. Share Reservations
A share reservation is a mechanism to control access to a file. It A share reservation is a mechanism to control access to a file. It
is a separate and independent mechanism from record locking. When a is a separate and independent mechanism from record locking. When a
client opens a file, it issues an OPEN operation to the server client opens a file, it issues an OPEN operation to the server
specifying the type of access required (READ, WRITE, or BOTH) and the specifying the type of access required (READ, WRITE, or BOTH) and the
skipping to change at page 87, line 5 skipping to change at page 87, line 35
To provide correct share semantics, a client MUST use the OPEN To provide correct share semantics, a client MUST use the OPEN
operation to obtain the initial filehandle and indicate the desired operation to obtain the initial filehandle and indicate the desired
access and what if any access to deny. Even if the client intends to access and what if any access to deny. Even if the client intends to
use a stateid of all 0's or all 1's, it must still obtain the use a stateid of all 0's or all 1's, it must still obtain the
filehandle for the regular file with the OPEN operation so the filehandle for the regular file with the OPEN operation so the
appropriate share semantics can be applied. For clients that do not appropriate share semantics can be applied. For clients that do not
have a deny mode built into their open programming interfaces, deny have a deny mode built into their open programming interfaces, deny
equal to NONE should be used. equal to NONE should be used.
The OPEN operation with the CREATE flag also subsumes the CREATE The OPEN operation with the CREATE flag also subsumes the CREATE
operation for regular files as used in previous versions of the NFS operation for regular files as used in previous versions of the NFS
protocol. This allows a create with a share to be done atomically. protocol. This allows a create with a share to be done atomically.
The CLOSE operation removes all share reservations held by the The CLOSE operation removes all share reservations held by the
lock_owner on that file. If record locks are held, the client SHOULD lock_owner on that file. If record locks are held, the client SHOULD
release all locks before issuing a CLOSE. The server MAY free all release all locks before issuing a CLOSE. The server MAY free all
outstanding locks on CLOSE but some servers may not support the CLOSE outstanding locks on CLOSE but some servers may not support the CLOSE
of a file that still has record locks held. The server MUST return of a file that still has record locks held. The server MUST return
failure, NFS4ERR_LOCKS_HELD, if any locks would exist after the failure, NFS4ERR_LOCKS_HELD, if any locks would exist after the
CLOSE. CLOSE.
The LOOKUP operation will return a filehandle without establishing The LOOKUP operation will return a filehandle without establishing
any lock state on the server. Without a valid stateid, the server any lock state on the server. Without a valid stateid, the server
will assume the client has the least access. For example, a file will assume the client has the least access. For example, a file
opened with deny READ/WRITE cannot be accessed using a filehandle opened with deny READ/WRITE cannot be accessed using a filehandle
obtained through LOOKUP because it would not have a valid stateid obtained through LOOKUP because it would not have a valid stateid
(i.e. using a stateid of all bits 0 or all bits 1). (i.e., using a stateid of all bits 0 or all bits 1).
8.10.1. Close and Retention of State Information 8.10.1. Close and Retention of State Information
Since a CLOSE operation requests deallocation of a stateid, dealing Since a CLOSE operation requests deallocation of a stateid, dealing
with retransmission of the CLOSE may pose special difficulties, with retransmission of the CLOSE may pose special difficulties,
since the state information, which normally would be used to since the state information, which normally would be used to
determine the state of the open file being designated, might be determine the state of the open file being designated, might be
deallocated, resulting in an NFS4ERR_BAD_STATEID error. deallocated, resulting in an NFS4ERR_BAD_STATEID error.
Servers may deal with this problem in a number of ways. To provide Servers may deal with this problem in a number of ways. To provide
the greatest degree of assurance that the protocol is being used the greatest degree of assurance that the protocol is being used
properly, a server should, rather than deallocate the stateid, mark properly, a server should, rather than deallocate the stateid, mark
it as close-pending, and retain the stateid with this status, until it as close-pending, and retain the stateid with this status, until
later deallocation. In this way, a retransmitted CLOSE can be later deallocation. In this way, a retransmitted CLOSE can be
recognized since the stateid points to state information with this recognized since the stateid points to state information with this
distinctive status, so that it can be handled without error. distinctive status, so that it can be handled without error.
When adopting this strategy, a server should retain the state When adopting this strategy, a server should retain the state
information until the earliest of: information until the earliest of:
o Another validly sequenced request for the same lockowner, that o Another validly sequenced request for the same lockowner, that is
is not a retransmission. not a retransmission.
o The time that a lockowner is freed by the server due to a period o The time that a lockowner is freed by the server due to a period
with no activity. with no activity.
o All locks for the client are freed as a result of a SETCLIENTID. o All locks for the client are freed as a result of a SETCLIENTID.
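
The close-pending strategy above can be sketched as a small server-side table. This is a minimal illustration, not a definitive implementation: the class OpenStateTable, its method names, and the use of strings in place of numeric status codes are all assumptions made for the example. Only the first retention condition in the list (a new, validly sequenced request from the same lockowner) is modelled.

```python
NFS4_OK = "NFS4_OK"
NFS4ERR_BAD_STATEID = "NFS4ERR_BAD_STATEID"


class OpenStateTable:
    """Hypothetical stateid table that retains closed stateids for a while."""

    def __init__(self):
        # stateid -> (lockowner, status), status is "open" or "close-pending"
        self._state = {}

    def open(self, stateid, lockowner):
        self._state[stateid] = (lockowner, "open")

    def close(self, stateid):
        entry = self._state.get(stateid)
        if entry is None:
            # The stateid was never known or has already been deallocated.
            return NFS4ERR_BAD_STATEID
        lockowner, _status = entry
        # Mark rather than deallocate, so a retransmitted CLOSE still finds
        # the entry and can be answered without error.
        self._state[stateid] = (lockowner, "close-pending")
        return NFS4_OK

    def new_request_from(self, lockowner):
        """A validly sequenced, non-retransmitted request has arrived."""
        stale = [sid for sid, (owner, status) in self._state.items()
                 if owner == lockowner and status == "close-pending"]
        for sid in stale:
            del self._state[sid]
```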
Servers may avoid this complexity, at the cost of less complete Servers may avoid this complexity, at the cost of less complete
protocol error checking, by simply responding NFS4_OK in the event of protocol error checking, by simply responding NFS4_OK in the event of
a CLOSE for a deallocated stateid, on the assumption that this case a CLOSE for a deallocated stateid, on the assumption that this case
must be caused by a retransmitted close. When adopting this must be caused by a retransmitted close. When adopting this
approach, it is desirable to at least log an error when returning a approach, it is desirable to at least log an error when returning a
no-error indication in this situation. If the server maintains a no-error indication in this situation. If the server maintains a
reply-cache mechanism, it can verify the CLOSE is indeed a reply-cache mechanism, it can verify the CLOSE is indeed a
retransmission and avoid error logging in most cases. retransmission and avoid error logging in most cases.
8.11. Open Upgrade and Downgrade 8.11. Open Upgrade and Downgrade
When an OPEN is done for a file and the lockowner for which the open When an OPEN is done for a file and the lockowner for which the open
is being done already has the file open, the result is to upgrade the is being done already has the file open, the result is to upgrade the
open file status maintained on the server to include the access and open file status maintained on the server to include the access and
skipping to change at page 88, line 37 skipping to change at page 89, line 21
to the same file object and returns different filehandles on two to the same file object and returns different filehandles on two
different OPENs of the same file object, the server MUST NOT "OR" different OPENs of the same file object, the server MUST NOT "OR"
together the access and deny bits and coalesce the two open files. together the access and deny bits and coalesce the two open files.
Instead the server must maintain separate OPENs with separate Instead the server must maintain separate OPENs with separate
stateids and will require separate CLOSEs to free them. stateids and will require separate CLOSEs to free them.
When multiple open files on the client are merged into a single open When multiple open files on the client are merged into a single open
file object on the server, the close of one of the open files (on the file object on the server, the close of one of the open files (on the
client) may necessitate change of the access and deny status of the client) may necessitate change of the access and deny status of the
open file on the server. This is because the union of the access and open file on the server. This is because the union of the access and
deny bits for the remaining opens may be smaller (i.e. a proper deny bits for the remaining opens may be smaller (i.e., a proper
subset) than previously. The OPEN_DOWNGRADE operation is used to subset) than previously. The OPEN_DOWNGRADE operation is used to
make the necessary change and the client should use it to update the make the necessary change and the client should use it to update the
server so that share reservation requests by other clients are server so that share reservation requests by other clients are
handled properly. handled properly.
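
The need for OPEN_DOWNGRADE can be illustrated by computing the union of the access and deny bits before and after one of the merged opens is closed. The sketch below is an assumption-laden example: the function names are hypothetical, and the bit values are chosen for illustration (1 for READ, 2 for WRITE, 3 for BOTH).

```python
# Illustrative bit values (assumed for this example).
ACCESS_READ, ACCESS_WRITE = 0x1, 0x2
DENY_NONE, DENY_READ, DENY_WRITE = 0x0, 0x1, 0x2


def merged_share(opens):
    """Return the union (access, deny) over a list of (access, deny) opens."""
    access, deny = 0, 0
    for a, d in opens:
        access |= a
        deny |= d
    return access, deny


def needs_downgrade(before, remaining):
    """True if closing an open shrinks the merged access or deny bits.

    `remaining` is assumed to be non-empty; when the last open goes away
    the client would issue a CLOSE instead of an OPEN_DOWNGRADE.
    """
    return merged_share(remaining) != merged_share(before)
```

For example, with one open for READ with deny NONE and another for READ and WRITE with deny WRITE, closing the second leaves a proper subset of the bits, so the client would send OPEN_DOWNGRADE with the remaining (READ, NONE) values.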
8.12. Short and Long Leases 8.12. Short and Long Leases
When determining the time period for the server lease, the usual When determining the time period for the server lease, the usual
lease tradeoffs apply. Short leases are good for fast server lease tradeoffs apply. Short leases are good for fast server
recovery at a cost of increased RENEW or READ (with zero length) recovery at a cost of increased RENEW or READ (with zero length)
requests. Longer leases are certainly kinder and gentler to servers requests. Longer leases are certainly kinder and gentler to servers
trying to handle very large numbers of clients. The number of RENEW trying to handle very large numbers of clients. The number of RENEW
requests drops in proportion to the lease time. The disadvantages of requests drops in proportion to the lease time. The disadvantages of
long leases are slower recovery after server failure (the server must long leases are slower recovery after server failure (the server must
wait for the leases to expire and the grace period to elapse before wait for the leases to expire and the grace period to elapse before
granting new lock requests) and increased file contention (if client granting new lock requests) and increased file contention (if client
fails to transmit an unlock request then server must wait for lease fails to transmit an unlock request then server must wait for lease
expiration before granting new locks). expiration before granting new locks).
Long leases are usable if the server is able to store lease state in Long leases are usable if the server is able to store lease state in
non-volatile memory. Upon recovery, the server can reconstruct the non-volatile memory. Upon recovery, the server can reconstruct the
lease state from its non-volatile memory and continue operation with lease state from its non-volatile memory and continue operation with
its clients and therefore long leases would not be an issue. its clients and therefore long leases would not be an issue.
8.13. Clocks, Propagation Delay, and Calculating Lease Expiration 8.13. Clocks, Propagation Delay, and Calculating Lease Expiration
To avoid the need for synchronized clocks, lease times are granted by To avoid the need for synchronized clocks, lease times are granted by
the server as a time delta. However, there is a requirement that the the server as a time delta. However, there is a requirement that the
client and server clocks do not drift excessively over the duration client and server clocks do not drift excessively over the duration
of the lock. There is also the issue of propagation delay across the of the lock. There is also the issue of propagation delay across the
network which could easily be several hundred milliseconds as well as network which could easily be several hundred milliseconds as well as
the possibility that requests will be lost and need to be the possibility that requests will be lost and need to be
retransmitted. retransmitted.
To take propagation delay into account, the client should subtract it To take propagation delay into account, the client should subtract it
from lease times (e.g. if the client estimates the one-way from lease times (e.g., if the client estimates the one-way
propagation delay as 200 msec, then it can assume that the lease is propagation delay as 200 msec, then it can assume that the lease is
already 200 msec old when it gets it). In addition, it will take already 200 msec old when it gets it). In addition, it will take
another 200 msec to get a response back to the server. So the client another 200 msec to get a response back to the server. So the client
must send a lock renewal or write data back to the server 400 msec must send a lock renewal or write data back to the server 400 msec
before the lease would expire. before the lease would expire.
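
The arithmetic in the preceding paragraph can be written out as a short sketch. The function name and the use of integer milliseconds on the client's local clock are assumptions made for the example.

```python
def latest_renewal_send_time_ms(received_at_ms, lease_ms, one_way_delay_ms):
    """Latest local time (ms) at which the client should send a renewal.

    The lease is assumed to be one_way_delay_ms old when it arrives, and
    the renewal needs another one_way_delay_ms to reach the server, so two
    one-way delays are subtracted from the granted lease time.
    """
    return received_at_ms + lease_ms - 2 * one_way_delay_ms
```

With a 90 second lease received at local time 0 and an estimated one-way delay of 200 msec, the renewal must be sent no later than 89600 msec, which is 400 msec before the nominal expiration.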
The server's lease period configuration should take into account the The server's lease period configuration should take into account the
network distance of the clients that will be accessing the server's network distance of the clients that will be accessing the server's
resources. It is expected that the lease period will take into resources. It is expected that the lease period will take into
account the network propagation delays and other network delay account the network propagation delays and other network delay
factors for the client population. Since the protocol does not allow factors for the client population. Since the protocol does not allow
for an automatic method to determine an appropriate lease period, the for an automatic method to determine an appropriate lease period, the
server's administrator may have to tune the lease period. server's administrator may have to tune the lease period.
8.14. Migration, Replication and State 8.14. Migration, Replication and State
When responsibility for handling a given file system is transferred When responsibility for handling a given file system is transferred
to a new server (migration) or the client chooses to use an alternate to a new server (migration) or the client chooses to use an alternate
server (e.g. in response to server unresponsiveness) in the context server (e.g., in response to server unresponsiveness) in the context
of file system replication, the appropriate handling of state shared of file system replication, the appropriate handling of state shared
between the client and server (i.e. locks, leases, stateids, and between the client and server (i.e., locks, leases, stateids, and
clientids) is as described below. The handling differs between clientids) is as described below. The handling differs between
migration and replication. For related discussion of file server migration and replication. For related discussion of file server
state and recovery of such state, see the sections under "File Locking and state and recovery of such state, see the sections under "File Locking and
Share Reservations" Share Reservations".
If a server replica or a server immigrating a filesystem agrees to, or If a server replica or a server immigrating a filesystem agrees to, or
is expected to, accept opaque values from the client that originated is expected to, accept opaque values from the client that originated
from another server, then it is a wise implementation practice for from another server, then it is a wise implementation practice for
the servers to encode the "opaque" values in network byte order. This the servers to encode the "opaque" values in network byte order.
way, servers acting as replicas or immigrating filesystems will be This way, servers acting as replicas or immigrating filesystems will
able to parse values like stateids, directory cookies, filehandles, be able to parse values like stateids, directory cookies,
etc. even if their native byte order is different from other servers filehandles, etc. even if their native byte order is different from
cooperating in the replication and migration of the filesystem. other servers cooperating in the replication and migration of the
filesystem.
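
As an illustration of encoding "opaque" values in network byte order, the sketch below packs three 32-bit fields into 12 bytes in big-endian order. The field layout (boot time, server identifier, counter) and the function names are purely hypothetical; the protocol does not define the contents of opaque values.

```python
import struct

# "!" selects network (big-endian) byte order with standard sizes, so the
# same bytes are produced and parsed on any host architecture.
_OTHER = struct.Struct("!III")


def encode_other(boot_time, server_id, counter):
    """Pack a hypothetical 12-byte opaque value in network byte order."""
    return _OTHER.pack(boot_time, server_id, counter)


def decode_other(raw):
    """Unpack a value produced by encode_other(), possibly on another server."""
    return _OTHER.unpack(raw)
```

Because the byte order is fixed by the format string rather than by the host, a replica with a different native byte order recovers the same field values.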
8.14.1. Migration and State 8.14.1. Migration and State
In the case of migration, the servers involved in the migration of a In the case of migration, the servers involved in the migration of a
filesystem SHOULD transfer all server state from the original to the filesystem SHOULD transfer all server state from the original to the
new server. This must be done in a way that is transparent to the new server. This must be done in a way that is transparent to the
client. This state transfer will ease the client's transition when a client. This state transfer will ease the client's transition when a
filesystem migration occurs. If the servers are successful in filesystem migration occurs. If the servers are successful in
transferring all state, the client will continue to use stateids transferring all state, the client will continue to use stateids
assigned by the original server. Therefore the new server must assigned by the original server. Therefore the new server must
skipping to change at page 91, line 4 skipping to change at page 91, line 46
server control, the handling of state is different. In this case, server control, the handling of state is different. In this case,
leases, stateids and clientids do not have validity across a leases, stateids and clientids do not have validity across a
transition from one server to another. The client must re-establish transition from one server to another. The client must re-establish
its locks on the new server. This can be compared to the re- its locks on the new server. This can be compared to the re-
establishment of locks by means of reclaim-type requests after a establishment of locks by means of reclaim-type requests after a
server reboot. The difference is that the server has no provision to server reboot. The difference is that the server has no provision to
distinguish requests reclaiming locks from those obtaining new locks distinguish requests reclaiming locks from those obtaining new locks
or to defer the latter. Thus, a client re-establishing a lock on the or to defer the latter. Thus, a client re-establishing a lock on the
new server (by means of a LOCK or OPEN request), may have the new server (by means of a LOCK or OPEN request), may have the
requests denied due to a conflicting lock. Since replication is requests denied due to a conflicting lock. Since replication is
intended for read-only use of filesystems, such denial of locks intended for read-only use of filesystems, such denial of locks
should not pose large difficulties in practice. When an attempt to should not pose large difficulties in practice. When an attempt to
re-establish a lock on a new server is denied, the client should re-establish a lock on a new server is denied, the client should
treat the situation as if its original lock had been revoked. treat the situation as if its original lock had been revoked.
8.14.3. Notification of Migrated Lease 8.14.3. Notification of Migrated Lease
In the case of lease renewal, the client may not be submitting In the case of lease renewal, the client may not be submitting
requests for a filesystem that has been migrated to another server. requests for a filesystem that has been migrated to another server.
This can occur because of the implicit lease renewal mechanism. The This can occur because of the implicit lease renewal mechanism. The
client renews leases for all filesystems when submitting a request to client renews leases for all filesystems when submitting a request to
any one filesystem at the server. any one filesystem at the server.
In order for the client to schedule renewal of leases that may have In order for the client to schedule renewal of leases that may have
been relocated to the new server, the client must find out about been relocated to the new server, the client must find out about
lease relocation before those leases expire. To accomplish this, all lease relocation before those leases expire. To accomplish this, all
operations which implicitly renew leases for a client (i.e. OPEN, operations which implicitly renew leases for a client (i.e., OPEN,
CLOSE, READ, WRITE, RENEW, LOCK, LOCKT, LOCKU), will return the error CLOSE, READ, WRITE, RENEW, LOCK, LOCKT, LOCKU), will return the error
NFS4ERR_LEASE_MOVED if responsibility for any of the leases to be NFS4ERR_LEASE_MOVED if responsibility for any of the leases to be
renewed has been transferred to a new server. This condition will renewed has been transferred to a new server. This condition will
continue until the client receives an NFS4ERR_MOVED error and the continue until the client receives an NFS4ERR_MOVED error and the
server receives the subsequent GETATTR(fs_locations) for an access to server receives the subsequent GETATTR(fs_locations) for an access to
each filesystem for which a lease has been moved to a new server. each filesystem for which a lease has been moved to a new server.
When a client receives an NFS4ERR_LEASE_MOVED error, it should When a client receives an NFS4ERR_LEASE_MOVED error, it should
perform an operation on each filesystem associated with the server in perform an operation on each filesystem associated with the server in
question. When the client receives an NFS4ERR_MOVED error, the question. When the client receives an NFS4ERR_MOVED error, the
skipping to change at page 92, line 5 skipping to change at page 92, line 50
When state is transferred transparently, that state should include When state is transferred transparently, that state should include
the correct value of the lease_time attribute. The lease_time the correct value of the lease_time attribute. The lease_time
attribute on the destination server must never be less than that on attribute on the destination server must never be less than that on
the source since this would result in premature expiration of leases the source since this would result in premature expiration of leases
granted by the source server. Upon migration in which state is granted by the source server. Upon migration in which state is
transferred transparently, the client is under no obligation to re- transferred transparently, the client is under no obligation to re-
fetch the lease_time attribute and may continue to use the value fetch the lease_time attribute and may continue to use the value
previously fetched (on the source server). previously fetched (on the source server).
If state has not been transferred transparently (i.e. the client sees If state has not been transferred transparently (i.e., the client
a real or simulated server reboot), the client should fetch the value sees a real or simulated server reboot), the client should fetch the
of lease_time on the new (i.e. destination) server, and use it for value of lease_time on the new (i.e., destination) server, and use it
subsequent locking requests. However the server must respect a grace for subsequent locking requests. However the server must respect a
period at least as long as the lease_time on the source server, in grace period at least as long as the lease_time on the source server,
order to ensure that clients have ample time to reclaim their locks in order to ensure that clients have ample time to reclaim their
before potentially conflicting non-reclaimed locks are granted. The locks before potentially conflicting non-reclaimed locks are granted.
means by which the new server obtains the value of lease_time on the The means by which the new server obtains the value of lease_time on
old server is left to the server implementations. It is not the old server is left to the server implementations. It is not
specified by the NFS version 4 protocol. specified by the NFS version 4 protocol.
9. Client-Side Caching 9. Client-Side Caching
Client-side caching of data, of file attributes, and of file names is Client-side caching of data, of file attributes, and of file names is
essential to providing good performance with the NFS protocol. essential to providing good performance with the NFS protocol.
Providing distributed cache coherence is a difficult problem and Providing distributed cache coherence is a difficult problem and
previous versions of the NFS protocol have not attempted it. previous versions of the NFS protocol have not attempted it.
Instead, several NFS client implementation techniques have been used Instead, several NFS client implementation techniques have been used
to reduce the problems that a lack of coherence poses for users. to reduce the problems that a lack of coherence poses for users.
These techniques have not been clearly defined by earlier protocol These techniques have not been clearly defined by earlier protocol
specifications and it is often unclear what is valid or invalid specifications and it is often unclear what is valid or invalid
skipping to change at page 94, line 4 skipping to change at page 94, line 16
conflicts exist is expensive. A better option with regards to conflicts exist is expensive. A better option with regards to
performance is to allow a client that repeatedly opens a file to do performance is to allow a client that repeatedly opens a file to do
so without reference to the server. This is done until potentially so without reference to the server. This is done until potentially
conflicting operations from another client actually occur. conflicting operations from another client actually occur.
A similar situation arises in connection with file locking. Sending A similar situation arises in connection with file locking. Sending
file lock and unlock requests to the server as well as the read and file lock and unlock requests to the server as well as the read and
write requests necessary to make data caching consistent with the write requests necessary to make data caching consistent with the
locking semantics (see the section "Data Caching and File Locking") locking semantics (see the section "Data Caching and File Locking")
can severely limit performance. When locking is used to provide can severely limit performance. When locking is used to provide
protection against infrequent conflicts, a large penalty is incurred. protection against infrequent conflicts, a large penalty is incurred.
This penalty may discourage the use of file locking by applications. This penalty may discourage the use of file locking by applications.
The NFS version 4 protocol provides more aggressive caching The NFS version 4 protocol provides more aggressive caching
strategies with the following design goals: strategies with the following design goals:
o Compatibility with a large range of server semantics. o Compatibility with a large range of server semantics.
o Provide the same caching benefits as previous versions of the o Provide the same caching benefits as previous versions of the NFS
NFS protocol when unable to provide the more aggressive model. protocol when unable to provide the more aggressive model.
o Requirements for aggressive caching are organized so that a o Requirements for aggressive caching are organized so that a large
large portion of the benefit can be obtained even when not all portion of the benefit can be obtained even when not all of the
of the requirements can be met. requirements can be met.
The appropriate requirements for the server are discussed in later The appropriate requirements for the server are discussed in later
sections in which specific forms of caching are covered (see the sections in which specific forms of caching are covered (see the
section "Open Delegation"). section "Open Delegation").
9.2. Delegation and Callbacks 9.2. Delegation and Callbacks
Recallable delegation of server responsibilities for a file to a Recallable delegation of server responsibilities for a file to a
client improves performance by avoiding repeated requests to the client improves performance by avoiding repeated requests to the
server in the absence of inter-client conflict. With the use of a server in the absence of inter-client conflict. With the use of a
skipping to change at page 95, line 4 skipping to change at page 95, line 19
firewalls, for example), correct protocol operation does not depend firewalls, for example), correct protocol operation does not depend
on them. Preliminary testing of callback functionality by means of a on them. Preliminary testing of callback functionality by means of a
CB_NULL procedure determines whether callbacks can be supported. The CB_NULL procedure determines whether callbacks can be supported. The
CB_NULL procedure checks the continuity of the callback path. A CB_NULL procedure checks the continuity of the callback path. A
server makes a preliminary assessment of callback availability to a server makes a preliminary assessment of callback availability to a
given client and avoids delegating responsibilities until it has given client and avoids delegating responsibilities until it has
determined that callbacks are supported. Because the granting of a determined that callbacks are supported. Because the granting of a
delegation is always conditional upon the absence of conflicting delegation is always conditional upon the absence of conflicting
access, clients must not assume that a delegation will be granted and access, clients must not assume that a delegation will be granted and
they must always be prepared for OPENs to be processed without any they must always be prepared for OPENs to be processed without any
delegations being granted. delegations being granted.
Once granted, a delegation behaves in most ways like a lock. There Once granted, a delegation behaves in most ways like a lock. There
is an associated lease that is subject to renewal together with all is an associated lease that is subject to renewal together with all
of the other leases held by that client. of the other leases held by that client.
Unlike locks, an operation by a second client to a delegated file Unlike locks, an operation by a second client to a delegated file
will cause the server to recall a delegation through a callback. will cause the server to recall a delegation through a callback.
On recall, the client holding the delegation must flush modified On recall, the client holding the delegation must flush modified
skipping to change at page 96, line 4 skipping to change at page 96, line 21
There are three situations that delegation recovery must deal with: There are three situations that delegation recovery must deal with:
o Client reboot or restart o Client reboot or restart
o Server reboot or restart o Server reboot or restart
o Network partition (full or callback-only) o Network partition (full or callback-only)
In the event the client reboots or restarts, the failure to renew In the event the client reboots or restarts, the failure to renew
leases will result in the revocation of record locks and share leases will result in the revocation of record locks and share
reservations. Delegations, however, may be treated a bit reservations. Delegations, however, may be treated a bit
differently. differently.
There will be situations in which delegations will need to be There will be situations in which delegations will need to be
reestablished after a client reboots or restarts. The reason for reestablished after a client reboots or restarts. The reason for
this is the client may have file data stored locally and this data this is the client may have file data stored locally and this data
was associated with the previously held delegations. The client will was associated with the previously held delegations. The client will
need to reestablish the appropriate file state on the server. need to reestablish the appropriate file state on the server.
skipping to change at page 96, line 35 skipping to change at page 96, line 49
storage so that the delegations can be reclaimed. For open storage so that the delegations can be reclaimed. For open
delegations, such delegations are reclaimed using OPEN with a claim delegations, such delegations are reclaimed using OPEN with a claim
type of CLAIM_DELEGATE_PREV. (See the sections on "Data Caching and type of CLAIM_DELEGATE_PREV. (See the sections on "Data Caching and
Revocation" and "Operation 18: OPEN" for discussion of open Revocation" and "Operation 18: OPEN" for discussion of open
delegation and the details of OPEN respectively). delegation and the details of OPEN respectively).
A server MAY support a claim type of CLAIM_DELEGATE_PREV, but if it A server MAY support a claim type of CLAIM_DELEGATE_PREV, but if it
does, it MUST NOT remove delegations upon SETCLIENTID_CONFIRM, and does, it MUST NOT remove delegations upon SETCLIENTID_CONFIRM, and
instead MUST, for a period of time no less than that of the value of instead MUST, for a period of time no less than that of the value of
the lease_time attribute, maintain the client's delegations to allow the lease_time attribute, maintain the client's delegations to allow
time for the client to issue CLAIM_DELEGATE_PREV requests. The server time for the client to issue CLAIM_DELEGATE_PREV requests. The
that supports CLAIM_DELEGATE_PREV MUST support the DELEGPURGE server that supports CLAIM_DELEGATE_PREV MUST support the DELEGPURGE
operation. operation.
When the server reboots or restarts, delegations are reclaimed (using When the server reboots or restarts, delegations are reclaimed (using
the OPEN operation with CLAIM_PREVIOUS) in a similar fashion to the OPEN operation with CLAIM_PREVIOUS) in a similar fashion to
record locks and share reservations. However, there is a slight record locks and share reservations. However, there is a slight
semantic difference. In the normal case if the server decides that a semantic difference. In the normal case if the server decides that a
delegation should not be granted, it performs the requested action delegation should not be granted, it performs the requested action
(e.g. OPEN) without granting any delegation. For reclaim, the server (e.g., OPEN) without granting any delegation. For reclaim, the
grants the delegation but a special designation is applied so that server grants the delegation but a special designation is applied so
the client treats the delegation as having been granted but recalled that the client treats the delegation as having been granted but
by the server. Because of this, the client has the duty to write all recalled by the server. Because of this, the client has the duty to
modified state to the server and then return the delegation. This write all modified state to the server and then return the
process of handling delegation reclaim reconciles three principles of delegation. This process of handling delegation reclaim reconciles
the NFS version 4 protocol: three principles of the NFS version 4 protocol:
o Upon reclaim, a client reporting resources assigned to it by an o Upon reclaim, a client reporting resources assigned to it by an
earlier server instance must be granted those resources. earlier server instance must be granted those resources.
o The server has unquestionable authority to determine whether o The server has unquestionable authority to determine whether
delegations are to be granted and, once granted, whether they delegations are to be granted and, once granted, whether they are
are to be continued. to be continued.
o The use of callbacks is not to be depended upon until the client o The use of callbacks is not to be depended upon until the client
has proven its ability to receive them. has proven its ability to receive them.
When a network partition occurs, delegations are subject to freeing When a network partition occurs, delegations are subject to freeing
by the server when the lease renewal period expires. This is similar by the server when the lease renewal period expires. This is similar
to the behavior for locks and share reservations. For delegations, to the behavior for locks and share reservations. For delegations,
however, the server may extend the period in which conflicting however, the server may extend the period in which conflicting
requests are held off. Eventually the occurrence of a conflicting requests are held off. Eventually the occurrence of a conflicting
request from another client will cause revocation of the delegation. request from another client will cause revocation of the delegation.
A loss of the callback path (e.g. by later network configuration A loss of the callback path (e.g., by later network configuration
change) will have the same effect. A recall request will fail and change) will have the same effect. A recall request will fail and
revocation of the delegation will result. revocation of the delegation will result.
A client normally finds out about revocation of a delegation when it A client normally finds out about revocation of a delegation when it
uses a stateid associated with a delegation and receives the error uses a stateid associated with a delegation and receives the error
NFS4ERR_EXPIRED. It also may find out about delegation revocation NFS4ERR_EXPIRED. It also may find out about delegation revocation
after a client reboot when it attempts to reclaim a delegation and after a client reboot when it attempts to reclaim a delegation and
receives that same error. Note that in the case of a revoked write receives that same error. Note that in the case of a revoked write
open delegation, there are issues because data may have been modified open delegation, there are issues because data may have been modified
by the client whose delegation is revoked and separately by other by the client whose delegation is revoked and separately by other
skipping to change at page 98, line 4 skipping to change at page 98, line 26
protocol's data caching must be implemented such that it does not protocol's data caching must be implemented such that it does not
invalidate the assumptions that those using these facilities depend invalidate the assumptions that those using these facilities depend
upon. upon.
9.3.1. Data Caching and OPENs 9.3.1. Data Caching and OPENs
In order to avoid invalidating the sharing assumptions that In order to avoid invalidating the sharing assumptions that
applications rely on, NFS version 4 clients should not provide cached applications rely on, NFS version 4 clients should not provide cached
data to applications or modify it on behalf of an application when it data to applications or modify it on behalf of an application when it
would not be valid to obtain or modify that same data via a READ or would not be valid to obtain or modify that same data via a READ or
WRITE operation. WRITE operation.
Furthermore, in the absence of open delegation (see the section "Open Furthermore, in the absence of open delegation (see the section "Open
Delegation") two additional rules apply. Note that these rules are Delegation") two additional rules apply. Note that these rules are
obeyed in practice by many NFS version 2 and version 3 clients. obeyed in practice by many NFS version 2 and version 3 clients.
o First, cached data present on a client must be revalidated after o First, cached data present on a client must be revalidated after
doing an OPEN. Revalidating means that the client fetches the doing an OPEN. Revalidating means that the client fetches the
change attribute from the server, compares it with the cached change attribute from the server, compares it with the cached
change attribute, and if different, declares the cached data (as change attribute, and if different, declares the cached data (as
well as the cached attributes) as invalid. This is to ensure well as the cached attributes) as invalid. This is to ensure that
that the data for the OPENed file is still correctly reflected the data for the OPENed file is still correctly reflected in the
in the client's cache. This validation must be done at least client's cache. This validation must be done at least when the
when the client's OPEN operation includes DENY=WRITE or BOTH client's OPEN operation includes DENY=WRITE or BOTH thus
thus terminating a period in which other clients may have had terminating a period in which other clients may have had the
the opportunity to open the file with WRITE access. Clients may opportunity to open the file with WRITE access. Clients may
choose to do the revalidation more often (i.e. at OPENs choose to do the revalidation more often (i.e., at OPENs
specifying DENY=NONE) to parallel the NFS version 3 protocol's specifying DENY=NONE) to parallel the NFS version 3 protocol's
practice for the benefit of users assuming this degree of cache practice for the benefit of users assuming this degree of cache
revalidation. revalidation.
Since the change attribute is updated for data and metadata Since the change attribute is updated for data and metadata
modifications, some client implementors may be tempted to use modifications, some client implementors may be tempted to use the
the time_modify attribute and not change to validate cached time_modify attribute and not change to validate cached data, so
data, so that metadata changes do not spuriously invalidate that metadata changes do not spuriously invalidate clean data.
clean data. The implementor is cautioned in this approach. The The implementor is cautioned in this approach. The change
change attribute is guaranteed to change for each update to the attribute is guaranteed to change for each update to the file,
file, whereas time_modify is guaranteed to change only at the whereas time_modify is guaranteed to change only at the
granularity of the time_delta attribute. Use by the client's granularity of the time_delta attribute. Use by the client's data
data cache validation logic of time_modify and not change runs cache validation logic of time_modify and not change runs the risk
the risk of the client incorrectly marking stale data as valid. of the client incorrectly marking stale data as valid.
o Second, modified data must be flushed to the server before o Second, modified data must be flushed to the server before closing
closing a file OPENed for write. This is complementary to the a file OPENed for write. This is complementary to the first rule.
first rule. If the data is not flushed at CLOSE, the If the data is not flushed at CLOSE, the revalidation done after
revalidation done after client OPENs a file is unable to client OPENs a file is unable to achieve its purpose. The other
achieve its purpose. The other aspect to flushing the data aspect to flushing the data before close is that the data must be
before close is that the data must be committed to stable committed to stable storage, at the server, before the CLOSE
storage, at the server, before the CLOSE operation is requested operation is requested by the client. In the case of a server
by the client. In the case of a server reboot or restart and a reboot or restart and a CLOSEd file, it may not be possible to
CLOSEd file, it may not be possible to retransmit the data to be retransmit the data to be written to the file. Hence, this
written to the file. Hence, this requirement. requirement.
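The first rule above (revalidation after OPEN) can be sketched as a comparison of change attribute values. This is a simplified illustration: the `CachedFile` structure and the function name are hypothetical, the cache is assumed to hold only clean (unmodified) data, and `server_change` stands in for the value a client would obtain via GETATTR after the OPEN.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CachedFile:
    """Hypothetical per-file client cache holding only clean data."""
    change: Optional[int] = None                             # cached change attribute
    attrs: Dict[str, object] = field(default_factory=dict)   # cached attributes
    data: Dict[int, bytes] = field(default_factory=dict)     # offset -> cached bytes


def revalidate_after_open(cache: CachedFile, server_change: int) -> bool:
    """Compare change attributes; invalidate cached data and attributes on mismatch.

    Returns True if the cached contents were kept, False if invalidated.
    """
    if cache.change is not None and cache.change == server_change:
        return True
    # The change attribute differs (or was never cached): both the cached
    # data and the cached attributes are declared invalid.
    cache.data.clear()
    cache.attrs.clear()
    cache.change = server_change
    return False
```

The comparison deliberately uses the change attribute and not time_modify, since time_modify is only guaranteed to change at the granularity of time_delta.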
9.3.2. Data Caching and File Locking 9.3.2. Data Caching and File Locking
For those applications that choose to use file locking instead of For those applications that choose to use file locking instead of
share reservations to exclude inconsistent file access, there is an share reservations to exclude inconsistent file access, there is an
analogous set of constraints that apply to client side data caching. analogous set of constraints that apply to client side data caching.
These rules are effective only if the file locking is used in a way These rules are effective only if the file locking is used in a way
that matches in an equivalent way the actual READ and WRITE that matches in an equivalent way the actual READ and WRITE
operations executed. This is as opposed to file locking that is operations executed. This is as opposed to file locking that is
based on pure convention. For example, it is possible to manipulate based on pure convention. For example, it is possible to manipulate
a two-megabyte file by dividing the file into two one-megabyte a two-megabyte file by dividing the file into two one-megabyte
regions and protecting access to the two regions by file locks on regions and protecting access to the two regions by file locks on
bytes zero and one. A lock for write on byte zero of the file would bytes zero and one. A lock for write on byte zero of the file would
represent the right to do READ and WRITE operations on the first represent the right to do READ and WRITE operations on the first
region. A lock for write on byte one of the file would represent the region. A lock for write on byte one of the file would represent the
right to do READ and WRITE operations on the second region. As long right to do READ and WRITE operations on the second region. As long
as all applications manipulating the file obey this convention, they as all applications manipulating the file obey this convention, they
will work on a local filesystem. However, they may not work with the will work on a local filesystem. However, they may not work with the
NFS version 4 protocol unless clients refrain from data caching. NFS version 4 protocol unless clients refrain from data caching.
The rules for data caching in the file locking environment are: The rules for data caching in the file locking environment are:
o First, when a client obtains a file lock for a particular o First, when a client obtains a file lock for a particular region,
region, the data cache corresponding to that region (if any the data cache corresponding to that region (if any cached data
cache data exists) must be revalidated. If the change attribute exists) must be revalidated. If the change attribute indicates
indicates that the file may have been updated since the cached that the file may have been updated since the cached data was
data was obtained, the client must flush or invalidate the obtained, the client must flush or invalidate the cached data for
cached data for the newly locked region. A client might choose the newly locked region. A client might choose to invalidate all
to invalidate all of the non-modified cached data that it has for of the non-modified cached data that it has for the file but the only
the file but the only requirement for correct operation is to requirement for correct operation is to invalidate all of the data
invalidate all of the data in the newly locked region. in the newly locked region.
o Second, before releasing a write lock for a region, all modified o Second, before releasing a write lock for a region, all modified
data for that region must be flushed to the server. The data for that region must be flushed to the server. The modified
modified data must also be written to stable storage. data must also be written to stable storage.
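For the second rule, one way a client might select what to write before releasing a lock is to clip its modified byte ranges to the range being unlocked, so that nothing outside the lock is written. The sketch below is illustrative only: the `(offset, length)` representation and the name `clip_to_lock` are assumptions, and the resulting writes would still need to reach stable storage at the server.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (offset, length) in bytes; hypothetical representation


def clip_to_lock(dirty: List[Range], lock: Range) -> List[Range]:
    """Return the parts of the dirty ranges that fall inside the lock range."""
    lock_start, lock_end = lock[0], lock[0] + lock[1]
    clipped: List[Range] = []
    for offset, length in dirty:
        start = max(offset, lock_start)
        end = min(offset + length, lock_end)
        if start < end:  # keep only non-empty overlaps
            clipped.append((start, end - start))
    return clipped
```

For example, with a fully dirty 4096-byte cache block and a lock covering bytes 1024 through 1535, only that 512-byte span would be selected for writing.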
Note that flushing data to the server and the invalidation of cached Note that flushing data to the server and the invalidation of cached
data must reflect the actual byte ranges locked or unlocked. data must reflect the actual byte ranges locked or unlocked.
Rounding these up or down to reflect client cache block boundaries Rounding these up or down to reflect client cache block boundaries
will cause problems if not carefully done. For example, writing a will cause problems if not carefully done. For example, writing a
modified block when only half of that block is within an area being modified block when only half of that block is within an area being
unlocked may cause invalid modification to the region outside the unlocked may cause invalid modification to the region outside the
unlocked area. This, in turn, may be part of a region locked by unlocked area. This, in turn, may be part of a region locked by
another client. Clients can avoid this situation by synchronously another client. Clients can avoid this situation by synchronously
performing portions of write operations that overlap that portion performing portions of write operations that overlap that portion
skipping to change at page 99, line 58 skipping to change at page 100, line 32
client possesses may not be valid. client possesses may not be valid.
The data that is written to the server as a prerequisite to the The data that is written to the server as a prerequisite to the
unlocking of a region must be written, at the server, to stable unlocking of a region must be written, at the server, to stable
storage. The client may accomplish this either with synchronous storage. The client may accomplish this either with synchronous
writes or by following asynchronous writes with a COMMIT operation. writes or by following asynchronous writes with a COMMIT operation.
This is required because retransmission of the modified data after a This is required because retransmission of the modified data after a
server reboot might conflict with a lock held by another client. server reboot might conflict with a lock held by another client.
A client implementation may choose to accommodate applications which A client implementation may choose to accommodate applications which
use record locking in non-standard ways (e.g. using a record lock as use record locking in non-standard ways (e.g., using a record lock as
a global semaphore) by flushing to the server more data upon a LOCKU a global semaphore) by flushing to the server more data upon a LOCKU
than is covered by the locked range. This may include modified data than is covered by the locked range. This may include modified data
within files other than the one for which the unlocks are being done. within files other than the one for which the unlocks are being done.
In such cases, the client must not interfere with applications whose In such cases, the client must not interfere with applications whose
READs and WRITEs are being done only within the bounds of record READs and WRITEs are being done only within the bounds of record
locks which the application holds. For example, an application locks locks which the application holds. For example, an application locks
a single byte of a file and proceeds to write that single byte. A a single byte of a file and proceeds to write that single byte. A
client that chose to handle a LOCKU by flushing all modified data to client that chose to handle a LOCKU by flushing all modified data to
the server could validly write that single byte in response to an the server could validly write that single byte in response to an
unrelated unlock. However, it would not be valid to write the entire unrelated unlock. However, it would not be valid to write the entire
skipping to change at page 101, line 4 skipping to change at page 101, line 36
NFS version 3 clients, the typical practice has been to assume for NFS version 3 clients, the typical practice has been to assume for
the purpose of caching that distinct filehandles represent distinct the purpose of caching that distinct filehandles represent distinct
filesystem objects. The client then has the choice to organize and filesystem objects. The client then has the choice to organize and
maintain the data cache on this basis. maintain the data cache on this basis.
In the NFS version 4 protocol, there is now the possibility to have In the NFS version 4 protocol, there is now the possibility to have
significant deviations from a "one filehandle per object" model significant deviations from a "one filehandle per object" model
because a filehandle may be constructed on the basis of the object's because a filehandle may be constructed on the basis of the object's
pathname. Therefore, clients need a reliable method to determine if pathname. Therefore, clients need a reliable method to determine if
two filehandles designate the same filesystem object. If clients two filehandles designate the same filesystem object. If clients
were simply to assume that all distinct filehandles denote distinct were simply to assume that all distinct filehandles denote distinct
objects and proceed to do data caching on this basis, caching objects and proceed to do data caching on this basis, caching
inconsistencies would arise between the distinct client side objects inconsistencies would arise between the distinct client side objects
which mapped to the same server side object. which mapped to the same server side object.
By providing a method to differentiate filehandles, the NFS version 4 By providing a method to differentiate filehandles, the NFS version 4
protocol alleviates a potential functional regression in comparison protocol alleviates a potential functional regression in comparison
with the NFS version 3 protocol. Without this method, caching with the NFS version 3 protocol. Without this method, caching
inconsistencies within the same client could occur and this has not inconsistencies within the same client could occur and this has not
been present in previous versions of the NFS protocol. Note that it been present in previous versions of the NFS protocol. Note that it
is possible to have such inconsistencies with applications executing is possible to have such inconsistencies with applications executing
on multiple clients but that is not the issue being addressed here. on multiple clients but that is not the issue being addressed here.
For the purposes of data caching, the following steps allow an NFS For the purposes of data caching, the following steps allow an NFS
version 4 client to determine whether two distinct filehandles denote version 4 client to determine whether two distinct filehandles denote
the same server side object: the same server side object:
o If GETATTR directed to two filehandles returns different values o If GETATTR directed to two filehandles returns different values of
of the fsid attribute, then the filehandles represent distinct the fsid attribute, then the filehandles represent distinct
objects. objects.
o If GETATTR for any file with an fsid that matches the fsid of o If GETATTR for any file with an fsid that matches the fsid of the
the two filehandles in question returns a unique_handles two filehandles in question returns a unique_handles attribute
attribute with a value of TRUE, then the two objects are with a value of TRUE, then the two objects are distinct.
distinct.
o If GETATTR directed to the two filehandles does not return the o If GETATTR directed to the two filehandles does not return the
fileid attribute for both of the handles, then it cannot be fileid attribute for both of the handles, then it cannot be
determined whether the two objects are the same. Therefore, determined whether the two objects are the same. Therefore,
operations which depend on that knowledge (e.g. client side data operations which depend on that knowledge (e.g., client side data
caching) cannot be done reliably. caching) cannot be done reliably.
o If GETATTR directed to the two filehandles returns different o If GETATTR directed to the two filehandles returns different
values for the fileid attribute, then they are distinct objects. values for the fileid attribute, then they are distinct objects.
o Otherwise they are the same object. o Otherwise they are the same object.
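The steps above can be sketched as a single decision function. This is a minimal illustration, not a client implementation: the `Attrs` record, the `SameObject` enumeration, and the function name are hypothetical, `fileid` is `None` when GETATTR did not return it, and the two filehandles are assumed to already be known to differ.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class SameObject(Enum):
    SAME = "same"
    DISTINCT = "distinct"
    UNKNOWN = "unknown"   # caching cannot be done reliably


@dataclass(frozen=True)
class Attrs:
    """Hypothetical GETATTR result for one filehandle."""
    fsid: Tuple[int, int]    # (major, minor)
    unique_handles: bool     # unique_handles attribute of the filesystem
    fileid: Optional[int]    # None if the server did not return fileid


def compare_filehandles(a: Attrs, b: Attrs) -> SameObject:
    """Apply the steps in order to two distinct filehandles."""
    if a.fsid != b.fsid:
        return SameObject.DISTINCT
    if a.unique_handles or b.unique_handles:
        return SameObject.DISTINCT
    if a.fileid is None or b.fileid is None:
        return SameObject.UNKNOWN
    if a.fileid != b.fileid:
        return SameObject.DISTINCT
    return SameObject.SAME
```

A client that receives `UNKNOWN` would have to avoid operations that depend on knowing object identity, such as client side data caching for those filehandles.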
9.4. Open Delegation 9.4. Open Delegation
When a file is being OPENed, the server may delegate further handling When a file is being OPENed, the server may delegate further handling
skipping to change at page 102, line 5 skipping to change at page 102, line 38
delegation is recallable, since the circumstances that allowed for delegation is recallable, since the circumstances that allowed for
the delegation are subject to change. In particular, the server may the delegation are subject to change. In particular, the server may
receive a conflicting OPEN from another client; the server must receive a conflicting OPEN from another client; the server must
recall the delegation before deciding whether the OPEN from the other recall the delegation before deciding whether the OPEN from the other
client may be granted. Making a delegation is up to the server and client may be granted. Making a delegation is up to the server and
clients should not assume that any particular OPEN either will or clients should not assume that any particular OPEN either will or
will not result in an open delegation. The following is a typical will not result in an open delegation. The following is a typical
set of conditions that servers might use in deciding whether OPEN set of conditions that servers might use in deciding whether OPEN
should be delegated: should be delegated:
o The client must be able to respond to the server's callback o The client must be able to respond to the server's callback
requests. The server will use the CB_NULL procedure for a test requests. The server will use the CB_NULL procedure for a test of
of callback ability. callback ability.
o The client must have responded properly to previous recalls. o The client must have responded properly to previous recalls.
o There must be no current open conflicting with the requested o There must be no current open conflicting with the requested
delegation. delegation.
o There should be no current delegation that conflicts with the o There should be no current delegation that conflicts with the
delegation being requested. delegation being requested.
o The probability of future conflicting open requests should be o The probability of future conflicting open requests should be low
low based on the recent history of the file. based on the recent history of the file.
o The existence of any server-specific semantics of OPEN/CLOSE o The existence of any server-specific semantics of OPEN/CLOSE that
that would make the required handling incompatible with the would make the required handling incompatible with the prescribed
prescribed handling that the delegated client would apply (see handling that the delegated client would apply (see below).
below).
There are two types of open delegations, read and write. A read open There are two types of open delegations, read and write. A read open
delegation allows a client to handle, on its own, requests to open a delegation allows a client to handle, on its own, requests to open a
file for reading that do not deny read access to others. Multiple file for reading that do not deny read access to others. Multiple
read open delegations may be outstanding simultaneously and do not read open delegations may be outstanding simultaneously and do not
conflict. A write open delegation allows the client to handle, on conflict. A write open delegation allows the client to handle, on
its own, all opens. Only one write open delegation may exist for a its own, all opens. Only one write open delegation may exist for a
given file at a given time and it is inconsistent with any read open given file at a given time and it is inconsistent with any read open
delegations. delegations.
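
The compatibility rule for the two delegation types can be sketched as a small predicate. This is a minimal sketch, not a server implementation: the function name delegation_conflicts and the string labels "read" and "write" are assumptions made for illustration.

```python
from typing import Sequence


def delegation_conflicts(requested: str, outstanding: Sequence[str]) -> bool:
    """Return True if a new delegation of type `requested` would conflict
    with the delegations already outstanding on the same file."""
    if requested == "read":
        # Read delegations coexist with each other, but not with a
        # write delegation.
        return "write" in outstanding
    if requested == "write":
        # Only one write delegation may exist, and it is inconsistent
        # with any read delegation.
        return len(outstanding) > 0
    raise ValueError("unknown delegation type: %r" % (requested,))
```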
skipping to change at page 102, line 55 skipping to change at page 103, line 37
CLOSEs to the server but updates the appropriate status internally. CLOSEs to the server but updates the appropriate status internally.
For a read open delegation, opens that cannot be handled locally For a read open delegation, opens that cannot be handled locally
(opens for write or that deny read access) must be sent to the (opens for write or that deny read access) must be sent to the
server. server.
When an open delegation is made, the response to the OPEN contains an When an open delegation is made, the response to the OPEN contains an
open delegation structure which specifies the following: open delegation structure which specifies the following:
o the type of delegation (read or write) o the type of delegation (read or write)
o space limitation information to control flushing of data on o space limitation information to control flushing of data on close
close (write open delegation only, see the section "Open (write open delegation only, see the section "Open Delegation and
Delegation and Data Caching") Data Caching")
o an nfsace4 specifying read and write permissions o an nfsace4 specifying read and write permissions
o a stateid to represent the delegation for READ and WRITE o a stateid to represent the delegation for READ and WRITE
The delegation stateid is separate and distinct from the stateid for The delegation stateid is separate and distinct from the stateid for
the OPEN proper. The standard stateid, unlike the delegation the OPEN proper. The standard stateid, unlike the delegation
stateid, is associated with a particular lock_owner and will continue stateid, is associated with a particular lock_owner and will continue
to be valid after the delegation is recalled and the file remains to be valid after the delegation is recalled and the file remains
open. open.
When a request internal to the client is made to open a file and open When a request internal to the client is made to open a file and open
delegation is in effect, it will be accepted or rejected solely on delegation is in effect, it will be accepted or rejected solely on
the basis of the following conditions. Any requirement for other the basis of the following conditions. Any requirement for other
checks to be made by the delegate should result in open delegation checks to be made by the delegate should result in open delegation
being denied so that the checks can be made by the server itself. being denied so that the checks can be made by the server itself.
o The access and deny bits for the request and the file as o The access and deny bits for the request and the file as described
described in the section "Share Reservations". in the section "Share Reservations".
o The read and write permissions as determined below. o The read and write permissions as determined below.
The nfsace4 passed with delegation can be used to avoid frequent The nfsace4 passed with delegation can be used to avoid frequent
ACCESS calls. The permission check should be as follows: ACCESS calls. The permission check should be as follows:
o If the nfsace4 indicates that the open may be done, then it o If the nfsace4 indicates that the open may be done, then it should
should be granted without reference to the server. be granted without reference to the server.
o If the nfsace4 indicates that the open may not be done, then an o If the nfsace4 indicates that the open may not be done, then an
ACCESS request must be sent to the server to obtain the ACCESS request must be sent to the server to obtain the definitive
definitive answer. answer.
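
The asymmetry of this check (a positive answer is final, a negative one is not) can be sketched as follows. The function open_permitted and its parameters are hypothetical; in particular, send_access_to_server stands in for whatever mechanism the client uses to issue an ACCESS request and interpret the reply.

```python
from typing import Callable


def open_permitted(ace_permits_open: bool,
                   send_access_to_server: Callable[[], bool]) -> bool:
    """Decide whether a locally handled open passes the permission check.

    ace_permits_open: result of evaluating the delegation's nfsace4
        against the requesting user and the requested access.
    send_access_to_server: callable that performs an ACCESS request and
        returns True if the server allows the access.
    """
    if ace_permits_open:
        # Granted without reference to the server.
        return True
    # A denial from the nfsace4 is not definitive, so the server
    # must be asked.
    return send_access_to_server()
```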
The server may return an nfsace4 that is more restrictive than the The server may return an nfsace4 that is more restrictive than the
actual ACL of the file. This includes an nfsace4 that specifies actual ACL of the file. This includes an nfsace4 that specifies
denial of all access. Note that some common practices such as denial of all access. Note that some common practices such as
mapping the traditional user "root" to the user "nobody" may make it mapping the traditional user "root" to the user "nobody" may make it
incorrect to return the actual ACL of the file in the delegation incorrect to return the actual ACL of the file in the delegation
response. response.
The use of delegation together with various other forms of caching The use of delegation together with various other forms of caching
creates the possibility that no server authentication will ever be creates the possibility that no server authentication will ever be
performed for a given user since all of the user's requests might be performed for a given user since all of the user's requests might be
satisfied locally. Where the client is depending on the server for satisfied locally. Where the client is depending on the server for
authentication, the client should be sure authentication occurs for authentication, the client should be sure authentication occurs for
each user by use of the ACCESS operation. This should be the case each user by use of the ACCESS operation. This should be the case
even if an ACCESS operation would not be required otherwise. As even if an ACCESS operation would not be required otherwise. As
mentioned before, the server may enforce frequent authentication by mentioned before, the server may enforce frequent authentication by
returning an nfsace4 denying all access with every open delegation. returning an nfsace4 denying all access with every open delegation.
9.4.1. Open Delegation and Data Caching 9.4.1. Open Delegation and Data Caching
OPEN delegation allows much of the message overhead associated with OPEN delegation allows much of the message overhead associated with
the opening and closing of files to be eliminated. An open when an open the opening and closing of files to be eliminated. An open when an open
delegation is in effect does not require that a validation message be delegation is in effect does not require that a validation message be
sent to the server. The continued endurance of the "read open sent to the server. The continued endurance of the "read open
delegation" provides a guarantee that no OPEN for write and thus no delegation" provides a guarantee that no OPEN for write and thus no
write has occurred. Similarly, when closing a file opened for write write has occurred. Similarly, when closing a file opened for write
and if write open delegation is in effect, the data written does not and if write open delegation is in effect, the data written does not
have to be flushed to the server until the open delegation is have to be flushed to the server until the open delegation is
skipping to change at page 104, line 36 skipping to change at page 105, line 23
client will force the server to recall a write open delegation. A client will force the server to recall a write open delegation. A
WRITE with a special stateid done by another client will force a WRITE with a special stateid done by another client will force a
recall of read open delegations. recall of read open delegations.
With delegations, a client is able to avoid writing data to the With delegations, a client is able to avoid writing data to the
server when the CLOSE of a file is serviced. The file close system server when the CLOSE of a file is serviced. The file close system
call is the usual point at which the client is notified of a lack of call is the usual point at which the client is notified of a lack of
stable storage for the modified file data generated by the stable storage for the modified file data generated by the
application. At the close, file data is written to the server and application. At the close, file data is written to the server and
through normal accounting the server is able to determine if the through normal accounting the server is able to determine if the
available filesystem space for the data has been exceeded (i.e. available filesystem space for the data has been exceeded (i.e.,
server returns NFS4ERR_NOSPC or NFS4ERR_DQUOT). This accounting server returns NFS4ERR_NOSPC or NFS4ERR_DQUOT). This accounting
includes quotas. The introduction of delegations requires that an includes quotas. The introduction of delegations requires that an
alternative method be in place for the same type of communication to alternative method be in place for the same type of communication to
occur between client and server. occur between client and server.
In the delegation response, the server provides either the limit of In the delegation response, the server provides either the limit of
the size of the file or the number of modified blocks and associated the size of the file or the number of modified blocks and associated
block size. The server must ensure that the client will be able to block size. The server must ensure that the client will be able to
flush data to the server of a size equal to that provided in the flush data to the server of a size equal to that provided in the
original delegation. The server must make this assurance for all original delegation. The server must make this assurance for all
skipping to change at page 105, line 5 skipping to change at page 105, line 47
The server can recall delegations as a result of managing the The server can recall delegations as a result of managing the
available filesystem space. The client should abide by the server's available filesystem space. The client should abide by the server's
state space limits for delegations. If the client exceeds the stated state space limits for delegations. If the client exceeds the stated
limits for the delegation, the server's behavior is undefined. limits for the delegation, the server's behavior is undefined.
Based on server conditions, quotas or available filesystem space, the Based on server conditions, quotas or available filesystem space, the
server may grant write open delegations with very restrictive space server may grant write open delegations with very restrictive space
limitations. The limitations may be defined in a way that will limitations. The limitations may be defined in a way that will
always force modified data to be flushed to the server on close. always force modified data to be flushed to the server on close.
With respect to authentication, flushing modified data to the server With respect to authentication, flushing modified data to the server
after a CLOSE has occurred may be problematic. For example, the user after a CLOSE has occurred may be problematic. For example, the user
of the application may have logged off the client and unexpired of the application may have logged off the client and unexpired
authentication credentials may not be present. In this case, the authentication credentials may not be present. In this case, the
client may need to take special care to ensure that local unexpired client may need to take special care to ensure that local unexpired
credentials will in fact be available. This may be accomplished by credentials will in fact be available. This may be accomplished by
tracking the expiration time of credentials and flushing data well in tracking the expiration time of credentials and flushing data well in
advance of their expiration or by making private copies of advance of their expiration or by making private copies of
credentials to assure their availability when needed. credentials to assure their availability when needed.
9.4.2. Open Delegation and File Locks 9.4.2. Open Delegation and File Locks
When a client holds a write open delegation, lock operations are When a client holds a write open delegation, lock operations may be
performed locally. This includes those required for mandatory file performed locally. This includes those required for mandatory file
locking. This can be done since the delegation implies that there locking. This can be done since the delegation implies that there
can be no conflicting locks. Similarly, all of the revalidations can be no conflicting locks. Similarly, all of the revalidations
that would normally be associated with obtaining locks and the that would normally be associated with obtaining locks and the
flushing of data associated with the releasing of locks need not be flushing of data associated with the releasing of locks need not be
done. done.
When a client holds a read open delegation, lock operations are not When a client holds a read open delegation, lock operations are not
performed locally. All lock operations, including those requesting performed locally. All lock operations, including those requesting
non-exclusive locks, are sent to the server for resolution. non-exclusive locks, are sent to the server for resolution.
skipping to change at page 106, line 5 skipping to change at page 106, line 50
only needs to know about this modified state. If the server only needs to know about this modified state. If the server
determines that the file is currently modified, it will respond to determines that the file is currently modified, it will respond to
the second client's GETATTR as if the file had been modified locally the second client's GETATTR as if the file had been modified locally
at the server. at the server.
Since the form of the change attribute is determined by the server Since the form of the change attribute is determined by the server
and is opaque to the client, the client and server need to agree on a and is opaque to the client, the client and server need to agree on a
method of communicating the modified state of the file. For the size method of communicating the modified state of the file. For the size
attribute, the client will report its current view of the file size. attribute, the client will report its current view of the file size.
For the change attribute, the handling is more involved. For the change attribute, the handling is more involved.
For the client, the following steps will be taken when receiving a For the client, the following steps will be taken when receiving a
write delegation: write delegation:
o The value of the change attribute will be obtained from the o The value of the change attribute will be obtained from the server
server and cached. Let this value be represented by c. and cached. Let this value be represented by c.
o The client will create a value greater than c that will be used o The client will create a value greater than c that will be used
for communicating that modified data is held at the client. Let this for communicating that modified data is held at the client. Let this
value be represented by d. value be represented by d.
o When the client is queried via CB_GETATTR for the change o When the client is queried via CB_GETATTR for the change
attribute, it checks to see if it holds modified data. If the attribute, it checks to see if it holds modified data. If the
file is modified, the value d is returned for the change file is modified, the value d is returned for the change attribute
attribute value. If this file is not currently modified, the value. If this file is not currently modified, the client returns
client returns the value c for the change attribute. the value c for the change attribute.
For simplicity of implementation, the client MAY for each CB_GETATTR For simplicity of implementation, the client MAY for each CB_GETATTR
return the same value d. This is true even if, between successive return the same value d. This is true even if, between successive
CB_GETATTR operations, the client again modifies the file's data CB_GETATTR operations, the client again modifies the file's data
or metadata in its cache. The client can return the same value or metadata in its cache. The client can return the same value
because the only requirement is that the client be able to indicate because the only requirement is that the client be able to indicate
to the server that the client holds modified data. Therefore, the to the server that the client holds modified data. Therefore, the
value of d may always be c + 1. value of d may always be c + 1.
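
The client-side handling described above can be sketched as follows. The class WriteDelegationState and its method names are hypothetical, and the sketch takes the simplification that d is always c + 1. It does not address what happens if c is already the largest representable value.

```python
class WriteDelegationState:
    """Hypothetical per-delegation record kept by the client."""

    def __init__(self, c: int) -> None:
        self.c = c            # change attribute obtained from the server
        self.d = c + 1        # value reported while modified data is held
        self.modified = False

    def note_local_modification(self) -> None:
        # Called whenever cached data or metadata is modified locally.
        self.modified = True

    def cb_getattr_change(self) -> int:
        # Answer to a CB_GETATTR query for the change attribute.
        return self.d if self.modified else self.c
```

Repeated calls to cb_getattr_change return the same d even if further local modifications occur in between, which matches the simplification the text permits.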
While the change attribute is opaque to the client in the sense that While the change attribute is opaque to the client in the sense that
skipping to change at page 106, line 47 skipping to change at page 107, line 43
of the client's changes to that integer. Therefore, the server MUST of the client's changes to that integer. Therefore, the server MUST
encode the change attribute in network order when sending it to the encode the change attribute in network order when sending it to the
client. The client MUST decode it from network order to its native client. The client MUST decode it from network order to its native
order when receiving it and the client MUST encode it in network order order when receiving it and the client MUST encode it in network order
when sending it to the server. For this reason, change is defined as when sending it to the server. For this reason, change is defined as
an unsigned integer rather than an opaque array of octets. an unsigned integer rather than an opaque array of octets.
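
A minimal sketch of the byte-order handling follows. It assumes the change attribute is carried as a 64-bit unsigned integer; the function names encode_change and decode_change are hypothetical.

```python
import struct


def encode_change(value: int) -> bytes:
    """Encode a change attribute in network (big-endian) order.
    Assumes a 64-bit unsigned representation."""
    return struct.pack("!Q", value)


def decode_change(wire: bytes) -> int:
    """Decode a change attribute from network order to a native integer."""
    (value,) = struct.unpack("!Q", wire)
    return value
```

Because the client decodes the value to a native integer before adding to it, a value such as c + 1 is the same integer on the client and on the server regardless of either host's native byte order.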
For the server, the following steps will be taken when providing a For the server, the following steps will be taken when providing a
write delegation: write delegation:
o Upon providing a write delegation, the server will cache a copy o Upon providing a write delegation, the server will cache a copy of
of the change attribute in the data structure it uses to record the change attribute in the data structure it uses to record the
the delegation. Let this value be represented by sc. delegation. Let this value be represented by sc.
o When a second client sends a GETATTR operation on the same file o When a second client sends a GETATTR operation on the same file to
to the server, the server obtains the change attribute from the the server, the server obtains the change attribute from the first
first client. Let this value be cc. client. Let this value be cc.