draft-ietf-nfsv4-nfs-rdma-problem-statement-04.txt   draft-ietf-nfsv4-nfs-rdma-problem-statement-05.txt 
INTERNET-DRAFT Tom Talpey INTERNET-DRAFT Tom Talpey
Expires: December 2006 Chet Juszczak Expires: April 2007 Chet Juszczak
June, 2006 October, 2006
NFS RDMA Problem Statement NFS RDMA Problem Statement
draft-ietf-nfsv4-nfs-rdma-problem-statement-04 draft-ietf-nfsv4-nfs-rdma-problem-statement-05
Status of this Memo Status of this Memo
By submitting this Internet-Draft, each author represents that any By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79. aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
skipping to change at page 2, line 7 skipping to change at page 2, line 7
Table Of Contents Table Of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . 2 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . 2
2. Problem Statement . . . . . . . . . . . . . . . . . . . . 4 2. Problem Statement . . . . . . . . . . . . . . . . . . . . 4
3. File Protocol Architecture . . . . . . . . . . . . . . . . 5 3. File Protocol Architecture . . . . . . . . . . . . . . . . 5
4. Sources of Overhead . . . . . . . . . . . . . . . . . . . 7 4. Sources of Overhead . . . . . . . . . . . . . . . . . . . 7
4.1. Savings from TOE . . . . . . . . . . . . . . . . . . . . 8 4.1. Savings from TOE . . . . . . . . . . . . . . . . . . . . 8
4.2. Savings from RDMA . . . . . . . . . . . . . . . . . . . 9 4.2. Savings from RDMA . . . . . . . . . . . . . . . . . . . 9
5. Application of RDMA to NFS . . . . . . . . . . . . . . . . 10 5. Application of RDMA to NFS . . . . . . . . . . . . . . . . 10
6. Improved Semantics . . . . . . . . . . . . . . . . . . . . 10 6. Conclusions . . . . . . . . . . . . . . . . . . . . . . . 10
7. Conclusions . . . . . . . . . . . . . . . . . . . . . . . 11 Security Considerations . . . . . . . . . . . . . . . . . 11
Security Considerations . . . . . . . . . . . . . . . . . 12 IANA Considerations . . . . . . . . . . . . . . . . . . . 11
IANA Considerations . . . . . . . . . . . . . . . . . . . 12 Acknowledgements . . . . . . . . . . . . . . . . . . . . . 11
Acknowledgements . . . . . . . . . . . . . . . . . . . . . 12 Normative References . . . . . . . . . . . . . . . . . . . 11
Normative References . . . . . . . . . . . . . . . . . . . 12
Informative References . . . . . . . . . . . . . . . . . . 12 Informative References . . . . . . . . . . . . . . . . . . 12
Authors' Addresses . . . . . . . . . . . . . . . . . . . . 14 Authors' Addresses . . . . . . . . . . . . . . . . . . . . 14
Intellectual Property and Copyright Statements . . . . . . 15 Intellectual Property and Copyright Statements . . . . . . 14
1. Introduction 1. Introduction
The Network File System (NFS) protocol (as described in [RFC1094], The Network File System (NFS) protocol (as described in [RFC1094],
[RFC1813], and [RFC3530]) is one of several remote file access [RFC1813], and [RFC3530]) is one of several remote file access
protocols used in the class of processing architecture sometimes protocols used in the class of processing architecture sometimes
called Network Attached Storage (NAS). called Network Attached Storage (NAS).
Historically, remote file access has proved to be a convenient, Historically, remote file access has proved to be a convenient,
cost-effective way to share information over a network, a concept cost-effective way to share information over a network, a concept
skipping to change at page 3, line 24 skipping to change at page 3, line 23
therefore force application changes. therefore force application changes.
In recent years, an effort to standardize a set of protocols for In recent years, an effort to standardize a set of protocols for
Remote Direct Memory Access, RDMA, over the standard Internet Remote Direct Memory Access, RDMA, over the standard Internet
Protocol Suite has been chartered [RDDP]. Several drafts have been Protocol Suite has been chartered [RDDP]. Several drafts have been
proposed and are being considered for Standards Track. proposed and are being considered for Standards Track.
RDMA is a general solution to the problem of CPU overhead incurred RDMA is a general solution to the problem of CPU overhead incurred
due to data copies, primarily at the receiver. Substantial due to data copies, primarily at the receiver. Substantial
research has addressed this and has borne out the efficacy of the research has addressed this and has borne out the efficacy of the
approach. An overview of this is the RDDP Problem Statement approach. An overview of this is the RDDP "Remote Direct Memory
document, [RFC4297]. Access (RDMA) over IP Problem Statement" document, [RFC4297].
In addition to the per-byte savings of off-loading data copies, In addition to the per-byte savings of off-loading data copies,
RDMA-enabled NICs (RNICS) offload the underlying protocol layers as RDMA-enabled NICs (RNICS) offload the underlying protocol layers as
well, e.g. TCP, further reducing CPU overhead due to NAS well, e.g. TCP, further reducing CPU overhead due to NAS
processing. processing.
1.1. Background 1.1. Background
The RDDP Problem Statement [RFC4297] asserts: The RDDP Problem Statement [RFC4297] asserts:
skipping to change at page 4, line 46 skipping to change at page 4, line 44
contention on network resources. contention on network resources.
2. Problem Statement 2. Problem Statement
The principal performance problem encountered by NFS The principal performance problem encountered by NFS
implementations is the CPU overhead required to implement the implementations is the CPU overhead required to implement the
protocol. Primary among the sources of this overhead is the protocol. Primary among the sources of this overhead is the
movement of data from NFS protocol messages to its eventual movement of data from NFS protocol messages to its eventual
destination in user buffers or aligned kernel buffers. Due to the destination in user buffers or aligned kernel buffers. Due to the
nature of the RPC and XDR protocols, the NFS data payload arrives nature of the RPC and XDR protocols, the NFS data payload arrives
at arbitrary alignment and the NFS requests are completed in an at arbitrary alignment, necessitating a copy at the receiver, and
arbitrary sequence. the NFS requests are completed in an arbitrary sequence.
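The alignment problem described above can be illustrated with a minimal sketch. It assumes a hypothetical reply layout in which a variable-length header precedes an XDR-encoded opaque payload. The header sizes, the 4096-byte page size, and the helper names (xdr_opaque, build_reply, misalignment) are illustrative assumptions and are not taken from any NFS implementation. Only the opaque encoding itself (a 4-byte length, the data, then zero padding to a 4-byte boundary) follows XDR.

```python
import struct

PAGE_SIZE = 4096  # assumed page size, for illustration only


def xdr_opaque(data):
    """Encode variable-length opaque data as XDR does: a 4-byte
    big-endian length, the bytes, then zero padding to a 4-byte boundary."""
    pad = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * pad


def build_reply(header, payload):
    """Concatenate a hypothetical reply header with an XDR-encoded payload.

    Returns the message and the byte offset at which the payload begins."""
    payload_offset = len(header) + 4  # the payload follows its length word
    return header + xdr_opaque(payload), payload_offset


def misalignment(offset, page_size=PAGE_SIZE):
    """Bytes by which an offset misses the previous page boundary."""
    return offset % page_size


# Hypothetical header sizes: real headers vary with credentials, verifiers
# and attributes, so the payload offset is not known in advance.
payload = b"d" * 8192
for header_len in (96, 100, 136):
    message, offset = build_reply(bytes(header_len), payload)
    print(header_len, offset, misalignment(offset))
```

In this sketch the payload never begins on a page boundary, so a receiver that wants the data in an aligned buffer has to copy it there once the message has been parsed.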
The data copies consume system bus bandwidth and CPU time, reducing The data copies consume system bus bandwidth and CPU time, reducing
the available system capacity for applications [RFC4297]. the available system capacity for applications [RFC4297].
Achieving zero-copy with NFS has, to date, required sophisticated, Achieving zero-copy with NFS has, to date, required sophisticated,
version-specific "header cracking" hardware and/or extensive version-specific "header cracking" hardware and/or extensive
platform-specific virtual memory mapping tricks. Such approaches platform-specific virtual memory mapping tricks. Such approaches
become even more difficult for NFS version 4 due to the existence become even more difficult for NFS version 4 due to the existence
of the COMPOUND operation, which further reduces alignment and of the COMPOUND operation, which further reduces alignment and
greatly complicates ULP offload. greatly complicates ULP offload.
skipping to change at page 5, line 22 skipping to change at page 5, line 20
network I/O such as TCP is an issue at such speeds with today's network I/O such as TCP is an issue at such speeds with today's
hardware. The problem is fundamental in nature and has led the hardware. The problem is fundamental in nature and has led the
IETF to explore RDMA [RFC4297]. IETF to explore RDMA [RFC4297].
Zero-copy techniques benefit file protocols extensively, as they Zero-copy techniques benefit file protocols extensively, as they
enable direct user I/O, reduce the overhead of protocol stacks, enable direct user I/O, reduce the overhead of protocol stacks,
provide perfect alignment into caches, etc. Many studies have provide perfect alignment into caches, etc. Many studies have
already shown the performance benefits of such techniques [SKE+01] already shown the performance benefits of such techniques [SKE+01]
[DCK+03] [FJNFS] [FJDAFS] [KM02] [MAF+02]. [DCK+03] [FJNFS] [FJDAFS] [KM02] [MAF+02].
RDMA implementations generally have other interesting properties, RDMA is compelling here for another reason; hardware offloaded
such as hardware assisted protocol access, and support for user networking support in itself does not avoid data copies, without
space access to I/O. RDMA is compelling here for another reason; resorting to implementing part of the NFS protocol in the NIC.
hardware offloaded networking support in itself does not avoid data Support of RDMA by NFS enables the highest performance at the
copies, without resorting to implementing part of the NFS protocol architecture level rather than by implementation; this enables
in the NIC. Support of RDMA by NFS enables the highest performance ubiquitous and interoperable solutions.
at the architecture level rather than by implementation; this
enables ubiquitous and interoperable solutions.
By providing file access performance equivalent to that of local By providing file access performance equivalent to that of local
file systems, NFS over RDMA will enable applications running on a file systems, NFS over RDMA will enable applications running on a
set of client machines to interact through an NFS file system, just set of client machines to interact through an NFS file system, just
as applications running on a single machine might interact through as applications running on a single machine might interact through
a local file system. a local file system.
3. File Protocol Architecture 3. File Protocol Architecture
NFS runs as an ONC RPC [RFC1831] application. Being a file access NFS runs as an ONC RPC [RFC1831] application. Being a file access
skipping to change at page 10, line 45 skipping to change at page 10, line 38
of control information from data. of control information from data.
The current NFS message layout provides the performance enhancing The current NFS message layout provides the performance enhancing
opportunity for an NFS over RDMA protocol that separates the opportunity for an NFS over RDMA protocol that separates the
control information from data chunks while meeting the alignment control information from data chunks while meeting the alignment
needs of both. The data chunks can be copied "directly" between needs of both. The data chunks can be copied "directly" between
the client and server memory addresses above (with a single the client and server memory addresses above (with a single
occurrence on each memory bus) while the control information can be occurrence on each memory bus) while the control information can be
passed "inline". [RPCRDMA] describes such a protocol. passed "inline". [RPCRDMA] describes such a protocol.
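The separation of control information from data chunks can be sketched as follows. This is a toy model under stated assumptions, not the [RPCRDMA] wire format: the Chunk descriptor, the RegisteredMemory class, and the dictionary-based request and reply are all hypothetical, and a Python slice assignment stands in for direct data placement by an RDMA-capable adapter.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical descriptor for a region of registered memory."""
    handle: int
    offset: int
    length: int


class RegisteredMemory:
    """Toy stand-in for the buffers an RDMA-capable adapter can address."""

    def __init__(self):
        self._buffers = {}
        self._next_handle = 1

    def register(self, buf):
        """Record a buffer and return a handle that a peer may target."""
        handle = self._next_handle
        self._next_handle += 1
        self._buffers[handle] = buf
        return handle

    def place(self, chunk, data):
        """Write data at the advertised location (models direct placement)."""
        buf = self._buffers[chunk.handle]
        if len(data) > chunk.length or chunk.offset + len(data) > len(buf):
            raise ValueError("data does not fit the advertised chunk")
        buf[chunk.offset:chunk.offset + len(data)] = data
        return len(data)


def serve_read(memory, request, file_data):
    """Server side: place file data via the chunk, reply with control only."""
    start = request["offset"]
    data = file_data[start:start + request["count"]]
    placed = memory.place(request["chunk"], data)
    return {"status": 0, "count": placed}  # inline reply carries no payload


# Client side: advertise an aligned destination buffer in the request.
memory = RegisteredMemory()
destination = bytearray(8192)
chunk = Chunk(handle=memory.register(destination), offset=0, length=8192)
request = {"op": "READ", "offset": 4096, "count": 8192, "chunk": chunk}

file_data = bytes(range(256)) * 64  # 16384 bytes of sample file content
reply = serve_read(memory, request, file_data)
```

The inline reply holds only the status and byte count, while the file data arrives in the client's buffer at the offset the client chose. The client therefore controls alignment and needs no copy after parsing the reply.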
6. Improved Semantics 6. Conclusions
Network file protocols need to export the application programming
interfaces and semantics that applications, especially mission
critical ones like database and clusters, have been developed to
expect. These APIs and semantics are historical in nature and
successful deprecation is doubtful. NFS has not delivered all of
the semantics (for example, reliable filesystem transactions) for
the sake of acceptable performance.
The advanced properties of RDMA-capable transports allow improved
semantics. DAFS [DCK+03] is an example of a protocol which exports
semantics which are similar to those of NFSv4, but improved in
specific areas. Improved NFS semantics can also be delivered. As
an example, [RPCRDMA] describes an implementation of RPC for RDMA
transport that is evolutionary in nature yet enables the provision
of reliable and idempotent filesystem operation. This proposal
shows that it is possible to deliver extended semantics with an
RPC/XDR layer implementation with no changes required above the NFS
layer, and few within.
7. Conclusions
NFS version 4 [RFC3530] has been granted "Proposed Standard" NFS version 4 [RFC3530] has been granted "Proposed Standard"
status. The NFSv4 protocol was developed along several design status. The NFSv4 protocol was developed along several design
points, important among them: effective operation over wide-area points, important among them: effective operation over wide-area
networks, including the Internet itself; strong security networks, including the Internet itself; strong security
integrated into the protocol; extensive cross-platform integrated into the protocol; extensive cross-platform
interoperability including integrated locking semantics compatible interoperability including integrated locking semantics compatible
with multiple operating systems; and (this is key), protocol with multiple operating systems; and (this is key), protocol
extension. extension.
skipping to change at page 12, line 5 skipping to change at page 11, line 24
It is vital that the NFS protocol continue to provide these It is vital that the NFS protocol continue to provide these
benefits to a wide range of applications, without its usefulness benefits to a wide range of applications, without its usefulness
being compromised by concerns about performance and semantic being compromised by concerns about performance and semantic
inadequacies. This can reasonably be addressed in the existing NFS inadequacies. This can reasonably be addressed in the existing NFS
protocol framework. A cautious evolutionary improvement of protocol framework. A cautious evolutionary improvement of
performance and semantics allows building on the value already performance and semantics allows building on the value already
present in the NFS protocol, while addressing new requirements that present in the NFS protocol, while addressing new requirements that
have arisen from the application of networking technology. have arisen from the application of networking technology.
8. Security Considerations 7. Security Considerations
Security Considerations are not covered by this document. Please Security Considerations are not covered by this document. Please
refer to the appropriate protocol documents for any security refer to the appropriate protocol documents for any security
issues. issues.
9. IANA Considerations 8. IANA Considerations
IANA Considerations are not covered by this document. Please refer IANA Considerations are not covered by this document. Please refer
to the appropriate protocol documents for any IANA issues. to the appropriate protocol documents for any IANA issues.
10. Acknowledgements 9. Acknowledgements
The authors wish to thank Jeff Chase who provided many useful The authors wish to thank Jeff Chase who provided many useful
suggestions. suggestions.
11. Normative References 10. Normative References
[RFC3530] [RFC3530]
S. Shepler, et al., "NFS Version 4 Protocol", Standards Track S. Shepler, et al., "NFS Version 4 Protocol", Standards Track
RFC RFC
[RFC1831] [RFC1831]
R. Srinivasan, "RPC: Remote Procedure Call Protocol R. Srinivasan, "RPC: Remote Procedure Call Protocol
Specification Version 2", Standards Track RFC Specification Version 2", Standards Track RFC
[RFC1832] [RFC1832]
R. Srinivasan, "XDR: External Data Representation Standard", R. Srinivasan, "XDR: External Data Representation Standard",
Standards Track RFC Standards Track RFC
[RFC1813] [RFC1813]
B. Callaghan, B. Pawlowski, P. Staubach, "NFS Version 3 B. Callaghan, B. Pawlowski, P. Staubach, "NFS Version 3
Protocol Specification", Informational RFC Protocol Specification", Informational RFC
12. Informative References 11. Informative References
[BRU99] [BRU99]
J. Brustoloni, "Interoperation of copy avoidance in network J. Brustoloni, "Interoperation of copy avoidance in network
and file I/O", in Proc. INFOCOM '99, pages 534-542, New York, and file I/O", in Proc. INFOCOM '99, pages 534-542, New York,
NY, Mar. 1999., IEEE. Also available from NY, Mar. 1999., IEEE. Also available from
http://www.cs.pitt.edu/~jcb/publs.html http://www.cs.pitt.edu/~jcb/publs.html
[CAL+03] [CAL+03]
B. Callaghan, T. Lingutla-Raj, A. Chiu, P. Staubach, O. Asad, B. Callaghan, T. Lingutla-Raj, A. Chiu, P. Staubach, O. Asad,
"NFS over RDMA", in Proceedings of ACM SIGCOMM Summer 2003 "NFS over RDMA", in Proceedings of ACM SIGCOMM Summer 2003
skipping to change at page 14, line 25 skipping to change at page 13, line 44
[PAI+00] [PAI+00]
V. S. Pai, P. Druschel, W. Zwaenepoel, "IO-Lite: a unified I/O V. S. Pai, P. Druschel, W. Zwaenepoel, "IO-Lite: a unified I/O
buffering and caching system", ACM Trans. Computer Systems, buffering and caching system", ACM Trans. Computer Systems,
18(1):37-66, Feb. 2000. 18(1):37-66, Feb. 2000.
[RDDP] [RDDP]
RDDP Working Group charter, RDDP Working Group charter,
http://www.ietf.org/html.charters/rddp-charter.html http://www.ietf.org/html.charters/rddp-charter.html
[RFC4297] [RFC4297]
Remote Direct Data Placement Working Group Problem Statement, A. Romanow, J. Mogul, T. Talpey, S. Bailey, "Remote Direct
A. Romanow, J. Mogul, T. Talpey, S. Bailey, Informational RFC Memory Access (RDMA) over IP Problem Statement", Informational
RFC
[RFC1094] [RFC1094]
Sun Microsystems, "NFS: Network File System Protocol Sun Microsystems, "NFS: Network File System Protocol
Specification" Specification"
[RPCRDMA] [RPCRDMA]
T. Talpey, B. Callaghan, "RDMA Transport for ONC RPC", T. Talpey, B. Callaghan, "RDMA Transport for ONC RPC",
Internet Draft Work in Progress, draft-ietf-nfsv4-rpcrdma Internet Draft Work in Progress, draft-ietf-nfsv4-rpcrdma
[SHI+03] [SHI+03]
P. Shivam, J. Chase, "On the Elusive Benefits of Protocol P. Shivam, J. Chase, "On the Elusive Benefits of Protocol
Offload", to be published in Proceedings of ACM SIGCOMM Summer Offload", to be published in Proceedings of ACM SIGCOMM Summer
2003 NICELI Workshop, also available from 2003 NICELI Workshop, also available from
http://issg.cs.duke.edu/publications/niceli03.pdf http://issg.cs.duke.edu/publications/niceli03.pdf
 End of changes. 16 change blocks. 
52 lines changed or deleted 28 lines changed or added
