Network Working Group                                          T. Talpey
Request for Comments: 5532                                        NetApp
Category: Informational                                      C. Juszczak
                                                                May 2009


     Network File System (NFS) Remote Direct Memory Access (RDMA)
                           Problem Statement

Status of This Memo

This memo provides information for the Internet community. It does
not specify an Internet standard of any kind. Distribution of this
memo is unlimited.

Copyright Notice

Copyright (c) 2009 IETF Trust and the persons identified as the
document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents in effect on the date of
publication of this document (http://trustee.ietf.org/license-info).
Please review these documents carefully, as they describe your rights
and restrictions with respect to this document.

This document may contain material from IETF Documents or IETF
Contributions published or made publicly available before November
10, 2008. The person(s) controlling the copyright in some of this
material may not have granted the IETF Trust the right to allow
modifications of such material outside the IETF Standards Process.
Without obtaining an adequate license from the person(s) controlling
the copyright in such materials, this document may not be modified
outside the IETF Standards Process, and derivative works of it may
not be created outside the IETF Standards Process, except to format
it for publication as an RFC or to translate it into languages other
than English.

Abstract

This document addresses enabling the use of Remote Direct Memory
Access (RDMA) by the Network File System (NFS) protocols. NFS
implementations historically incur significant overhead due to data
copies on end-host systems, as well as other processing overhead.
This document explores the potential benefits of RDMA to these
implementations and evaluates the reasons why RDMA is especially
well-suited to NFS and network file protocols in general.

Table of Contents

   1. Introduction
      1.1. Background
   2. Problem Statement
   3. File Protocol Architecture
   4. Sources of Overhead
      4.1. Savings from TOE
      4.2. Savings from RDMA
   5. Application of RDMA to NFS
   6. Conclusions
   7. Security Considerations
   8. Acknowledgments
   9. References
      9.1. Normative References
      9.2. Informative References

1. Introduction

The Network File System (NFS) protocol (as described in [RFC1094],
[RFC1813], and [RFC3530]) is one of several remote file access
protocols used in the class of processing architecture sometimes
called Network-Attached Storage (NAS).

Historically, remote file access has proven to be a convenient,
cost-effective way to share information over a network, a concept
proven over time by the popularity of the NFS protocol. However,
there are issues in such a deployment.

As compared to a local (direct-attached) file access architecture,
NFS removes the overhead of managing the local on-disk filesystem
state and its metadata, but interposes at least a transport network
and two network endpoints between an application process and the
files it is accessing. To date, this trade-off has usually resulted
in a net performance loss as a result of reduced bandwidth, increased
application server CPU utilization, and other overheads.

Several classes of applications, including those directly supporting
enterprise activities in high-performance domains such as database
applications and shared clusters, have therefore encountered issues
with moving to NFS architectures. While this has been due principally
to the performance costs of NFS versus direct-attached files, other
reasons are relevant, such as the lack of strong consistency
guarantees being provided by NFS implementations.

Replication of local file access performance on NAS using traditional
network protocol stacks has proven difficult, not because of protocol
processing overheads, but because of data copy costs in the network
endpoints. This is especially true since host buses are now often the
main bottleneck in NAS architectures [MOG03] [CHA+01].

The External Data Representation [RFC4506] employed beneath NFS and
the Remote Procedure Call (RPC) [RFC5531] can add more data copies,
exacerbating the problem.

Data copy-avoidance designs have not been widely adopted for a
variety of reasons. [BRU99] points out that "many copy avoidance
techniques for network I/O are not applicable or may even backfire if
applied to file I/O". Other designs that eliminate unnecessary
copies, such as [PAI+00], are incompatible with existing APIs and
therefore force application changes.

In recent years, an effort to standardize a set of protocols for
Remote Direct Memory Access (RDMA) over the standard Internet
Protocol Suite has been chartered [RDDP]. A complete IP-based RDMA
protocol suite is available in the published Standards Track
specifications.

RDMA is a general solution to the problem of CPU overhead incurred
due to data copies, primarily at the receiver. Substantial research
has addressed this and has borne out the efficacy of the approach. An
overview of this is the "Remote Direct Memory Access (RDMA) over IP
Problem Statement" [RFC4297].

In addition to the per-byte savings of offloading data copies,
RDMA-enabled NICs (RNICs) offload the underlying protocol layers as
well (e.g., TCP), further reducing CPU overhead due to NAS
processing.

1.1. Background

The RDDP Problem Statement [RFC4297] asserts:

   High costs associated with copying are an issue primarily for large
   scale systems ... with high bandwidth feeds, usually multiprocessors
   and clusters, that are adversely affected by copying overhead.
   Examples of such machines include all varieties of servers: database
   servers, storage servers, application servers for transaction
   processing, for e-commerce, and web serving, content distribution,
   video distribution, backups, data mining and decision support, and
   scientific computing.

   Note that such servers almost exclusively service many concurrent
   sessions (transport connections), which, in aggregate, are
   responsible for > 1 Gbits/s of communication. Nonetheless, the cost
   of copying overhead for a particular load is the same whether from
   few or many sessions.

Note that each of the servers listed above could be accessing their
file data as an NFS client, or as NFS serving the data to such
clients, or acting as both.

The CPU overhead of the NFS and TCP/IP protocol stacks (including
data copies or reduced copy workarounds) becomes a significant matter
in these clients and servers. File access using locally attached
disks imposes relatively low overhead due to the highly optimized I/O
path and direct memory access afforded to the storage controller.
This is not the case with NFS, which must pass data to, and
especially from, the network and network processing stack to the NFS
stack. Frequently, data copies are imposed on this transfer; in some
cases, several such copies are imposed in each direction.

Copies are potentially encountered in an NFS implementation
exchanging data to and from user address spaces, within kernel buffer
caches, in eXternal Data Representation (XDR) marshalling and
unmarshalling, and within network stacks and network drivers. Other
overheads such as serialization among multiple threads of execution
sharing a single NFS mount point and transport connection are
additionally encountered.

Numerous upper-layer protocols achieve extremely high bandwidth and
low overhead through the use of RDMA. [MAF+02] shows that the
RDMA-based Direct Access File System (with a user-level
implementation of the file system client) can outperform even a
zero-copy implementation of NFS [CHA+01] [CHA+99] [GAL+99] [KM02].
Also, file data access implies the use of large upper-layer protocol
(ULP) messages. These large messages tend to amortize any increase in
per-message costs due to the offload of protocol processing incurred
when using RNICs while gaining the benefits of reduced per-byte
costs. Finally, the direct memory addressing afforded by RDMA avoids
many sources of contention on network resources.

2. Problem Statement

The principal performance problem encountered by NFS implementations
is the CPU overhead required to implement the protocol. Primary among
the sources of this overhead is the movement of data from NFS
protocol messages to its eventual destination in user buffers or
aligned kernel buffers. Due to the nature of the RPC and XDR
protocols, the NFS data payload arrives at arbitrary alignment,
necessitating a copy at the receiver, and the NFS requests are
completed in an arbitrary sequence.

The data copies consume system bus bandwidth and CPU time, reducing
the available system capacity for applications [RFC4297]. To date,
achieving zero-copy with NFS has required sophisticated,
version-specific "header cracking" hardware and/or extensive
platform-specific virtual memory mapping tricks. Such approaches
become even more difficult for NFS version 4 due to the existence of
the COMPOUND operation and presence of Kerberos and other security
information, which further reduce alignment and greatly complicate
ULP offload.

Furthermore, NFS is challenged by high-speed network fabrics such as
10 Gbits/s Ethernet. Performing even raw network I/O such as TCP is
an issue at such speeds with today's hardware. The problem is
fundamental in nature and has led the IETF to explore RDMA [RFC4297].

Zero-copy techniques benefit file protocols extensively, as they
enable direct user I/O, reduce the overhead of protocol stacks,
provide perfect alignment into caches, etc. Many studies have already
shown the performance benefits of such techniques [SKE+01] [DCK+03]
[FJNFS] [FJDAFS] [KM02] [MAF+02].

RDMA is compelling here for another reason; hardware-offloaded
networking support in itself does not avoid data copies, without
resorting to implementing part of the NFS protocol in the Network
Interface Card (NIC). Support of RDMA by NFS enables the highest
performance at the architecture level rather than by implementation;
this enables ubiquitous and interoperable solutions.

By providing file access performance equivalent to that of local file
systems, NFS over RDMA will enable applications running on a set of
client machines to interact through an NFS file system, just as
applications running on a single machine might interact through a
local file system.

3. File Protocol Architecture

NFS runs as an Open Network Computing (ONC) RPC [RFC5531]
application. Being a file access protocol, NFS is very "rich" in data
content (versus control information).

NFS messages can range from very small (under 100 bytes) to very
large (from many kilobytes to a megabyte or more). They are all
contained within an RPC message and follow a variable-length RPC
header. This layout provides an alignment challenge for the data
items contained in an NFS call (request) or reply (response) message.

In addition to the control information in each NFS call or reply
message, sometimes there are large "chunks" of application file data,
for example, read and write requests. With NFS version 4 (due to the
existence of the COMPOUND operation), there can be several of these
data chunks interspersed with control information.

ONC RPC is a remote procedure call protocol that has been run over a
variety of transports. Most implementations today use UDP or TCP. RPC
messages are defined in terms of an eXternal Data Representation
(XDR) [RFC4506], which provides a canonical data representation
across a variety of host architectures. An XDR data stream is
conveyed differently on each type of transport. On UDP, RPC messages
are encapsulated inside datagrams, while on a TCP byte stream, RPC
messages are delineated by a record-marking protocol. An RDMA
transport also conveys RPC messages in a unique fashion that must be
fully described if client and server implementations are to
interoperate.

The RPC transport is responsible for conveying an RPC message from a
sender to a receiver. An RPC message is either an RPC call from a
client to a server, or an RPC reply from the server back to the
client. An RPC message contains an RPC call header followed by
arguments if the message is an RPC call, or an RPC reply header
followed by results if the message is an RPC reply. The call header
contains a transaction ID (XID) followed by the program and procedure
number as well as a security credential. An RPC reply header begins
with an XID that matches that of the RPC call message, followed by a
security verifier and results. All data in an RPC message is XDR
encoded.

The encoding of XDR data into transport buffers is referred to as
"marshalling", and the decoding of XDR data contained within
transport buffers and into destination RPC procedure result buffers
is referred to as "unmarshalling". Therefore, the process of
marshalling takes place at the sender of any particular message, be
it an RPC request or an RPC response. Unmarshalling, of course, takes
place at the receiver.

Normally, any bulk data is moved (copied) as a result of the
unmarshalling process, because the destination address is not known
until the RPC code receives control and subsequently invokes the XDR
unmarshalling routine. In other words, XDR-encoded data is not
self-describing, and it carries no placement information. This
results in a data copy in most NFS implementations.

One mechanism by which the RPC layer may overcome this is for each
request to include placement information, to be used for direct
placement during XDR encode. This "write chunk" can avoid sending
bulk data inline in an RPC message and generally results in one or
more RDMA Write operations.

Similarly, a "read chunk", where placement information referring to
bulk data that may be directly fetched via one or more RDMA Read
operations during XDR decode, may be conveyed. The "read chunk" will
therefore be useful in both RPC calls and replies, while the "write
chunk" is used solely in replies.

These "chunks" are the key concept in an existing proposal These "chunks" are the key concept in an existing proposal [RPCRDMA].
[RPCRDMA]. They convey what are effectively pointers to remote They convey what are effectively pointers to remote memory across the
memory across the network. They allow cooperating peers to network. They allow cooperating peers to exchange data outside of
exchange data outside of XDR encodings but still use XDR for XDR encodings but still use XDR for describing the data to be
describing the data to be transferred. And, finally, through use transferred. And, finally, through use of XDR they maintain a large
of XDR they maintain a large degree of on-the-wire compatibility. degree of on-the-wire compatibility.
The central concept of the RDMA transport is to provide the
additional encoding conventions to convey this placement information
in transport-specific encoding, and to modify the XDR handling of
bulk data.

Block Diagram

   +------------------------+-----------------------------------+
   | NFS                    | NFS + RDMA                        |
   +------------------------+----------------------+------------+
   | Operations / Procedures                       |            |
   +-----------------------------------------------+            |
   | RPC/XDR                                       |            |
   +--------------------------------+--------------+            |
   | Stream Transport               | RDMA Transport            |
   +--------------------------------+---------------------------+

4. Sources of Overhead

Network and file protocol costs can be categorized as follows:

o  per-byte costs - data touching costs such as checksum or data
   copy. Today's network interface hardware commonly offloads the
   checksum, which leaves the other major source of per-byte
   overhead, data copy.

o  per-packet costs - interrupts and lower-layer processing (LLP).
   Today's network interface hardware also commonly coalesces
   interrupts to reduce per-packet costs.

o  per-message (request or response) costs - LLP and ULP processing.

Improvement from optimization becomes more important if the overhead
it targets is a larger share of the total cost. As other sources of
overhead, such as the checksumming and interrupt handling above, are
eliminated, the remaining overheads (primarily data copy) loom
larger.

With copies crossing the bus twice per copy, network processing
overhead is high whenever network bandwidth is large in comparison to
CPU and memory bandwidths. Generally, with today's end-systems, the
effects are observable at network speeds at or above 1 Gbit/s.

A common question is whether an increase in CPU processing power
alleviates the problem of high processing costs of network I/O. The
answer is no; it is the memory bandwidth that is the issue. Faster
CPUs do not help if the CPU spends most of its time waiting for
memory [RFC4297].

TCP offload engine (TOE) technology aims to offload the CPU by moving
TCP/IP protocol processing to the NIC. However, TOE technology by
itself does nothing to avoid necessary data copies within upper-layer
protocols. [MOG03] provides a description of the role TOE can play in
reducing per-packet and per-message costs. Beyond the offloads
commonly provided by today's network interface hardware, TOE alone
(without RDMA) helps in protocol header processing, but this has been
shown to be a minority component of the total protocol processing
overhead [CHA+01].

Numerous software approaches to the optimization of network
throughput have been made. Experience has shown that network I/O
interacts with other aspects of system processing such as file I/O
and disk I/O [BRU99] [CHU96]. Zero-copy optimizations based on page
remapping [CHU96] can be dependent upon machine architecture, and are
not scalable to multi-processor architectures. Correct buffer
alignment and sizing together are needed to optimize the performance
of zero-copy movement mechanisms [SKE+01]. The NFS message layout
described above does not facilitate the splitting of headers from
data nor does it facilitate providing correct data buffer alignment.

4.1. Savings from TOE

The expected improvement of TOE specifically for NFS protocol
processing can be quantified and shown to be fundamentally limited.
[SHI+03] presents a set of "LAWS" parameters that serve to illustrate
the issues. In the TOE case, the copy cost can be viewed as part of
the application processing "a". Application processing increases the
LAWS "gamma", which is shown by the paper to result in a diminished
benefit for TOE.

For example, if the overhead is 20% TCP/IP, 30% copy, and 50% real
application work, then gamma is 80/20 or 4, which means the maximum
benefit of TOE is 1/gamma, or only 25%.

For RDMA (with embedded TOE) and the same example, the "overhead" (o)
offloaded or eliminated is 50% (20% + 30%). Therefore, in the RDMA
case, gamma is 50/50 or 1, and the inverse gives the potential
benefit of 1 (100%), a factor of two.

                 CPU Overhead Reduction Factor

           No Offload   TCP Offload   RDMA Offload
          ------------+-------------+-------------
              1.00x        1.25x         2.00x

The analysis in the paper shows that RDMA could improve throughput by
the same factor of two, even when the host is (just) powerful enough
to drive the full network bandwidth without RDMA. It can also be
shown that the speedup may be higher if network bandwidth grows
faster than Moore's Law, although the higher benefits will apply to a
narrow range of applications.

4.2. Savings from RDMA

Performance measurements directly comparing an NFS-over-RDMA
prototype with conventional network-based NFS processing are
described in [CAL+03]. Comparisons of Read throughput and CPU
overhead were performed on two types of Gigabit Ethernet adapters,
one type being a conventional adapter, and another type with RDMA
capability. The prototype RDMA protocol performed all transfers via
RDMA Read. The NFS layer in the study was measured while performing
read transfers, varying the transfer size and readahead depth across
ranges used by typical NFS deployments.

In these results, conventional network-based throughput was severely
limited by the client's CPU being saturated at 100% for all
transfers. Read throughput reached no more than 60 MBytes/s.

   I/O Type        Size    Read Throughput    CPU Utilization
   Conventional    2 KB        20 MB/s             100%
   Conventional   16 KB        40 MB/s             100%
   Conventional  256 KB        60 MB/s             100%

However, over RDMA, throughput rose to the theoretical maximum
throughput of the platform, while saturating the single-CPU system
only at maximum throughput.

   I/O Type        Size    Read Throughput    CPU Utilization
   RDMA            2 KB        10 MB/s              45%
   RDMA           16 KB        40 MB/s              70%
   RDMA          256 KB       100 MB/s             100%

The lower relative throughput of the RDMA prototype at the small
blocksize may be attributable to the RDMA Read imposed by the
prototype protocol, which reduced the operation rate since it
introduces additional latency. As well, it may reflect the relative
increase of per-packet setup costs within the DMA portion of the
transfer.

5. Application of RDMA to NFS

Efficient file protocols require efficient data positioning and
movement. The client system knows the client memory address where the
application has data to be written or wants read data deposited. The
server system knows the server memory address where the local file
system will accept write data or has data to be read. Neither peer,
however, is aware of the other's data destination in the current NFS,
RPC, or XDR protocols. Existing NFS implementations have struggled
with the performance costs of data copies when using traditional
Ethernet transports.

With the onset of faster networks, the network I/O bottleneck will
worsen. Fortunately, new transports that support RDMA have emerged.
RDMA excels at bulk transfer efficiency; it is an efficient way to
deliver direct data placement and remove a major part of the problem:
data copies. RDMA also addresses other overheads, e.g., underlying
protocol offload, and offers separation of control information from
data.

The current NFS message layout provides the performance-enhancing
opportunity for an NFS-over-RDMA protocol that separates the control
information from data chunks while meeting the alignment needs of
both. The data chunks can be copied "directly" between the client and
server memory addresses above (with a single occurrence on each
memory bus) while the control information can be passed "inline".
[RPCRDMA] describes such a protocol.

6. Conclusions

NFS version 4 [RFC3530] has been granted "Proposed Standard" status.
The NFSv4 protocol was developed along several design points,
important among them: effective operation over wide-area networks,
including the Internet itself; strong security integrated into the
protocol; extensive cross-platform interoperability including
integrated locking semantics compatible with multiple operating
systems; and (this is key) protocol extension.

NFS version 4 is an excellent base on which to add the needed
performance enhancements and improved semantics described above. The
minor versioning support defined in NFS version 4 was designed to
support protocol improvements without disruption to the installed
base [NFSv4.1]. Evolutionary improvement of the protocol via minor
versioning is a conservative and cautious approach to current and
future problems and shortcomings.

Many arguments can be made as to the efficacy of the file abstraction
in meeting the future needs of enterprise data service and the
Internet. Fine-grained Quality of Service (QoS) policies (e.g., data
delivery, retention, availability, and security) are high among them.

It is vital that the NFS protocol continue to provide these benefits
to a wide range of applications, without its usefulness being
compromised by concerns about performance and semantic inadequacies.
This can reasonably be addressed in the existing NFS protocol
framework. A cautious evolutionary improvement of performance and
semantics allows building on the value already present in the NFS
protocol, while addressing new requirements that have arisen from the
application of networking technology.

7. Security Considerations

The NFS protocol, in conjunction with its layering on RPC, provides a
rich and widely interoperable security model to applications and
systems. Any layering of NFS-over-RDMA transports must address the
NFS security requirements, and additionally must ensure that no new
vulnerabilities are introduced. For RDMA, the integrity, and any
privacy, of the data stream are of particular importance.

The core goals of an NFS-to-RDMA binding are to reduce overhead and
to enable high performance. To support these goals while maintaining
required NFS security protection presents a special challenge.
Historically, the provision of integrity and privacy has been
implemented within the RPC layer, and their operation requires local
processing of messages exchanged with the RPC peer. This processing
imposes memory and processing overhead on a per-message basis,
exactly the overhead that RDMA is designed to avoid.

Therefore, it is a requirement that the RDMA transport binding
provide a means to delegate the integrity and privacy processing to
the RDMA hardware, in order to maintain the high level of performance
desired from the approach, while simultaneously providing the
existing highest levels of security required by the NFS protocol.
This in turn requires a means by which the RPC layer may invoke these
services from the RDMA provider, and for the NFS layer to negotiate
their use end-to-end.

The "Channel Binding" concept [RFC5056] together with "IPsec Channel
Connection Latching" [BTNSLATCH] provide a means by which the RPC and
NFS layers may delegate their session protection to the lower RDMA
layers. An extension to the RPCSEC_GSS protocol [RFC5403] may be
employed to negotiate the use of these bindings, and to establish the
shared secrets necessary to protect the sessions.

The protocol described in [RPCRDMA] specifies the use of these
mechanisms, and they are required to implement the protocol.

An additional consideration is protection of the integrity and
privacy of local memory by the RDMA transport itself. The use of RDMA
by NFS must not introduce any vulnerabilities to system memory
contents, or to memory owned by user processes. These protections are
provided by the RDMA layer specifications, and specifically their
security models. It is required that any RDMA provider used for NFS
transport be conformant to the requirements of [RFC5042] in order to
satisfy these protections.

8. Acknowledgments

The authors wish to thank Jeff Chase who provided many useful
suggestions.

9. References

9.1. Normative References

   [RFC3530]   Shepler, S., Callaghan, B., Robinson, D., Thurlow, R.,
               Beame, C., Eisler, M., and D. Noveck, "Network File
               System (NFS) version 4 Protocol", RFC 3530, April 2003.

   [RFC5531]   Thurlow, R., "RPC: Remote Procedure Call Protocol
               Specification Version 2", RFC 5531, May 2009.

   [RFC4506]   Eisler, M., Ed., "XDR: External Data Representation
               Standard", STD 67, RFC 4506, May 2006.

   [RFC1813]   Callaghan, B., Pawlowski, B., and P. Staubach, "NFS
               Version 3 Protocol Specification", RFC 1813, June 1995.

   [RFC5403]   Eisler, M., "RPCSEC_GSS Version 2", RFC 5403,
               February 2009.

   [RFC5056]   Williams, N., "On the Use of Channel Bindings to Secure
               Channels", RFC 5056, November 2007.

   [RFC5042]   Pinkerton, J. and E. Deleganes, "Direct Data Placement
               Protocol (DDP) / Remote Direct Memory Access Protocol
               (RDMAP) Security", RFC 5042, October 2007.

9.2. Informative References

   [BRU99]     J. Brustoloni, "Interoperation of copy avoidance in
               network and file I/O", in Proc. INFOCOM '99, pages
               534-542, New York, NY, Mar. 1999, IEEE. Also available
               from http://www.cs.pitt.edu/~jcb/publs.html.

   [BTNSLATCH] Williams, N., "IPsec Channels: Connection Latching",
               Work in Progress, November 2008.

   [CAL+03]    B. Callaghan, T. Lingutla-Raj, A. Chiu, P. Staubach,
               O. Asad, "NFS over RDMA", in Proceedings of ACM SIGCOMM
               Summer 2003 NICELI Workshop.

   [CHA+01]    J. S. Chase, A. J. Gallatin, K. G. Yocum, "Endsystem
               optimizations for high-speed TCP", IEEE Communications,
               39(4):68-74, April 2001.

   [CHA+99]    J. S. Chase, D. C. Anderson, A. J. Gallatin, A. R.
               Lebeck, K. G. Yocum, "Network I/O with Trapeze", in
               1999 Hot Interconnects Symposium, August 1999.

   [CHU96]     H.K. Chu, "Zero-copy TCP in Solaris", Proc. of the
               USENIX 1996 Annual Technical Conference, San Diego, CA,
               January 1996.

   [DCK+03]    M. DeBergalis, P. Corbett, S. Kleiman, A. Lent,
               D. Noveck, T. Talpey, M. Wittle, "The Direct Access
               File System", in Proceedings of 2nd USENIX Conference
               on File and Storage Technologies (FAST '03), San
               Francisco, CA, March 31 - April 2, 2003.

   [FJDAFS]    Fujitsu Prime Software Technologies, "Meet the DAFS
               Performance with DAFS/VI Kernel Implementation using
               cLAN", available from
               http://www.pst.fujitsu.com/english/dafsdemo/index.html,
               2001.

   [FJNFS]     Fujitsu Prime Software Technologies, "An Adaptation of
               VIA to NFS on Linux", available from
               http://www.pst.fujitsu.com/english/nfs/index.html,
               2000.

   [GAL+99]    A. Gallatin, J. Chase, K. Yocum, "Trapeze/IP: TCP/IP at
               Near-Gigabit Speeds", 1999 USENIX Technical Conference
               (Freenix Track), June 1999.

   [KM02]      K. Magoutis, "Design and Implementation of a Direct
               Access File System (DAFS) Kernel Server for FreeBSD",
               in Proceedings of USENIX BSDCon 2002 Conference, San
               Francisco, CA, February 11-14, 2002.

   [MAF+02]    K. Magoutis, S. Addetia, A. Fedorova, M. Seltzer,
               J. Chase, D. Gallatin, R. Kisley, R. Wickremesinghe,
               E. Gabber, "Structure and Performance of the Direct
               Access File System (DAFS)", in Proceedings of 2002
               USENIX Annual Technical Conference, Monterey, CA,
               June 9-14, 2002.

   [MOG03]     J. Mogul, "TCP offload is a dumb idea whose time has
               come", 9th Workshop on Hot Topics in Operating Systems
               (HotOS IX), Lihue, HI, May 2003, USENIX.

   [NFSv4.1]   Shepler, S., Eisler, M., and D. Noveck, "NFSv4 Minor
               Version 1", Work in Progress, September 2008.

   [PAI+00]    V. S. Pai, P. Druschel, W. Zwaenepoel, "IO-Lite: a
               unified I/O buffering and caching system", ACM Trans.
               Computer Systems, 18(1):37-66, Feb. 2000.

   [RDDP]      RDDP Working Group charter,
               http://www.ietf.org/html.charters/rddp-charter.html.

   [RFC4297]   Romanow, A., Mogul, J., Talpey, T., and S. Bailey,
               "Remote Direct Memory Access (RDMA) over IP Problem
               Statement", RFC 4297, December 2005.

   [RFC1094]   Sun Microsystems, "NFS: Network File System Protocol
               specification", RFC 1094, March 1989.

   [RPCRDMA]   Talpey, T. and B. Callaghan, "Remote Direct Memory
               Access Transport for Remote Procedure Call", Work in
               Progress, April 2008.

[SHI+03] [SHI+03] P. Shivam, J. Chase, "On the Elusive Benefits of Protocol
P. Shivam, J. Chase, "On the Elusive Benefits of Protocol
Offload", Proceedings of ACM SIGCOMM Summer 2003 NICELI Offload", Proceedings of ACM SIGCOMM Summer 2003 NICELI
Workshop, also available from Workshop, also available from
http://issg.cs.duke.edu/publications/niceli03.pdf http://issg.cs.duke.edu/publications/niceli03.pdf.
[SKE+01] [SKE+01] K.-A. Skevik, T. Plagemann, V. Goebel, P. Halvorsen,
K.-A. Skevik, T. Plagemann, V. Goebel, P. Halvorsen,
"Evaluation of a Zero-Copy Protocol Implementation", in "Evaluation of a Zero-Copy Protocol Implementation", in
Proceedings of the 27th Euromicro Conference - Multimedia and Proceedings of the 27th Euromicro Conference - Multimedia
Telecommunications Track (MTT'2001), Warsaw, Poland, September and Telecommunications Track (MTT'2001), Warsaw, Poland,
2001. September 2001.
Authors' Addresses

   Tom Talpey
   170 Whitman St.
   Stow, MA 01775 USA

   Phone: +1 978 821-8577
   EMail: tmtalpey@gmail.com


   Chet Juszczak
   Chet's Boathouse Co.
   P.O. Box 1467
   Merrimack, NH 03054

   Phone: +1 603 253-6602
   EMail: chetnh@earthlink.net