Internet Draft                                          David L. Black
Document: draft-ietf-rddp-rdma-concerns-01.txt                     EMC
Expires: December 2004                                Michael F. Speer
                                                                   Sun
                                                       John Wroclawski
                                                                   MIT
                                                             June 2004
DDP and RDMA Concerns
Status of this Memo
By submitting this Internet-Draft, I certify that any applicable
patent or other IPR claims of which I am aware have been disclosed,
and any of which I become aware will be disclosed, in accordance
with RFC 3668.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-Drafts
as reference material or to cite them other than as "work in
progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Abstract
This draft describes technical concerns that should be considered
in the design of standardized RDMA and DDP protocols/mechanisms for
use with Internet transport protocols. This draft was written to
provide input to the proposed new Remote Direct Data Placement
(rddp) WG, and is not intended for publication as an RFC.
This is an updated and resubmitted version of
draft-ietf-rddp-rdma-concerns-00.txt to make it available for
current discussions of mandatory-to-implement security in the RDDP
WG. Sections 4.1, 4.2, and 5 are of particular relevance to that
discussion.
DDP and RDMA Concerns                                       June 2004
Table of Contents
1. Overview......................................................2
2. Conventions used in this document.............................3
3. Architectural Concerns........................................3
3.1 Buffer Management.........................................3
3.2 Reliability...............................................4
4. Memory is more general than Transport Buffers.................4
4.1 Overwrites................................................4
4.2 Concurrent Operations to the Same Memory..................5
4.3 Completions and Ordering..................................5
4.4 Transfer Granularity......................................5
5. Security Considerations.......................................6
References.......................................................6
Acknowledgements.................................................7
Author's Addresses...............................................7
1. Overview
A new effort to standardize RDMA (Remote Direct Memory Access) and
DDP (Direct Data Placement) protocols/mechanisms for Internet
transport protocols is going to take place in the proposed IETF
Remote Direct Data Placement (rddp) WG. This draft describes
technical concerns that should be addressed in the design and
standardization of these protocols. A basic understanding of RDMA
and DDP is assumed; while a basic introduction is included in this
section, readers unfamiliar with these concepts may wish to refer
to [RDDP-arch, RDDP-ps] for more background.
Both Direct Data Placement (DDP) and Remote Direct Memory Access
(RDMA) have the goal of eliminating copies between the protocol
stack and application buffers at the receiver. For example, when a
4-kilobyte file or disk block is retrieved, most operating systems
expect the resulting block to be in 4kB of contiguous memory
aligned to a 4kB boundary, but most network interfaces do not
deliver received data in this fashion. The result is that a copy is
required to produce an aligned 4kB block of data from the data
delivered by the network interface. This copy has undesirable
performance impacts; the goal of DDP and RDMA is to enable
elimination of this copy in an application- and protocol-independent
fashion. The basic concept is that the sender identifies data to be
placed directly into application buffers, and transmits that
identification with the data so that the receiver can place the
data directly into application buffers when it is received.
DDP is envisioned to share network transport buffers with
applications, but to use application-specified tags and offsets to
select buffers for use on receive. The primary purposes of this
information are to separate application data from headers and to
handle applications that return data in unpredictable orders (e.g.,
the results of concurrent file and disk operations may be returned
to the invoker in arbitrary order). One way to view DDP on the
wire is that it annotates (or "decorates") data that would have
been sent anyway.
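To make the tag-and-offset idea concrete, the following is a minimal
Python sketch of receiver-side placement. It assumes a hypothetical
TaggedSegment that carries a tag, an offset, and a payload, and a
hypothetical Receiver that maps tags to application buffers advertised
in advance; neither corresponds to any DDP wire format. Python slice
assignment still copies bytes, so this models only the bookkeeping that
lets data land at its final location, not hardware placement itself.

```python
from dataclasses import dataclass


@dataclass
class TaggedSegment:
    """Hypothetical DDP-style segment: a payload plus the sender-supplied
    tag and offset that identify where the payload belongs."""
    tag: int
    offset: int
    payload: bytes


class Receiver:
    """Hypothetical receiver that places tagged payloads directly into
    application buffers advertised in advance."""

    def __init__(self):
        self.buffers = {}  # tag -> bytearray owned by the application

    def advertise(self, tag, size):
        buf = bytearray(size)
        self.buffers[tag] = buf
        return buf

    def place(self, seg):
        buf = self.buffers.get(seg.tag)
        if buf is None:
            raise KeyError(f"unknown tag {seg.tag}")
        end = seg.offset + len(seg.payload)
        if seg.offset < 0 or end > len(buf):
            raise ValueError("segment falls outside the tagged buffer")
        # Write the payload at its final location: no anonymous
        # intermediate buffer and no later reassembly copy.
        buf[seg.offset:end] = seg.payload


rx = Receiver()
block_a = rx.advertise(tag=1, size=4096)
block_b = rx.advertise(tag=2, size=4096)

# Results of two concurrent disk reads arrive in arbitrary order.
rx.place(TaggedSegment(tag=2, offset=0, payload=b"B" * 4096))
rx.place(TaggedSegment(tag=1, offset=2048, payload=b"a" * 2048))
rx.place(TaggedSegment(tag=1, offset=0, payload=b"A" * 2048))
```

Because each segment names its own destination, the receiver needs no
reassembly step when results arrive out of order, and the bounds check
keeps a segment from writing outside the buffer its tag designates.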
RDMA uses DDP or a DDP-like mechanism to implement remote read and
write operations on memory regions explicitly exported by end
systems. A tag is used to designate a memory region, and an offset
[...]
3.1 Buffer Management
Traditional network stacks use a pool of interchangeable (also known
as anonymous) buffers to hold data received from the network. By
using specific, identifiable application buffers, DDP and RDMA make
the memory used for specific receive operations identifiable and
may cause protocols to devote more resources to the receive
function than might otherwise be the case. In situations where
effective use is being made of DDP and/or RDMA, the actual resource
demand on the system may be lessened (e.g., because applications
only expose memory that is in their working set), but it is
necessary to anticipate applications that use DDP and RDMA in a way
that increases resource demands and to take appropriate precautions
to limit system degradation.
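One possible precaution, sketched here under stated assumptions, is a
fixed budget on how much memory each application may expose for DDP
and RDMA receives. The ExportBudget class, the 16 x 4kB limit used in
the example, and the refuse-on-overflow policy are hypothetical
illustrations, not requirements of any specification.

```python
class ExportBudget:
    """Hypothetical cap on the memory one application may expose for
    DDP/RDMA receives."""

    def __init__(self, limit_bytes):
        self.limit = limit_bytes
        self.exported = {}  # tag -> size in bytes
        self.next_tag = 1

    def in_use(self):
        return sum(self.exported.values())

    def export(self, size):
        if size <= 0:
            raise ValueError("size must be positive")
        if self.in_use() + size > self.limit:
            # Refuse instead of letting one application tie up an
            # unbounded amount of receive memory.
            raise MemoryError(f"exporting {size} bytes exceeds the budget")
        tag = self.next_tag
        self.next_tag += 1
        self.exported[tag] = size
        return tag

    def release(self, tag):
        del self.exported[tag]


budget = ExportBudget(limit_bytes=16 * 4096)
tags = [budget.export(4096) for _ in range(16)]  # fills the budget

refused = False
try:
    budget.export(4096)
except MemoryError:
    refused = True

budget.release(tags[0])
extra = budget.export(4096)  # succeeds once memory has been released
```

Refusing new exports is only one option; an implementation could
instead queue the request or reclaim regions that have gone idle.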
3.2 Reliability
RDMA is motivated by experiences with both local DMA and transfers
over reliable channels; these experiences will not be completely
applicable to RDMA over IP networks. Local DMA provides an extreme
example, in that a local DMA failure is usually caused by hardware
problems that often result in the hardware being considered to have
failed. In contrast, RDMA over IP must deal with a variety of
"stupid IP network tricks" (e.g., packet loss, reordering, and
duplication) as part of its normal operation.
Channel behavior is a less extreme example, as channel controllers
must expect occasional channel failures and be prepared to deal
with the result; one example can be found in multipathing software
for disk storage access.
This set of concerns is roughly analogous to the reliability
difference between local and remote procedure calls and its impact
on distributed system design [need to add a reference here]. The
[...]
specifications MUST contain mechanisms to prevent overwrites from
impairing system integrity and to isolate the effect of overwrites
so that interference among otherwise unrelated applications is
prevented. In addition, the specifications MUST contain mechanisms
that allow applications to control the exposure of memory used for
DDP and RDMA receives to subsequent overwrites; this enables an
application to ensure that a check on received data (e.g., for
integrity) is performed after changes to it can no longer be made
by remote nodes via DDP or RDMA.
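The ordering requirement above (close the window for remote writes,
then check the data) can be sketched as follows. The ExposedBuffer
class, its revoke() method, and the use of a SHA-256 digest as the
integrity check are assumptions made for illustration; a real DDP or
RDMA implementation would revoke access through whatever mechanism its
protocol provides.

```python
import hashlib


class ExposedBuffer:
    """Hypothetical receive buffer that remote peers may write only while
    it is exposed."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.exposed = True

    def remote_write(self, offset, payload):
        if not self.exposed:
            return False  # a write arriving after revocation is refused
        if offset < 0 or offset + len(payload) > len(self.data):
            raise ValueError("write outside the buffer")
        self.data[offset:offset + len(payload)] = payload
        return True

    def revoke(self):
        self.exposed = False


def receive_and_verify(buf, expected_digest):
    # Close the window for remote writes first, so that the check
    # covers contents that can no longer change underneath it.
    buf.revoke()
    return hashlib.sha256(buf.data).digest() == expected_digest


buf = ExposedBuffer(8)
buf.remote_write(0, b"good-dat")
ok = receive_and_verify(buf, hashlib.sha256(b"good-dat").digest())
late = buf.remote_write(0, b"evil-dat")  # arrives after the check
```

If revoke() were called after the digest comparison instead, a write
arriving between the check and the revocation could change data that
had already been declared valid.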
4.2 Concurrent Operations to the Same Memory
If a remote (or local) write takes place concurrently with a read
to the same memory, the read may return an arbitrary mix of the old
and new contents of the memory. If a remote (or local) write takes
place concurrently with another write, the resulting memory
contents may be an arbitrary mix of the data from the two writes.
These results are generally considered undesirable and should be
avoided. DDP and RDMA specifications must consider how these
situations are to be avoided (e.g., application-level
synchronization may be required), so that at worst they will occur
only as the result of application errors in using DDP and RDMA.
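The write/write hazard can be reproduced in a deterministic model. This
Python sketch assumes two remote writes to the same 8-byte region, each
split into hypothetical 2-byte segments whose arrival interleaves. It
then shows one form of application-level synchronization, modeled as a
lock that a writer holds for its entire transfer. The lock stands in
for whatever agreement an application would actually use (for instance,
granting a region to one writer at a time); it is not a DDP or RDMA
mechanism.

```python
import threading


def split(data, seg_size):
    """Cut one write into (offset, payload) segments."""
    return [(i, data[i:i + seg_size]) for i in range(0, len(data), seg_size)]


def apply_segments(region, segments):
    """Place segments in arrival order; nothing here keeps writes apart."""
    for offset, payload in segments:
        region[offset:offset + len(payload)] = payload


write_a = split(b"AAAAAAAA", 2)
write_b = split(b"BBBBBBBB", 2)

# Segments of the two writes arrive interleaved: A0 B0 B1 A1 A2 B2 B3 A3.
interleaved = [write_a[0], write_b[0], write_b[1], write_a[1],
               write_a[2], write_b[2], write_b[3], write_a[3]]
mixed = bytearray(8)
apply_segments(mixed, interleaved)
print(bytes(mixed))  # b'BBAABBAA': neither write survives intact

# Application-level synchronization (assumed): a writer holds the region
# for its whole transfer, so segments of different writes cannot mix.
region_lock = threading.Lock()


def serialized_write(region, data, seg_size):
    with region_lock:
        apply_segments(region, split(data, seg_size))


serial = bytearray(8)
writers = [threading.Thread(target=serialized_write, args=(serial, d, 2))
           for d in (b"AAAAAAAA", b"BBBBBBBB")]
for t in writers:
    t.start()
for t in writers:
    t.join()
# serial holds all A's or all B's, depending on which writer ran last.
```

Synchronization does not decide which write wins; it only guarantees
that the result is one complete write rather than a mixture of both.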
4.3 Completions and Ordering
[...]
4.4 Transfer Granularity
IP transports include the functionality to bundle data so that a
set of small user transfers is accomplished via a single larger
transfer across the network and through the relevant portions of
the protocol stacks. By defining specific remote operations that
an application may reasonably expect to complete in a timely
fashion, RDMA may disrupt this behavior by requiring smaller
transfers to be done promptly. The potential inefficiencies of the
resulting behavior for protocol stacks and networks have been known
for a long time; see the discussion of the small-packet problem in
[RFC 896]. Any RDMA specification MUST consider the ability to
bundle operations and the potential performance impact of
performing multiple smaller transfers in place of a single larger
one. This may also apply to DDP, but the first priority is that
DDP SHOULD NOT cause major changes to the transmission behavior of
any transport protocol to which it is applied, compared with the
same stream without the DDP annotations (some degree of minor
change is unavoidable due to the space consumed by the DDP
annotations).
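A back-of-the-envelope sketch of the small-packet concern: it counts
the segments needed for 100 writes of 64 bytes each, sent promptly one
per segment versus coalesced. The 1460-byte MSS and 40-byte header
correspond to TCP over IPv4 on an Ethernet path without options; the
16-byte per-segment annotation is a hypothetical placeholder, not a
size taken from any DDP specification.

```python
import math


def segments_needed(write_sizes, mss, annotation=0, bundle=True):
    """Count segments needed to carry a series of writes.

    mss        -- maximum payload bytes per segment (assumed)
    annotation -- per-segment DDP/RDMA header bytes (hypothetical)
    bundle     -- True: coalesce writes into full segments;
                  False: send each write promptly in its own segment(s)
    """
    room = mss - annotation  # payload space left after the annotation
    if bundle:
        return math.ceil(sum(write_sizes) / room)
    return sum(math.ceil(size / room) for size in write_sizes)


MSS = 1460  # TCP over IPv4 on Ethernet, no options (assumed path)
HDR = 40    # IPv4 + TCP header bytes per segment, no options
writes = [64] * 100  # 100 small writes of 64 bytes each

prompt = segments_needed(writes, MSS, bundle=False)
coalesced = segments_needed(writes, MSS)
annotated = segments_needed(writes, MSS, annotation=16)

print(prompt, coalesced, annotated)   # 100 5 5
print(prompt * HDR, coalesced * HDR)  # 4000 200
```

Under these assumptions, prompt transmission spends twenty times as
many header bytes as bundling, while a per-segment annotation of this
size leaves the bundled segment count unchanged, which matches the
expectation that DDP annotations cause only minor changes.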
5. Security Considerations
With the possible exception of the Completions and Ordering
concerns described in Section 4.3, all of these concerns have
security implications, in that failing to deal with them adequately
may expose systems to attacks on their resources, correct operation,
and/or integrity.
When memory is accessible via the network, such access must be
controlled, as allowing arbitrary access by untrusted entities
discloses the contents of the memory (read access) and/or allows it
to be corrupted (write access). Specifically, it is necessary to
provide mechanisms that enable applications to control RDMA and DDP
access to their exported memory by both identity (RDMA and DDP) and
type of access (read vs. write - RDMA only); this inherently
involves authentication of the principals granted access in order
to distinguish authorized from unauthorized access. Such
authentication MAY be implemented outside the DDP and/or RDMA
protocols (e.g., in the application or a separate security protocol
such as TLS [RFC 2246] or IPsec [RFC 2401]) provided that means are
specified to securely couple the authorization of DDP and RDMA
operations to the corresponding authentications.
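The two dimensions of control named above, identity and type of
access, can be sketched as a per-principal access list on an exported
region. This is a minimal Python sketch: the ExportedRegion class, the
Access enumeration, and the principal names are hypothetical, and each
principal is assumed to come from an authentication performed elsewhere
(e.g., by TLS or IPsec) and securely bound to the connection, which the
sketch does not model.

```python
from enum import Enum


class Access(Enum):
    READ = "read"
    WRITE = "write"


class ExportedRegion:
    """Hypothetical exported memory region guarded by identity and type of
    access. Principals are assumed to be authenticated elsewhere."""

    def __init__(self, size):
        self.memory = bytearray(size)
        self.acl = {}  # principal -> set of permitted Access values

    def grant(self, principal, *modes):
        self.acl.setdefault(principal, set()).update(modes)

    def _check(self, principal, mode, offset, length):
        if mode not in self.acl.get(principal, set()):
            raise PermissionError(f"{principal} may not {mode.value}")
        if offset < 0 or offset + length > len(self.memory):
            raise ValueError("access outside the exported region")

    def rdma_read(self, principal, offset, length):
        self._check(principal, Access.READ, offset, length)
        return bytes(self.memory[offset:offset + length])

    def rdma_write(self, principal, offset, payload):
        self._check(principal, Access.WRITE, offset, len(payload))
        self.memory[offset:offset + len(payload)] = payload


region = ExportedRegion(16)
region.grant("client-a", Access.READ, Access.WRITE)
region.grant("client-b", Access.READ)

region.rdma_write("client-a", 0, b"hello")
seen_by_b = region.rdma_read("client-b", 0, 5)

denied = []
for who in ("client-b", "unknown"):  # read-only and never granted
    try:
        region.rdma_write(who, 0, b"evil!")
    except PermissionError:
        denied.append(who)
```

For DDP, where the text above calls for control by identity only, the
type check would collapse to a single permission per principal.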
References
[RDDP-arch] Bailey, S. and T. Talpey, "The Architecture of Direct
Data Placement (DDP) And Remote Direct Memory Access (RDMA) On
Internet Protocols", Internet-Draft draft-ietf-rddp-arch-04.txt,
Work in Progress, January 2004.
[RDDP-ps] Romanow, A., J. Mogul, T. Talpey, and S. Bailey, "RDMA
over IP Problem Statement", Internet-Draft
draft-ietf-rddp-problem-statement-03.txt, Work in Progress,
January 2004.
[RFC 896] Nagle, J., "Congestion Control in IP/TCP Internetworks",
RFC 896, January 1984.
[RFC 2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", RFC 2119, BCP 14, March 1997.
[RFC 2246] Dierks, T. and C. Allen, "The TLS Protocol Version
1.0", RFC 2246, January 1999.
[RFC 2401] Kent, S. and R. Atkinson, "Security Architecture for the
Internet Protocol", RFC 2401, November 1998.
Acknowledgements
This draft is based in part on a presentation and discussion at an
end2end research group meeting at MIT in May 2002; the authors
thank the end2end RG for providing the opportunity and gratefully
acknowledge the comments and suggestions of participants.
Author's Addresses
David L. Black
EMC Corporation
176 South Street                 Phone: +1 (508) 293-7953
Hopkinton, MA, 01748, USA        Email: black_david@emc.com
Michael F. Speer
Sun Microsystems, Inc.
4150 Network Circle UMPK17-103   Phone: +1 (650) 786-6445
Santa Clara, CA 95054            Email: michael.speer@sun.com
John Wroclawski
MIT Lab for Computer Science
200 Technology Square            Phone: +1 (617) 253-7885