Internet Draft
Network Working Group                                Anna Charny, ed.
                                                           Cisco Systems Editor
Internet Draft                                                Fred Baker
Expiration Date: December 2001                               Bruce Davie
                                                     Cisco Systems Systems, Inc.
Jon Bennett Bennet
Riverdelta Networks

Kent Benson
                                                                 Tellabs                                          Jean-Yves Le Boudec
Tellabs                                                             EPFL

Angela Chiu
                                                               AT&T Labs                                             William Courtney
AT&T Labs                                                            TRW

                                                             Bruce Davie
                                                           Cisco Systems

Shahram Davari
                                                              PMC-Sierra                                             Victor Firoiu
PMC-Sierra                                               Nortel Networks

Charles Kalmanek                                       K.K. Ramakrishnam
AT&T Research

                                                       K.K. Ramakrishnan                                         TeraOptic Networks

Dimitrios Stiliadis
Lucent Technologies

   Expires August, 2001
   draft-ietf-diffserv-ef-supplemental-00.txt February 2001
     Supplemental Information for the New Definition of the EF PHB

               draft-ietf-diffserv-ef-supplemental-01.txt

Status of this Memo

   This document is an Internet Draft Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026. Internet Drafts

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its Areas, areas, and its Working Groups. working groups.  Note that
   other groups may also distribute working documents as Internet Internet-
   Drafts.

   Internet Drafts

   Internet-Drafts are draft documents valid for a maximum of six
   months. Internet Drafts months
   and may be updated, replaced, or obsoleted by
                       Information for the EF PHB   Expires: July 1 2001 other documents at any
   time.  It is not appropriate inappropriate to use Internet
   Drafts Internet-Drafts as reference
   material or to cite them other than as a
   "working draft" or "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   To learn the current status of any Internet-Draft, please check the
   "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
   Directories on ftp.ietf.org (US East Coast), nic.nordu.net
   (Europe), ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific
   Rim).

   This document is a product of the Diffserv working group of the
   Internet Engineering Task Force. This document was written during the
   process of clarification of RFC2598 [10] that led to the publication
   of [6]. It is published as part of the historical record of the IETF's
   Differentiated Services working group.  Please address comments to the
   group's mailing list at diffserv@ietf.org, with a copy to the
   authors.

Copyright Notice

   Copyright (C) The Internet Society (1999). (2001).  All Rights Reserved.

Abstract

   This document is intended was written during the process of clarification of
   RFC2598 [10] that led to supplement the publication of revised specification of
   EF [6]. Its primary motivation is providing additional explanation to
   the revised EF definition and its properties. The document also
   provides additional implementation examples and gives some guidance
   for computation of the numerical parameters of the new definition for
   several well known schedulers and router architectures.

Contents

1.

    1      Introduction .............................................3
2.  ...........................................   3
    2      Definition of EF PHB .....................................3  ...................................   3
    2.1    The formal definition ....................................4  ..................................   3
    2.2    Relation to Packet Scale Rate Guarantee  ................   6
    2.3    The case need for dual characterization of an ideal output-buffered device with an EF FIFO at the output ....................................6
2.3 The General case .........................................7
2.3.1 The colorblind definition ..............................7
2.3.2 Packet reordering with the colorblind definition .......8
2.4 The packet-identity-aware definition .....................8 PHB  ...........   8
    3      Per Packet delay ...........................................9  .......................................  10
    3.1    Single hop delay bound ...................................9  .................................  10
    3.2    Multi-hop worst case delay ...............................9
                       Information for the EF PHB   Expires: July 1 2001  .............................  10
    4      Packet loss ...............................................10  ............................................  11
    5      Implementation considerations .............................11  ..........................  12
    5.1    The output buffered model with aggregate queuing EF FIFO at the output............................................12 output.  ..  13
    5.1.1  Strict Non-preemptive Priority Queue ..................12  ...................  13
    5.1.2  WF2Q ..................................................12  ...................................................  13
    5.1.3  Deficit Round Robin (DRR)..............................12 (DRR)  ..............................  13
    5.1.4  Start-Time Fair Queuing (SFQ) and Self-Clocked Fair Queuing ..........................................12  ..  14
    5.2 A    Router with Variable Internal Delay and Aggregate
      Scheduling EF FIFO at the output...............................12 output  ...  14
    6      Security Considerations ...................................13  ................................  14
    7 References.................................................13
8 Appendix. Difficulties with the RFC 2598 Definition .......14
8.1 Perfectly-Clocked Forwarding.............................15
8.2 Router Internal Delay ...................................15
8.3 Maximum Configurable Rate and Provisioning Efficiency....16
8.4 The Non-trivial Nature of the Difficulties ..............17
9 Authors'addresses .........................................18
10 Full Copyright ...........................................20

Specification of Requirements

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [3].

   1      References  .............................................  15

1. Introduction

   The Expedited Forwarding (EF) Per-Hop Behavior (PHB) was designed to
   be used to build a low-loss, low-latency, low-jitter, assured
   bandwidth service. The potential benefits of this service, and
   therefore the EF PHB, are enormous. Because of the great value of
   this PHB, it is critical that the forwarding behavior required of and
   delivered by an EF-compliant node be specific, quantifiable, and
   unambiguous.

   Unfortunately, the definition of EF PHB in the original RFC2598 [10]
   was not sufficiently precise (see Appendix and [4]). A more precise
   definition is given in [6]. This document is intended to aid in the
   understanding of the properties of the new definition and provide
   supplemental information not included in the text of [6] for sake of
   brevity.

   The document is outlined as follows. In section 2, we briefly restate
   the definition for EF PHB of [6]. We then provide some additional
   discussion of this definition and describe some of its properties.
   We discuss the issues associated with per-packet delay and loss in
   sections 3 and 4. In section 5 we discuss the impact of known
   scheduling architectures on the critical parameters of the new
   definition. We also discuss the impact of deviation of real devices
   from the ideal output-buffered model on the magnitude of the critical
   parameters in the definition.

2

2. Definition of EF PHB
                       Information for the EF PHB   Expires: July 1 2001

2.1

2.1. The formal definition

   An intuitive explanation of the new EF definition is described in
   [6]. Here we restate the formal definition from [6] verbatim.

   A node that supports EF on an interface I at some configured rate R
   MUST satisfy the following equations:

         d_j <= f_j + E_a for all j                                  (eq_1)

   where f_j is defined iteratively by

         f_0 = 0, d_0 = 0

         f_j = max(a_j, min(d_j-1, f_j-1)) + l_j/R,  for all j > 0  (eq_2)

   In this definition:

      - d_j is the time that the last bit of the j-th EF packet to
      depart actually leaves the node from the interface I.

      - f_j is the target departure time for the j-th EF packet to
      depart from I, the "ideal" time at or before which the last bit of
      that packet should leave the node.

      - a_j is the time that the last bit of the j-th EF packet destined
      to the output I to arrive actually arrives at the node.

      - l_j is the size (bits) of the j-th EF packet to depart from I.
      l_j is measured on the IP datagram (IP header plus payload) and
      does not include any lower layer (e.g. MAC layer) overhead.

      - R is the EF configured rate at output I (in bits/second).

      - E_a is the error term for the treatment of the EF aggregate.
      Note that E_a represents the worst case deviation between actual
      departure time of an EF packet and ideal departure time of the
      same packet, i.e. E_a provides an upper bound on (d_j - f_j) for
      all j.

      - d_0 and f_0 do not refer to a real packet departure but are used
      purely for the purposes of the recursion. The time origin should
      be chosen such that no EF packets are in the system at time 0.

      - for the definitions of a_j and d_j, the "last bit" of the packet
      includes the layer 2 trailer if present, because a packet cannot
      generally be considered available for forwarding until such a
      trailer has been received.

   An EF-compliant node MUST be able to be characterized by the range of
   possible R values that it can support on each of its interfaces while
   conforming to these equations, and the value of E_a that can be met
   on each interface. R may be line rate or less. E_a MAY be specified
   as a worst-case value for all possible R values or MAY be expressed
   as a function of R.

                       Information for the EF PHB   Expires: July 1 2001

   Note also that, since a node may have multiple inputs and complex
   internal scheduling, the jth EF packet to arrive at the node destined
   for a certain interface may not be the jth EF packet to depart. depart from
   that interface. It is in this sense that eq_1 and eq_2 are
   colorblind with regard to unaware of
   packet identity.

   In addition, a node that supports EF on an interface I at some
   configured rate R MUST satisfy the following equations:

      D_j <= F_j + E_p for all j                            (eq_3)

   where F_j is defined iteratively by

      F_0 = 0, D_0 = 0

      F_j = max(A_j, min(D_j-1, F_j-1)) + L_j/R,  for all j > 0  (eq_4)

   In this definition:

      - D_j is actual the departure time of the individual EF packet
      that arrived at the node destined for interface I at time A_j,
      i.e., given a packet which was the j-th EF packet destined for I
      to arrive at the node via any input, D_j is the time at which the
      last bit of that individual packet actually leaves the node from
      the interface I.

      - F_j is the target departure time for the individual EF packet
      which
      that arrived at the node destined for interface I at time A_j.

      - A_j is the time that the last bit of the j-th EF packet destined
      to the output I to arrive actually arrives at the node.

      - L_j is the size (bits) of the j-th EF packet to arrive at the
      node that is destined to output I. L_j is measured on the IP
      datagram (IP header plus payload) and does not include any lower
      layer (e.g. MAC layer) overhead.

      - R is the EF configured rate at output I (in bits/second).

      - E_p is the error term for the treatment of individual EF
      packets. Note that E_p represents the worst case deviation between
      actual departure time of an EF packet and ideal departure time of
      the same packet, i.e. E_p provides an upper bound on (D_j - F_j)
      for all j.

      - D_0 and F_0 do not refer to a real packet departure but are used
      purely for the purposes of the recursion. The time origin should
      be chosen such that no EF packets are in the system at time 0.

      - for the definitions of A_j and D_j, the "last bit" of the packet
      includes the layer 2 trailer if present, because a packet cannot
      generally be considered available for forwarding until such a
      trailer has been received.

   It is the fact that D_j and F_j refer to departure times for the jth
   packet to arrive that makes eq_3 and eq_4 aware of packet identity.
   This is the critical distinction between the last two equations and
   the first two.

                       Information for the EF PHB   Expires: July 1 2001

   An EF-compliant node SHOULD be able to be characterized by the range
   of possible R values that it can support on each of its interfaces
   while conforming to these equations, and the value of E_p that can be
   met on each interface. E_p MAY be specified as a worst-case value for
   all possible R values or MAY be expressed as a function of R. An E_p
   value of "undefined" MAY be specified.

2.2

   Finally, there is an additional requirement in [6] that an EF
   compliant node cannot reorder packets within a microflow.

   The definitions described in this section are referred to as
   aggregate and packet-identity-aware packet scale rate guarantee
   [4],[2]. An alternative mathematical characterisation of packet scale
   rate guarantee is given in Appendix B.

2.2. Relation to Packet Scale Rate Guarantee

   Consider the case of an ideal output-buffered device with an EF FIFO
   at the
   output output. For an ideal output-buffered device with such a FIFO for EF packets at the
   output and no internal delay, device, the i-th packet to arrive to the
   device is also the i-th packet to depart from the device. Therefore,
   in this ideal model the colorblind aggregate behavior and packet-identity-aware
   characteristics are identical, and E_a = E_p.  In this section we
   therefore omit the subscript and refer to the latency term simply as
   E.

   It could be shown that for such an ideal device the definition of
   section 2 is stronger than the well-known rate-latency curve [2] in
   the sense that if a scheduler satisfies the EF definition it also
   satisfies the rate-latency curve. As a result, all the properties
   known for the rate-latency curve also apply to the modified EF
   definition.  However, we argue below that the definition of section
   2.1 is more suitable to reflect the intent of EF PHB than the rate-latency rate-
   latency curve.

   It is shown in [2] that the rate-latency curve is equivalent to the
   following definition:

   Definition of Rate Latency Curve (RLC):

   D(j) <= F'(j) + E (eq_5)

   where
   F'(0)=0, F'(j)=max(a(j), F'(j-1))+ L(j)/R for all j>0 (eq_6)

   It can be easily verified that the EF definition of section 2.1 is
   stronger than RLC by noticing that for all j, F'(j) >= F(j).

   It is easy to see that F'(j) in the definition RLC corresponds to the
   time the j-th departure should have occurred should the EF aggregate
   be constantly served exactly at its configured rate R.  Following the
   common convention, we refer to F'(j) as the "fluid finish time" of
   the j-th packet to depart.

   The intuitive meaning of the rate-latency curve of RLC is that any
   packet is served at most time E later than this packet would finish
   service in the fluid model.

   For RLC (and hence for the stronger EF definition) it holds that in
   any interval (0,t) the EF aggregate gets close to the desired service
   rate R (as long as there is enough traffic to
                       Information for the EF PHB   Expires: July 1 2001 sustain this rate). The
   discrepancy between the ideal and the actual service in this interval
   depends on the latency term E, which in turn depends on the
   scheduling implementation. The smaller E, the smaller the difference
   between the configured rate and the actual rate achieved by the
   scheduler.

   While RLC guarantees the desired rate to the EF aggregate in all
   intervals (0,t) to within a specified error, it may nevertheless
   result in large gaps in service.  For example, suppose that (a large
   number) N of identical EF packets of length L arrived from different
   interfaces to the EF queue in the absence of any non-EF traffic.
   Then any work-conserving scheduler will serve all N packets at link
   speed. When the last packet is sent at time NL/C, where C is the
   capacity of output link, F(N) will be equal to NL/R. That is, the
   scheduler is running ahead of ideal, since NL/C < NL/R for R < C.
   Suppose now that at time NL/C a large number of non-EF packets
   arrive, followed by a single EF packet.  Then the scheduler can
   legitimately delay starting to send the EF packet until time
   F(N+1)=(N+1)L/R + E - L/C.  This means that the EF aggregate will
   have no service at all in the interval (NL/C, (N+1)L/R + E - L/C).
   This interval can be  quite large if R is substantially smaller than
   C.  In essence, the EF aggregate can be "punished" by a gap in
   service for receiving faster service than its configured rate at the
   beginning.

   The new EF definition alleviates this problem by introducing the term
   min(D(j-1), F(j-1)) in the recursion.  Essentially, this means that
   the fluid finishing time is "reset" if that packet is sent before its
   "ideal" departure time. As a consequence of that, for the case where
   the EF aggregate is served in the FIFO order, suppose a packet
   arrives at time t to a server satisfying the EF definition. The
   packet will be transmitted no later than time t + Q(t)/R + E, where
   Q(t) is the EF queue size at time t (including the packet under discussion).
   This statement is proved in  [4].

2.3
   discussion)[4].

2.3. The General case need for dual characterization of EF PHB

   In a more general case, where either the output scheduler does not
   serve the EF packets in a FIFO order, or the variable internal delay
   in the device reorders packets while delivering them to the output
   (or both), the i-th packet destined to a given output interface to
   arrive to the device may no longer be the i-th packet to depart from
   that interface.  In that case the packet-identity-aware and the colorblind
   aggregate definitions are no longer identical.

 2.3.1 The colorblind definition

   The colorblind aggregate behavior definition can be viewed as a truly aggregate
   characteristic of the service provided to EF packets. For an analogy
   consider a dark reservoir to which all arriving packets are placed.
   A scheduler is allowed to pick a packet from the reservoir in a
   random order, without any knowledge of the order of packet arrivals.
   The
   colorblind aggregate part of the definition measures the accuracy of the
   output
                       Information for the EF PHB   Expires: July 1 2001 rate provided to the EF aggregate as a whole. The smaller E_a,
   the more accurate is the assurance that the reservoir is drained at
   least at the configured rate.

2.3.2 Packet reordering with the colorblind definition

   Note that in this reservoir analogy packets of EF aggregate may be
   arbitrarily reordered. However, the definition of EF PHB given in [6]
   explicitly requires that no packet reordering occur within a
   microflow. This requirement restricts the scheduling implementations,
   or, in the reservoir analogy, the order of pulling packets out of the
   reservoir to make sure that packets within a microflow are not
   reordered, but it still allows reordering at the aggregate level.

   Note that reordering within the aggregate, as long as there is no
   flow-level reordering, does not necessarily reflect a "bad" service.
   Consider for example a scheduler that arbitrates among 10 different
   EF "flows" with diverse rates.  A scheduler that is aware of the rate
   requirements may choose to send a packet of the faster flow before a
   packet of the slower flow to maintain lower jitter at the flow level.
   In particular, an ideal "flow"-aware WFQ scheduler will cause
   reordering within the aggregate, while maintaining packet ordering
   and small jitter at the flow level.

   It is intuitively clear that for such a scheduler, as well as for a
   simpler FIFO scheduler, the "accuracy" of the service rate is crucial
   for minimizing "flow"-level jitter. The packet-identity-aware
   definition quantifies this accuracy of the service rate.  A small
   value of E_a is a necessary (although not sufficient) condition for
   maintaining small per-flow jitter.

   However, the small value of E_a does not give any assurances about
   the absolute value of per-packet delay. In fact, if the input rate
   exceeds the configured rate, the colorblind aggregate behavior definition may
   result in arbitrarily large delay of a subset of packets.  This is
   the primary motivation for the packet-identity-aware definition.

2.4 The packet-identity-aware definition

   The primary goal of the packet-aware characterization of the EF
   implementation is that, unlike the colorblind aggregate behavior
   characterization, it provides a way to find a per-packet delay bound
   as a function of input traffic parameters.

   While the colorblind aggregate behavior definition characterizes the accuracy of
   the service rate of the entire EF aggregate, the packet-identity-aware packet-identity-
   aware part of the definition characterizes the deviation of the
   device from an ideal server that serves the EF aggregate in FIFO
   order at least at the configured rate.

   The value of E_p in the packet-identity-aware definition is therefore
   affected by two factors: the accuracy of the aggregate rate service
   and the degree of packet reordering within the EF aggregate (under
   the
                       Information for the EF PHB   Expires: July 1 2001 constraint that packets within the same microflow are not
   reordered). Therefore, a sub-aggregate aware device that provides an
   ideal service rate to the aggregate, and also provides an ideal rate
   service for each of the sub-aggregates, may nevertheless have a very
   large value of E_p (in this case E_p must be at least equal to the
   ratio of the maximum packet size divided by the smallest rate of any
   sub aggregate).  As a result, a large value of E_p does not
   necessarily mean that the service provided to EF aggregate is bad -
   rather it may be an indication that the service is good, but non-FIFO. non-
   FIFO.  On the other hand, a large value of E_p may also mean that the
   aggregate service is very inaccurate (bursty), and hence in this case
   the large value of E_p reflects a poor quality of implementation.

   As a result, a large number of E_p does not necessarily provide any
   guidance on the quality of the EF implementation.  However, a small
   value of E_p does indicate a high quality FIFO implementation.

   Since E_p and E_a relate to different aspects of the EF
   implementation, they should be considered together to determine the
   quality of the implementation.

3. Per Packet delay

   The primary motivation for the packet-identity-aware definition is
   that it allows to quantify allows quantification of the per-packet delay bound.
   This section discusses the issues with computing per-packet delay

3.1

3.1. Single hop delay bound

   If the total traffic arriving to an output port I from all inputs is
   constrained by a leaky bucket with parameters (R, B), where R is the
   configured rate at I, and B is the bucket depth (burst), then the
   delay of any packet departing from I is bounded by D_p, given by

   D_p = B/R + E_p (eq_7)

   (see appendix B).

   Because the delay bound depends on the configured rate R and the
   input burstiness B, it is desirable for both of these parameters to
   be visible to a user of the device.  A PDB desiring a particular
   delay bound may need to limit the range of configured rates and
   allowed burstiness that it can support in order to deliver such
   bound. Equation (eq_7) provides a means for determining an acceptable
   operating region for the device with a given E_p. It may also be
   useful to limit the total offered load to a given output to some rate
   R_1 < R (e.g. to obtain end-to-end delay bounds [5]). It is important
   to realize that, while R_1 may also be a configurable parameter of
   the device, the delay bound in (eq_7) does not depend on it. It may
   be possible to get better bounds explicitly using the bound on the
   input rate, but the bound (eq_7) does not take advantage of this
   information.

3.2

3.2. Multi-hop worst case delay

   Although the PHB defines inherently local behavior, in this section
   we briefly discuss the issue of per-packet delay as the packet
   traverses
                       Information for the EF PHB   Expires: July 1 2001 several hops implementing EF PHB.  Given a delay bound
   (eq_7) at a single hop, it is tempting to conclude that per-packet
   bound across h hops is simply h times the bound (eq_7).  However,
   this is not necessarily the case, unless B represents the worst case
   input burstiness across all nodes in the network.

   However,

   Unfortunately, obtaining such a worst case value of B is not trivial.
   If EF PHB is implemented using aggregate class-based scheduling where
   all EF packets share a single FIFO, the so-called effect of jitter accumulation
   may result in an increase in burstiness from hop to hop. In
   particular, it can be shown that unless severe restrictions on EF
   utilization are imposed, even if all EF flows are ideally shaped at
   the ingress, then for any value of delay D it is possible to
   construct a network where EF utilization on any link is bounded not
   to exceed a given factor, no flow traverses more than a specified
   number of hops, but there exists a packet that experiences a delay
   more than D [5].  This result implies that the ability to limit the
   worst case burstiness and the resulting end-to-end delay across
   several hops may require not only limiting EF utilization on all
   links, but also constraining the global network topology. Such
   topology constraints would need to be specified in the definition of
   any PDB built on top of EF PHB, if such PDB requires a strict worst
   case delay bound.

4. Packet loss

   Any device with finite buffering may need to drop packets if the
   input burstiness becomes sufficiently high. To meet the low loss
   objective of EF, a node may be characterized by the operating region
   in which loss of EF due to  congestion will not occur. This may be
   specified as a token bucket of  rate r <= R and burst size B that can
   be offered from all inputs to a  given output interface without loss.

   However, as discussed in the previous section, the phenomenon of
   jitter accumulation makes it generally difficult to guarantee that
   the input burstiness never exceeds the specified operating region. region for
   nodes internal to the DiffServ domain. A no-loss guarantee across
   multiple hops may require specification of constraints on network
   topology which are outside the scope of inherently local definition
   of a PHB. Thus, it must be possible to establish whether a device
   conforms to the EF definition even when some packets are lost.

   This can be done by performing an "off-line" test of conformance to
   equations (eq_1)- (eq_4). After observing a sequence of packets
   entering and leaving the node, the packets which did not leave are
   assumed lost and are notionally removed from the input stream. The
   remaining packets now constitute the arrival stream and the packets
   which left the node constitute the departure stream. Conformance to
   the equations can thus be verified by  considering only those packets
   that successfully passed through the node..

   Note that in the event that loss does occur, the specification of which packets are lost in the case when
   loss does occur is beyond the scope of the definition of EF PHB.
   However, those packets that were not lost must conform to the
   equations definition of EF PHB in section 2.1.

                       Information for the EF PHB   Expires: July 1 2001

5. Implementation considerations

   A packet passing through a router will experience delay for a number
   of reasons.  Two familiar components of this delay are the time the
   packet sits spends in a buffer at an outgoing link waiting for the
   scheduler to select it and the time it takes to actually transmit the
   packet on the outgoing line.

   There may be other components of a packet's delay through a router,
   however.  A router might have to do some amount of header processing
   before the packet can be given to the correct output scheduler, for
   example.  In another case a router may have a FIFO buffer (called a
   transmission queue in [7]) where the packet sits after being selected
   by the output scheduler but before it is transmitted.  In cases such
   as these, the extra delay a packet may experience can be accounted
   for by absorbing it into the latency terms E_a and E_p.

   Implementing EF on a router with a multi-stage switch fabric requires
   special attention. A packet may experience additional delays due to
   the fact that it must compete with other traffic for forwarding
   resources at multiple contention points in the switch core. The delay
   an EF packet may experience before it even reaches the output-link
   scheduler should be included in the latency term. Input-buffered and
   input/output-buffered routers based on crossbar design may also
   require modification of their latency terms. The factors such as the
   speedup factor and the choice of crossbar arbitration algorithms may
   affect the latency terms substantially.

   Delay in the switch core comes from two sources, both of which must
   be considered.  The first part of this delay is the fixed delay a
   packet experiences regardless of the other traffic.  This component
   of the delay includes the time it takes for things such as packet
   segmentation and reassembly in cell based cores, enqueueing and
   dequeueing at each stage, and transmission between stages.  The
   second part of the switch core delay is variable and depends on the
   type and amount of other traffic traversing the core.  This delay
   comes about if the stages in the core mix traffic flowing between
   different input/output port pairs.  Thus, EF packets must compete
   against other traffic for forwarding resources in the core.  Some of
   this  competing traffic may even be EF traffic from other aggregates.
   This introduces extra delay, that can also be absorbed by the latency
   term in the definition.

   To capture these considerations, in this section we will consider two
   simplified implementation examples. The first is an ideal output
   buffered node where packets entering the device from an input
   interface are immediately delivered to the output scheduler. In this
   model the properties of the output scheduler fully define the values
   of the parameters E_a and E_p. We will consider the case where the
   output scheduler implements aggregate class-based queueing, so that
   all EF packets share a single queue. We will discuss the values of
   E_a and E_p for a variety of class-based schedulers widely considered
                       Information for the EF PHB   Expires: July 1 2001

   acceptable for EF implementations. variety of class-based schedulers widely considered

   The second example will consider a router modeled as a black box with
   a known bound on the variable delay a packet can experience from the
   time it arrives to an input to the time it is delivered to its
   destination output. The output scheduler in isolation is assumed to
   be an aggregate scheduler where all EF packets share a single FIFO
   queue, with a known value of E_a(S)=E_p(S)=E(S). This model provides
   a reasonable abstraction to a large class of router implementations.

5.1. The output buffered model with aggregate queuing EF FIFO at the output.

   As has been mentioned earlier, in this model E_a = E_p, so we shall
   omit the subscript and refer to both terms as latency E.  The
   remainder of this subsection discusses E for a number of scheduling
   implementations.

5.1.1

5.1.1. Strict Non-preemptive Priority Queue

   A Strict Priority scheduler in which all EF packets share a single
   FIFO queue which is served at strict non-preemptive priority over
   other queues satisfies the EF definition with the latency term E =
   MTU/C where MTU is the maximum packet size and C is the speed of the
   output link.

5.1.2

5.1.2. WF2Q

   Another scheduler that satisfies the EF definition with a small
   latency term is WF2Q described in [1]. A class-based WF2Q scheduler,
   in which  all EF traffic shares a single queue with the weight
   corresponding to the configured rate of the EF aggregate satisfies
   the EF definition with the latency term E = MTU/C+MTU/R.

5.1.3.Deficit

5.1.3. Deficit Round Robin (DRR)

   For DRR [12], both E can be shown to grow linearly with
   N*(r_max/r_min)*MTU, where r_min and r_max denote the smallest and
   the largest rate among the rate assignments of all queues in the
   scheduler, and N is the number of queues in the scheduler

5.1.4. Start-Time Fair Queuing (SFQ) and Self-Clocked Fair Queuing
   (SCFQ)

   For SFQ Start-Time Fair Queuing and  (SFQ) [9] and SCFQ Self-Clocked Fair
   Queueing (SCFQ)  [8] E can be shown to grow linearly with the number
   of queues in the scheduler.

5.2. A Router with Variable Internal Delay and Aggregate Scheduling EF FIFO at the
output. output

   In this section we consider a router which is modeled as follows. A
   packet entering the router may experience a variable delay D_v with a
   known upper bound D. That is, 0<=D_v<D. 0<=D_v<=D.  At the output all EF
   packets share a single class queue.  Class queues are scheduled by a
   scheduler with a known value E(S) E_p(S)=E(S) (where E(S) corresponds to
   the model where this scheduler is implemented in an ideal output
   buffered device).

                       Information for the EF PHB   Expires: July 1 2001

   In this case, it can be easily shown that E_a = E(S) + D.

   The computation of E_p is more complicated. complicated in this case.  For such
   device, E_p =
   E(S)+  min((D+B/R), D*C_inp/R), where (B, R) are the input token
   bucket parameters of the EF aggregate at the given input, and C_inp is
   the total capacity of all input links from which EF traffic destined
   to the given  output can arrive.

   Intuitively, this can be understood as follows. The term min((D+B/R),
   D*C_inp/R) is the time to it takes to drain (at rate R) the maximum
   number of EF packets that can arrive in the interval of length D to
   the output from all inputs. Consider an EF packet arriving at the
   input at time t. The ideal departure time d of this packet in the
   packet-identity-aware definition would be determined by the number of
   packets which arrived to the router before this packet.  However, due
   to variable delay, some packets shown that arrive later than this packet to
   the input of the device may end up arriving before this packet to the
   output. The number of bits in all such packets is bounded by min(DR+B,
   D*C_inp).  If the EF queue is served at the configured rate R, then
   the chosen packet may be delayed additionally by min((D+B/R),
   D*C_inp/R). E_p = E(S)+2D+2B/R (see Appendix C).

   Recall from the discussion of section 3 that bounding input
   burstiness B may not be easy in a general topology. In the absence of
   the knowledge of a bound on B one can bound E_p as E_p = E(S) +
   D*C_inp/R.
   D*C_inp/R (see appendix B).

   Note also that the bounds in this section are derived using only the
   bound on the variable portion of the interval delay and the error
   bound of the output scheduler.  If more details about the
   architecture of a device are available, it may be possible to compute
   better bounds.

6. Security Considerations

   This informational draft provides additional information to aid in
   understanding of the EF PHB described in [6].  It adds no new
   functions to it. As a result, it adds no security issues to those
   described in that specification.

7. References

   [1] J.C.R. Bennett and H. Zhang, ``WF2Q: Worst-case Fair Weighted
   Fair Queuing'', INFOCOM'96, Mar, 1996

   [2] J.-Y. Le Boudec, "Application of Network Calculus To
   Guaranteed Service Networks", IEEE Transactions on P. Thiran, "Network Calculus", Springer Verlag
   Lecture Notes in Computer Science volume 2050, June 2001 (available
   online at http://icawww.epfl.ch)

   [3] S. Bradner, "Key Words for Use in RFCs to Indicate Requirement
   Levels", BCP 14, RFC 2119, March 1997.

   [4] J. Bennett, K. Benson, A. Charny, W. Courney, J.Y. Le Boudec,
                       Information for the EF PHB   Expires: July 1 2001
   "Delay Jitter Bounds and Packet Scale Rate Guarantee for Expedited
   Forwarding", Proc. Infocom 2001, April 2001.

   [5] A. Charny, J.-Y. Le Boudec "Delay Bounds in a Network with
   Aggregate Scheduling". Proc. of QoFIS'2000, September 25-26, 2000,
   Berlin, Germany.

   [6] B. Davie, ed. et al, "An Expedited Forwarding PHB",
   draft-ietf-diffserv-rfc2598bis-00.txt, draft-ietf-
   diffserv-rfc2598bis-01.txt, work in progress, February April 2001.

   [7] T. Ferrari and P. F. Chimento, "A Measurement- Based Analysis of
   Expedited Forwarding PHB Mechanisms," Eighth International Workshop
   on Quality of Service, Pittsburgh, PA, June 2000,

   [8] S.J. Golestani. "A Self-clocked Fair Queuing Scheme for Broad-band Broad-
   band Applications". In Proceedings of IEEE INFOCOM'94, pages 636-646,
   Toronto, CA, April 1994.

   [9] P. Goyal, H.M. Vin, and H. Chen. "Start-time Fair Queuing: A
   Scheduling Algorithm for Integrated Services". In Proceedings of the
   ACM-SIGCOMM 96, pages 157-168, Palo Alto, CA, August 1996.

   [10] V. Jacobson, K. Nichols, K. Poduri, "An Expedited Forwarding
   [11] V. Jacobson, K. Nichols, K. Poduri, "The 'Virtual Wire' Behavior
   Aggregate," (draft-ietf-diffserv-ba-vw-00.txt), March 2000.

   [12] M. Shreedhar and G. Varghese. "Efficient Fair Queueing Using
   Deficit Round Robin". In Proceedings of SIGCOMM'95, pages 231-243,
   Boston, MA, September 1995.

   8. Appendix.

Appendix A. Difficulties with the RFC 2598 EF PHB Definition

   The definition of the EF PHB as given in [10] states:

   "The EF PHB is defined as a forwarding treatment for a particular
   diffserv aggregate where the departure rate of the aggregate's
   packets from any diffserv node must equal or exceed a configurable
   rate.  The EF traffic SHOULD receive this rate independent of the
   intensity of any other traffic attempting to transit the node.  It
   [the EF PHB departure rate] SHOULD average at least the configured
   rate when measured over any time interval equal to or longer than the
   time it takes to send an output link MTU sized packet at the configured rate."
                       Information for the EF PHB   Expires: July 1 2001
   configured rate."

   A literal interpretation of the definition would consider the
   behaviors given in the next two subsections as non-compliant. The
   definition also unnecessarily constrains the maximum configurable
   rate of an EF aggregate.

   8.1

A.1 Perfectly-Clocked Forwarding

   Consider the following stream forwarded from a router with EF-
   configured rate R=C/2, where C is the output line rate. In the
   illustration, E is an MTU-sized EF packet while x is a non-EF packet
   or unused capacity, also of size MTU.

        ...

      E x E x E x E x E x E x...
       |-----|

   The interval between the vertical bars is 3*MTU/C, which is greater
   than MTU/(C/2), and so is subject to the EF PHB definition. During
   this interval, 3*MTU/2 bits of the EF aggregate should be forwarded,
   but only MTU bits are forwarded.  Therefore, while this forwarding
   pattern should be considered compliant under any reasonable
   interpretation of the EF PHB, it actually does not formally comply
   with the definition of RFC 2598.

   Note that this forwarding pattern can occur in any work-conserving
   scheduler in an ideal output-buffered architecture where EF packets
   arrive in a perfectly clocked manner according to the above pattern
   and are forwarded according to exactly the same pattern in the
   absence of any non-EF traffic.

   Trivial as this example may be, it reveals the lack of mathematical
   precision in the formal definition. The fact that no work-conserving
   scheduler can formally comply with the definition is unfortunate, and
   appears to warrant some changes to the definition that would correct
   this problem.

   The underlying reason for the problem described here is quite simple
   - one can only expect that the EF aggregate is served at configured
   rate in some interval where there is enough backlog of EF packets to
   sustain that rate. In the example above the packets come in exactly
   at the rate at which they are served, and so there is no persistent
   backlog.  Certainly, if the input rate is even smaller than the
   configured rate of the EF aggregate, there will be no backlog as
   well, and a similar formal difficulty will occur.

   A seemingly simple solution to this difficulty might be to require
   that the EF aggregate is served at its configured rate only when the
   queue is backlogged.  However, as we show in the remainder of this
   section, this solution does not suffice.

   8.2

A.2 Router Internal Delay

   We now argue that the example considered in the previous section is
   not as trivial as it may seem at first glance.

                       Information for the EF PHB   Expires: July 1 2001

   Consider a router with EF configured rate R = C/2 as in the previous
   example, but with an internal delay of 3T (where T = MTU/C) between
   the time that a packet arrives at the router and the time that it is
   first eligible for forwarding at the output link. Such things as
   header processing, route look-up, and delay in switching through a
   multi-layer fabric could cause this delay. Now suppose that EF
   traffic arrives regularly at a rate of (2/3)R = C/3. The router will
   perform as shown below.

      EF Packet Number 1 2 3 4 5 6 ...

      Arrival (at router) 0 3T 6T 9T 12T 15T ...

      Arrival (at scheduler) 3T 6T 9T 12T 15T 18T ...

      Departure 4T 7T 10T 13T 16T 19T ...

   Again, the output does not satisfy the RFC 2598 definition of EF PHB.
   As in the previous example, the underlying reason for this problem is
   that the scheduler cannot forward EF traffic faster than it arrives.
   However, it can be easily seen that the existence of internal delay
   causes one packet to be inside the router at all times. An external
   observer will rightfully conclude that the number of EF packets that
   arrived to the router is always at least one greater than the number
   of EF packets that left the router, and therefore the EF aggregate is
   constantly backlogged. However, while the EF aggregate is
   continuously backlogged, the observed output rate is nevertheless
   strictly less that the configured rate.

   This example indicates that the simple addition of the condition that
   EF aggregate must receive its configured rate only when the EF
   aggregate is backlogged does not suffice in this case.

   Yet, the problem described here is of fundamental importance in
   practice.  Most routers have a certain amount of internal delay.  A
   vendor declaring EF compliance is not expected to simultaneously
   declare the details of the internals of the router.  Therefore, the
   existence of internal delay may cause a perfectly reasonable EF
   implementation to display seemingly non-conformant behavior, which is
   clearly undesirable.

   8.3

A.3 Maximum Configurable Rate and Provisioning Efficiency

   It is well understood that with any non-preemptive scheduler, the
   compliant configurable rate for an EF aggregate cannot exceed C/2
   [11]. This is because an MTU-sized EF packet may arrive to an empty
   queue at time t just as an MTU-sized non-EF packet begins service.
   The maximum number of EF bits that could be forwarded during the
   interval [t, t + 2*MTU/C] is MTU. But if configured rate R > C/2,
   then this interval would be of length greater than MTU/R, and more
   than MTU EF bits would have to be served during this interval for the
   router to be compliant. Thus, R must be no greater than C/2.

   It can be shown that for schedulers other than PQ, such as various
                       Information for the EF PHB   Expires: July 1 2001
   implementations of WFQ, the maximum compliant configured rate may be
   much smaller than 50%. For example, for SCFQ [8] the maximum
   configured rate cannot exceed C/N, where N is the number of queues in
   the scheduler. For WRR, mentioned as compliant in section 2.2 of RFC
   2598, this limitation is even more severe. This is because in these
   schedulers a packet arriving to an empty EF queue may be forced to
   wait until one packet from each other queue (in the case of SCFQ) or
   until several packets from each other queue (in the case of WRR) are
   served before it will finally be forwarded.

   While it is frequently assumed that the configured rate of EF traffic
   will be substantially smaller than the link bandwidth, the
   requirement that this rate should never exceed 50% of the link
   bandwidth appears unnecessarily limiting.  For example, in a fully
   connected mesh network, where any flow traverses a single link on its
   way from source to its destination there seems no compelling reason
   to limit the amount of EF traffic to 50% (or an even smaller
   percentage for some schedulers) of the link bandwidth.

   Another, perhaps even more striking example is the fact that even a
   TDM circuit with dedicated slots cannot be configured to forward EF
   packets at more than 50% of the link speed without violating RFC 2598
   (unless the entire link is configured for EF). If the configured rate
   of EF traffic is greater than 50% (but less than the link speed),
   there will always exist an interval longer than MTU/R in which less
   than the configured rate is achieved. For example, suppose the
   configured rate of the EF aggregate is 2C/3. Then the forwarding
   pattern of the TDM circuit might be

   E E x E E x E E x ...
      |---|

   where only one packet is served in the marked interval of length 2T =
   2MTU/C. But at least 4/3 MTU would have to be served during this
   interval by a router in compliance with the definition in RFC 2598.
   The fact that even a TDM line cannot be booked over 50% by EF traffic
   indicates that the restriction is artificial and unnecessary.

   8.4

A.4 The Non-trivial Nature of the Difficulties

   One possibility to correct the problems discussed in the previous
   sections might be to attempt to clarify the definition of the
   intervals to which the definition applied or by averaging over
   multiple intervals. However, an attempt to do so meets with
   considerable analytical and implementation difficulties. For example,
   attempting to align interval start times with some epochs of the
   forwarded stream appears to require a certain degree of global clock
   synchronization and is fraught with the risk of misinterpretation and
   mistake in practice.

   Another approach might be to allow averaging of the rates over some
   larger time scale. However, it is unclear exactly what finite time
   scale would suffice in all reasonable cases. Furthermore, this
                       Information for the EF PHB   Expires: July 1 2001
   approach would compromise the notion of very short-term time scale
   guarantees that are the essence of EF PHB.

   We also explored a combination of two simple fixes. The first is the
   addition of the condition that the only intervals subject to the
   definition are those that fall inside a period during which the EF
   aggregate is continuously backlogged in the router (i.e., when an EF
   packet is in the router). The second is the addition of an error
   (latency) term that could serve as a figure-of-merit in the
   advertising of EF services.

   With the addition of these two changes the candidate definition
   becomes as follows:

   In any interval of time (t1, t2) in which EF traffic is continuously
   backlogged, at least R(t2 - t1 - E) bits of EF traffic must be
   served, where R is the configured rate for the EF aggregate and E is
   an implementation-specific latency term.

   The "continuously backlogged" condition eliminates the insufficient-
   packets-to-forward difficulty while the addition of the latency term
   of size MTU/C resolves the perfectly-clocked forwarding example
   (section 1.2.1), and also removes the limitation on EF configured
   rate.

   However, neither fix (nor the two of them together) resolves the
   example of section 1.2.2. To see this, recall that in the example of
   section 1.2.2 the EF aggregate is continuously backlogged, but the
   service rate of the EF aggregate is consistently smaller than the
   configured rate, and therefore no finite latency term will suffice to
   bring the example into conformance.

9.

Appendix B. Alternative Characterization of Packet Scale Rate Guarantee

   The proofs of several bounds in this document can be found in
   http://ica1www.epfl.ch/PS_files/proofsEFJuli2001.pdf. These proofs
   use an algebraic characterization of the aggregate definition given
   by (eq_1), (eq-2), and packet identity aware definition given by (eq
   3), (eq 4).   Since this characterization is of interest on its own,
   we present it in this section.

   Theorem B1.  Characterization of the aggregate definition without
   f_n.

   Consider a system where packets are numbered 1, 2, ... in order of
   arrival. As in the aggregate definition, call a_n the n-th arrival
   time, d_n - the n-th departure time, and l_n the size of the n-th
   packet to depart. Define by convention d_0=0. The aggregate
   definition with rate R and latency E_a is equivalent to saying  that
   for all n and all 0<=j<= n-1:

   d_n <= E_a + d_j + (l_(j+1) + ... + l_n)/R      (eq_b1)

   or

   there exists  some j+1 <= k <= n  such that

   d_n  <= E_a + a_k + (l_k + ... + l_n)/R          (eq_b2)

   Theorem B2.  Characterization of packet-identity-aware definition
   without F_n.

   Consider a system where packets are numbered 1, 2, ... in order of
   arrival. As in the packet-identity-aware definition, call A_n, D_n
   the arrival and departure times for the n-th packet, and L_n the size
   of this packet. Define by convention D_0=0. The packet identity aware
   definition with rate R and latency E_p is equivalent to saying  that
   for all n and all 0<=j<= n-1:

   D_n <= E_p + D_j + (L_{j+1} + ... + L_n)/R      (eq_b3)

   or

   there exists  some j+1 <= k <= n  such that

   D_n  <= E_p + A_k + (L_k + ... + L_n)/R           (eq_b4)

   For the proofs of both Theorems, see
   http://ica1www.epfl.ch/PS_files/proofsEFJuli2001.pdf

Authors' addresses

   Anna Charny, ed. Charny
   Cisco Systems
   300 Apollo Drive
   Chelmsford, MA 01824
   acharny@cisco.edu

   E-mail: acharny@cisco.com

   Fred Baker
   Cisco Systems
   170 West Tasman Dr.
   San Jose, CA 95134

   E-mail: fred@cisco.com

   Jon Bennett
   RiverDelta Networks
   3 Highwood Drive East
   Tewksbury, MA 01876

   E-mail: jcrb@riverdelta.com
   Kent Benson
                       Information for the EF PHB   Expires: July 1 2001
   Tellabs Research Center
   3740 Edison Lake Parkway #101
   Mishawaka, IN  46545

   E-mail: Kent.Benson@tellabs.com

   Jean-Yves Le Boudec
   ICA-EPFL, INN
   Ecublens, CH-1015
   Lausanne-EPFL, Switzerland
   leboudec@epfl.c

   E-mail: leboudec@epfl.ch

   Angela Chiu
   AT&T Labs
   100 Schulz Dr. Rm 4-204
   Red Bank, NJ 07701

   E-mail: alchiu@att.com

   Bill Courtney
   TRW
   Bldg. 201/3702
   One Space Park
   Redondo Beach, CA 90278

   E-mail: bill.courtney@trw.com

   Shahram Davari
   PMC-Sierra Inc
   555
   411 Legget drive
   Suit 834, Tower B Drive
   Ottawa, ON K2K 2X3, 3C9, Canada

   E-mail: shahram_davari@pmc-sierra.com
   Bruce Davie
   Cisco Systems Systems, Inc.
   300 Apollo Drive
   Chelmsford, MA MA, 01824

   E-mail: bsd@cisco.com

   Victor Firoiu
   Nortel Networks
   600 Tech Park
   Billerica, MA 01821

   E-mail: vfirou@nortelnetworks.com

   Charles Kalmanek
   AT&T Labs-Research
   180 Park Avenue, Room A113,
   Florham Park NJ

   E-mail: crk@research.att.com.

   K.K. Ramakrishnan
   AT&T Labs-Research
   Rm. A155, 180 Park Ave,
   Florham Park, NJ 07932
                       Information for the EF PHB   Expires: July 1 2001

   kkrama@research.att.com
   TeraOptic Networks, Inc.
   686 W. Maude Ave
   Sunnyvale, CA 94086

   E-mail: kk@teraoptic.com

   Dimitrios Stiliadis
   Lucent Technologies
   1380 Rodick Road
   Markham, Ontario, L3R-4G5, Canada

   E-mail: stiliadi@bell-labs.com

   10.

Full Copyright

   Copyright (C) The Internet Society 1999. 2001.  All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works.  However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose ofdeveloping
   Internet standards in which case the procedures for copyrights
   defined in the Internet Standards process must be followed, or as
   required to translate it into languages other than English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.