BESS Working Group                                              J. Drake
Internet-Draft                                                 A. Farrel
Intended status: Standards Track                                E. Rosen
Expires: April 30, November 3, 2018                               Juniper Networks
                                                                K. Patel
                                                            Arrcus, Inc.
                                                                L. Jalil
                                                        October 27, 2017
                                                             May 2, 2018

   Gateway Auto-Discovery and Route Advertisement for Segment Routing
                     Enabled Domain Interconnection


   Data centers have become critical components of the infrastructure
   used by network operators to provide services to their customers.
   Data centers are attached to the Internet or a backbone network by
   gateway routers.  One data center typically has more than one gateway
   for commercial, load balancing, and resiliency reasons.

   Segment routing is a popular protocol mechanism for operating within
   a data center, but also for steering traffic that flows between two
   data center sites.  In order that one data center site may load
   balance the traffic it sends to another data center site it needs to
   know the complete set of gateway routers at the remote data center,
   the points of connection from those gateways to the backbone network,
   and the connectivity across the backbone network.

   Segment routing may also be operated in other domains, such as access
   networks.  Those domains also need to be connected across backbone
   networks through gateways.

   This document defines a mechanism using the BGP Tunnel Encapsulation
   attribute to allow each gateway router to advertise the routes to the
   prefixes in the segment routing domains to which it provides access,
   and also to advertise on behalf of each other gateway to the same
   segment routing domain.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   document are to be interpreted as described in [RFC2119].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 30, November 3, 2018.

Copyright Notice

   Copyright (c) 2017 2018 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   ( in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   3
   2.  SR Domain Gateway Auto-Discovery  . . . . . . . . . . . . . .   5
   3.  Relationship to BGP Link State and Egress Peer Engineering  .   6
   4.  Advertising an SR Domain Route Externally . . . . . . . . . .   7
   5.  Encapsulation . . . . . . . . . . . . . . . . . . . . . . . .   7
   6.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .   7
   7.  Security Considerations . . . . . . . . . . . . . . . . . . .   7
   8.  Manageability Considerations  . . . . . . . . . . . . . . . .   9
   9.  Acknowledgements  . . . . . . . . . . . . . . . . . . . . . .   9
   10. References  . . . . . . . . . . . . . . . . . . . . . . . . .   9
     10.1.  Normative References . . . . . . . . . . . . . . . . . .   9
     10.2.  Informative References . . . . . . . . . . . . . . . . .  10
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  11

1.  Introduction

   Data centers (DCs) have become critical components of the
   infrastructure used by network operators to provide services to their
   customers.  DCs are attached to the Internet or a backbone network by
   gateway routers (GWs).  One DC typically has more than one GW for
   various reasons including commercial preferences, load balancing, and
   resiliency against connection of device failure.

   Segment routing (SR) [I-D.ietf-spring-segment-routing] is a popular
   protocol mechanism for operating within a DC, but also for steering
   traffic that flows between two DC sites.  In order for an ingress DC
   that uses SR to load balance the flows it sends to an egress DC, it
   needs to know the complete set of entry nodes (i.e., GWs) for that
   egress DC from the backbone network connecting the two DCs.  Note
   that it is assumed that the connected set of DCs and the backbone
   network connecting them are part of the same SR BGP Link State (LS)
   instance ([RFC7752] and [I-D.ietf-idr-bgpls-segment-routing-epe]) so
   that traffic engineering using SR may be used for these flows.

   Segment routing may also be operated in other domains, such as access
   networks.  Those domains also need to be connected across backbone
   networks through gateways.

   Suppose that there are two gateways, GW1 and GW2 as shown in
   Figure 1, for a given egress segment routing domain and that they
   each advertise a route to prefix X which is located within the egress
   segment routing domain with each setting itself as next hop.  One
   might think that the GWs for X could be inferred from the routes'
   next hop fields, but typically it is not the case that both routes
   get distributed across the backbone: rather only the best route, as
   selected by BGP, is distributed.  This precludes load balancing flows
   across both GWs.

           -----------------                    ---------------------
          | Ingress         |                  | Egress     ------   |
          | SR Domain       |                  | SR Domain |Prefix|  |
          |                 |                  |           |   X  |  |
          |                 |                  |            ------   |
          |       --        |                  |   ---          ---  |
          |      |GW|       |                  |  |GW1|        |GW2| |
           -------++--------                    ----+-----------+-+--
                  | \                               |          /  |
                  |  \                              |         /   |
                  |  -+-------------        --------+--------+--  |
                  | ||PE|       ----|      |----   |PE|    |PE| | |
                  | | --       |ASBR+------+ASBR|   --      --  | |
                  | |           ----|      |----                | |
                  | |               |      |                    | |
                  | |           ----|      |----                | |
                  | | AS1      |ASBR+------+ASBR|           AS2 | |
                  | |           ----|      |----                | |
                  |  ---------------        --------------------  |
               | |PE|                                           |PE| |
               |  --                 AS3                         --  |
               |                                                     |

         Figure 1: Example Segment Routing Domain Interconnection

   The obvious solution to this problem is to use the BGP feature that
   allows the advertisement of multiple paths in BGP (known as Add-
   Paths) [RFC7911] to ensure that all routes to X get advertised by
   BGP.  However, even if this is done, the identity of the GWs will be
   lost as soon as the routes get distributed through an Autonomous
   System Border Router (ASBR) that will set itself to be the next hop.
   And if there are multiple Autonomous Systems (ASes) in the backbone,
   not only will the next hop change several times, but the Add-Paths
   technique will experience scaling issues.  This all means that this the
   Add-Paths approach is limited to SR domains connected over a single

   This document defines a solution that overcomes this limitation and
   works equally well with a backbone constructed from one or more ASes.
   The solution uses the Tunnel Encapsulation attribute
   [I-D.ietf-idr-tunnel-encaps] as follows:

      We define a new tunnel type, "SR tunnel".  When the GWs to a given
      SR domain advertise a route to a prefix X within the SR domain,
      they will each include a Tunnel Encapsulation attribute with
      multiple tunnel instances each of type "SR tunnel", one for each
      GW, and each containing a Remote Endpoint sub-TLV with that GW's

   In other words, each route advertised by any GW identifies all of the
   GWs to the same SR domain (see Section 2 for a discussion of how GWs
   discover each other).  Therefore, even if only one of the routes is
   distributed to other ASes, it will not matter how many times the next
   hop changes, as the Tunnel Encapsulation attribute (and its remote
   endpoint sub-TLVs) will remain unchanged.

   To put this in the context of Figure 1, GW1 and GW2 discover each
   other as gateways for the egress SR domain.  Both GW1 and GW2
   advertise themselves as having routes to prefix X.  Furthermore, GW1
   includes a Tunnel Encapsulation attribute with a tunnel instance of
   type "SR tunnel" for itself and another for GW2.  Similarly, GW2
   includes a Tunnel Encapsulation for itself and another for GW1.  The
   gateway in the ingress SR domain can now see all possible paths to
   the egress SR domain regardless of which route advertisement is
   propagated to it, and it can choose one or balance traffic flows as
   it sees fit.

   The protocol extensions defined in this document are put into the
   broader context of SR domain interconnection by
   [I-D.farrel-spring-sr-domain-interconnect].  That document shows how
   other existing protocol elements may be combined with the extensions
   defined in this document to provide a full system.

2.  SR Domain Gateway Auto-Discovery

   To allow a given SR domain's GWs to auto-discover each other and to
   coordinate their operations, the following procedures are

   o  Each GW is configured with an identifier for the SR domain that is
      common across all GWs to the domain (i.e., across the same identifier is
      used by all GWs to the same SR domain) and unique across all SR
      domains that are interconnected) and unique connected (i.e., across all GWs to all SR domains
      that are connected. interconnected).

   o  A route target ([RFC4360]) is attached to each GW's auto-discovery
      route and has its value set to the SR domain identifier.

   o  Each GW constructs an import filtering rule to import any route
      that carries a route target with the same SR domain identifier
      that the GW itself uses.  This means that only these GWs will
      import those routes and that all GWs to the same SR domain will
      import each other's routes and will learn (auto-discover) the
      current set of active GWs for the SR domain.

   The auto-discovery route that each GW advertises consists of the

   o  An IPv4 or IPv6 NLRI containing one of the GW's loopback addresses
      (that is, with AFI/SAFI that is one of 1/1, 2/1, 1/4, or 2/4).

   o  A Tunnel Encapsulation attribute containing the GW's encapsulation
      information, which at a minimum consists of an SR tunnel TLV (type
      to be allocated by IANA) with a Remote Endpoint sub-TLV as
      specified in [I-D.ietf-idr-tunnel-encaps].

   To avoid the side effect of applying the Tunnel Encapsulation
   attribute to any packet that is addressed to the GW itself, the GW
   SHOULD use a different loopback address for the two cases.

   As described in Section 1, each GW will include a Tunnel
   Encapsulation attribute for each GW that is active for the SR domain
   (including itself), and will include these in every route advertised
   externally to the SR domain by each GW.  As the current set of active
   GWs changes (due to the addition of a new GW or the failure/removal
   of an existing GW) each externally advertised route will be re-
   advertised with the set of SR tunnel instances reflecting the current
   set of active GWs.

   If a gateway becomes disconnected from the backbone network, or if
   the SR domain operator decides to terminate the gateway's activity,
   it withdraws the advertisements described above.  This means that
   remote gateways at other sites will stop seeing advertisements from
   this gateway.  It also means that other local gateways at this site
   will "unlearn" the removed gateway and stop including a Tunnel
   Encapsulation attribute for the removed gateway in their

3.  Relationship to BGP Link State and Egress Peer Engineering

   When a remote GW receives a route to a prefix X it can use the SR
   tunnel instances within the contained Tunnel Encapsulation attribute
   to identify the GWs through which X can be reached.  It uses this
   information to compute SR TE paths across the backbone network
   looking at the information advertised to it in SR BGP Link State
   (BGP-LS) [I-D.gredler-idr-bgp-ls-segment-routing-ext] [I-D.ietf-idr-bgp-ls-segment-routing-ext] and correlated
   using the SR domain identity.  SR Egress Peer Engineering (EPE)
   [I-D.ietf-idr-bgpls-segment-routing-epe] can be used to supplement
   the information advertised in the BGP-LS.

4.  Advertising an SR Domain Route Externally

   When a packet destined for prefix X is sent on an SR TE path to a GW
   for the SR domain containing X, it needs to carry the receiving GW's
   label for X such that this label rises to the top of the stack before
   the GW completes its processing of the packet.  To achieve this we
   place a prefix-SID sub-TLV for X in each SR tunnel instance in the
   Tunnel Encapsulation attribute in the externally advertised route for

   Alternatively, if the GWs for a given SR domain are configured to
   allow remote GWs to perform SR TE through that SR domain for a prefix
   X, then each GW computes an SR TE path through that SR domain to X
   from each of the currently active GWs, and places each in an MPLS
   label stack sub-TLV [I-D.ietf-idr-tunnel-encaps] in the SR tunnel
   instance for that GW.

5.  Encapsulation

   If the GWs for a given SR domain are configured to allow remote GWs
   to send them a packet in that SR domain's native encapsulation, then
   each GW will also include multiple instances of a tunnel TLV for that
   native encapsulation in externally advertised routes: one for each GW
   and each containing a remote endpoint sub-TLV with that GW's address.
   A remote GW may then encapsulate a packet according to the rules
   defined via the sub-TLVs included in each of the tunnel TLV

6.  IANA Considerations

   IANA maintains a registry called "BGP parameters" with a sub-registry
   called "BGP Tunnel Encapsulation Tunnel Types."  The registration
   policy for this registry is First-Come First-Served.

   IANA is requested to assign a codepoint from this sub-registry for
   "SR Tunnel".  The next available value may be used and reference
   should be made to this document.

   [[Note: This text is likely to be replaced with a specific code point
   value once FCFS allocation has been made.]]

7.  Security Considerations

   From a protocol point of view, the mechanisms described in this
   document can leverage the security mechanisms already defined for
   BGP.  Further discussion of security considerations for BGP may be
   found in the BGP specification itself [RFC4271] and in the security
   analysis for BGP [RFC4272].  The original discussion of the use of
   the TCP MD5 signature option to protect BGP sessions is found in
   [RFC5925], while [RFC6952] includes an analysis of BGP keying and
   authentication issues.

   The mechanisms described in this document involve sharing routing or
   reachability information between domains: that may mean disclosing
   information that is normally contained within a domain.  So it needs
   to be understood that normal security paradigms based on the
   boundaries of domains are weakened.  Discussion of these issues with
   respect to VPNs can be found in [RFC4364] while [RFC7926] describes
   many of the issues associated with the exchange of topology or TE
   information between domains.

   Particular exposures resulting from this work include:

   o  Gateways to a domain will know about all other gateways to the
      same domain.  This feature applies within a domain and so is not a
      substantial exposure, but it does mean that if the protocol BGP
      exchanges within a domain can be snooped or if a gateway can be
      subverted then an attacker may learn the ful full set of gateways to a
      domain.  This facilitates more effective attacks on that domain.

   o  The existence of multiple gateways to a domain becomes more
      visible across the backbone and even into remote domains.  This
      means that an attacker is able to prepare a more comprehensive
      attack than exists when only the locally attached backbone network
      (e.g., the AS that hosts the domain) can see all of the gateways
      to a site.

   o  A node in a domain that does not have external BGP peering (i.e.,
      is not really a domain gateway and cannot speak BGP into the
      backbone network) may be able to get itself advertised as a
      gateway by letting other genuine gateways discover it (by speaking
      BGP to them within the domain) and so may get those genuine
      gateways to advertise it as a gateway into the backbone network.

   o  If it is possible to modify a BGP message within the backone, backbone, it
      may be possible to spoof the existence of a gateway.  This could
      cause traffic to be attracted to a specific node and might result
      in blackholing black-holing of traffic.

   All of the issues in the list above could cause disruption to domain
   interconnection, but are not new protocol vulnerabilities so much as
   new exposures of information that could be protected against using
   existing protocol mechanisms.  Furthermore, it is a general
   observation that if these attacks are possible then it is highly
   likely that far more significant attacks can be made on the routing
   system.  It should be noted that BGP peerings are not discovered, but
   always arrise arise from explicit configuration.

8.  Manageability Considerations

   The principal configuration item added by this solution is the
   allocation of an SR domain identifier.  The same identifier must be
   assigned to every GW to the same domain, and each domain must have a
   different identifier.  This requires coordination probably through a
   central management agent.


9.  Acknowledgements

   Thanks to Bruno Rijsman for review comments, and to Robert Raszuk for
   useful discussions.

10.  References

10.1.  Normative References

              Previdi, S., Filsfils, C., Patel, K., Ray, S., and J.
              Dong, "BGP-LS extensions for Segment Routing BGP Egress
              Peer Engineering", draft-ietf-idr-bgpls-segment-routing-
              epe-15 (work in progress), June 2017. March 2018.

              Rosen, E., Patel, K., and G. Velde, "The BGP Tunnel
              Encapsulation Attribute", draft-ietf-idr-tunnel-encaps-07 draft-ietf-idr-tunnel-encaps-09
              (work in progress), July 2017. February 2018.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,

   [RFC4271]  Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A
              Border Gateway Protocol 4 (BGP-4)", RFC 4271,
              DOI 10.17487/RFC4271, January 2006,

   [RFC4360]  Sangli, S., Tappan, D., and Y. Rekhter, "BGP Extended
              Communities Attribute", RFC 4360, DOI 10.17487/RFC4360,
              February 2006, <>.

   [RFC5925]  Touch, J., Mankin, A., and R. Bonica, "The TCP
              Authentication Option", RFC 5925, DOI 10.17487/RFC5925,
              June 2010, <>.

   [RFC7752]  Gredler, H., Ed., Medved, J., Previdi, S., Farrel, A., and
              S. Ray, "North-Bound Distribution of Link-State and
              Traffic Engineering (TE) Information Using BGP", RFC 7752,
              DOI 10.17487/RFC7752, March 2016,

10.2.  Informative References

              Farrel, A. and J. Drake, "Interconnection of Segment
              Routing Domains - Problem Statement and Solution
              Landscape", draft-farrel-spring-sr-domain-interconnect-00 draft-farrel-spring-sr-domain-interconnect-03
              (work in progress), June 2017.

   [I-D.gredler-idr-bgp-ls-segment-routing-ext] January 2018.

              Previdi, S., Psenak, P., Talaulikar, K., Filsfils, C., Gredler, H., Chen,
              and j., M. Chen, "BGP Link-State extensions for Segment
              Routing", draft-gredler-idr-bgp-ls-segment-
              routing-ext-04 draft-ietf-idr-bgp-ls-segment-routing-ext-06
              (work in progress), October 2016. April 2018.

              Filsfils, C., Previdi, S., Ginsberg, L., Decraene, B.,
              Litkowski, S., and R. Shakir, "Segment Routing
              Architecture", draft-ietf-
              spring-segment-routing-12 draft-ietf-spring-segment-routing-15 (work
              in progress), June 2017. January 2018.

   [RFC4272]  Murphy, S., "BGP Security Vulnerabilities Analysis",
              RFC 4272, DOI 10.17487/RFC4272, January 2006,

   [RFC4364]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
              Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, February
              2006, <>.

   [RFC6952]  Jethanandani, M., Patel, K., and L. Zheng, "Analysis of
              BGP, LDP, PCEP, and MSDP Issues According to the Keying
              and Authentication for Routing Protocols (KARP) Design
              Guide", RFC 6952, DOI 10.17487/RFC6952, May 2013,

   [RFC7911]  Walton, D., Retana, A., Chen, E., and J. Scudder,
              "Advertisement of Multiple Paths in BGP", RFC 7911,
              DOI 10.17487/RFC7911, July 2016,

   [RFC7926]  Farrel, A., Ed., Drake, J., Bitar, N., Swallow, G.,
              Ceccarelli, D., and X. Zhang, "Problem Statement and
              Architecture for Information Exchange between
              Interconnected Traffic-Engineered Networks", BCP 206,
              RFC 7926, DOI 10.17487/RFC7926, July 2016,

Authors' Addresses

   John Drake
   Juniper Networks


   Adrian Farrel
   Juniper Networks


   Eric Rosen
   Juniper Networks


   Keyur Patel
   Arrcus, Inc.


   Luay Jalil