Internet-Draft | BGP Route Broker | November 2024 |
Xu, et al. | Expires 5 May 2025 |
This document describes an optimized mechanism for BGP route reflection, known as the BGP route broker. It aims to use BGP-based IP VPN as an overlay routing protocol in a scalable manner, specifically for network virtualization in hyperscale data centers, commonly referred to as Software-Defined Networking (SDN) environments.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 5 May 2025.¶
Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
BGP/MPLS IP VPN has been successfully deployed in global service provider networks for two decades, proving its scalability in large-scale environments. Here, BGP/MPLS IP VPN means both BGP/MPLS IPv4 VPN [RFC4364] and BGP/MPLS IPv6 VPN [RFC4659]. In addition, the BGP/MPLS IP VPN-based data center network virtualization approach described in [RFC7814], especially the virtual PE model described in [I-D.ietf-bess-virtual-pe], has been widely deployed in small to medium-sized data centers for network virtualization, also known as Software-Defined Networking (SDN). Examples include, but are not limited to, Tungsten Fabric (formerly known as OpenContrail).¶
Hyperscale cloud data centers typically contain tens of thousands of servers that are virtualized as Virtual Machines (VMs) or containers. Assuming a virtual PE model is used, this results in at least tens of thousands of virtual Provider Edges (PEs), millions of Virtual Private Networks (VPNs), and tens of millions of VPN routes. These numbers create a significant challenge for the number of BGP sessions and the size of the VPN routing table that any single BGP router must support.¶
The route reflection (RR) mechanism is essential for addressing BGP scaling issues. In a one-level route reflector architecture, VPN routes within a data center can be distributed across multiple route reflectors. This is achieved by preconfiguring each route reflector with a designated block of route targets associated with specific VPNs. As a result, no single route reflector needs to manage all the VPN routes supported by the data center. For redundancy, multiple route reflectors should be configured with the same block of route targets to create a route reflector cluster.¶
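The partitioning described above can be sketched as a lookup from a route target to the route reflector cluster whose preconfigured block covers it. This is only an illustration: the `RtBlock` type, the cluster names, the AS number, and the use of contiguous numeric ranges are all assumptions and are not defined by this document.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RtBlock:
    """A contiguous block of route targets (hypothetical representation)."""
    asn: int      # global administrator field of the route target
    low: int      # first assigned number in the block (inclusive)
    high: int     # last assigned number in the block (inclusive)
    cluster: str  # name of the route reflector cluster covering the block

    def covers(self, asn: int, number: int) -> bool:
        return asn == self.asn and self.low <= number <= self.high


def cluster_for(blocks: List[RtBlock], route_target: str) -> Optional[str]:
    """Return the cluster whose block covers 'ASN:number', or None."""
    asn_text, number_text = route_target.split(":")
    asn, number = int(asn_text), int(number_text)
    for block in blocks:
        if block.covers(asn, number):
            return block.cluster
    return None


# Assumed example values: two clusters, each holding half of the range.
BLOCKS = [
    RtBlock(asn=64512, low=1, high=500_000, cluster="rr-cluster-1"),
    RtBlock(asn=64512, low=500_001, high=1_000_000, cluster="rr-cluster-2"),
]

print(cluster_for(BLOCKS, "64512:42"))
print(cluster_for(BLOCKS, "64512:750000"))
```

Every route reflector in the same cluster would carry the same block, which is what gives the redundancy mentioned above.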
If each virtual PE is connected to at least one VPN associated with a specific route reflector, that route reflector would need to establish BGP sessions with all virtual PEs. This can lead to excessive BGP session demands on route reflectors. To address this scaling issue, a new level of route reflectors, referred to as bottom-level route reflectors, can be introduced between the existing top-level route reflectors and the virtual PEs. Each top-level route reflector would then establish BGP sessions with all the bottom-level route reflectors instead of with all the virtual PE routers. Additionally, the bottom-level route reflectors would only need to establish BGP sessions with a subset of the virtual PEs. This partitioning mechanism effectively resolves the scaling issue related to BGP session capacity.¶
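The effect of the extra level on session counts can be checked with simple arithmetic. The sketch below assumes the worst case described above (every virtual PE attaches to at least one VPN of every top-level route reflector) and ignores redundant sessions; the function names and the example figures are illustrative assumptions.

```python
import math
from typing import Dict


def sessions_one_level(num_pes: int) -> int:
    """Worst-case sessions per route reflector when every PE peers with it."""
    return num_pes


def sessions_two_level(num_pes: int, pes_per_bottom_rr: int) -> Dict[str, int]:
    """Per-node session counts once bottom-level route reflectors are added."""
    if num_pes <= 0 or pes_per_bottom_rr <= 0:
        raise ValueError("both arguments must be positive")
    bottom_rrs = math.ceil(num_pes / pes_per_bottom_rr)
    return {
        "bottom_level_rrs": bottom_rrs,
        # Each top-level RR peers with every bottom-level RR, not with PEs.
        "sessions_per_top_level_rr": bottom_rrs,
        # Sessions towards virtual PEs only; sessions towards the
        # top-level RRs come on top of this number.
        "pe_sessions_per_bottom_level_rr": min(pes_per_bottom_rr, num_pes),
    }


# Assumed example: 50,000 virtual PEs, 1,000 PEs per bottom-level RR.
print(sessions_one_level(50_000))
print(sessions_two_level(50_000, 1_000))
```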
In a two-level route reflector (RR) hierarchy within hyperscale data centers, using the Route Target Constraint (RTC) mechanism [RFC4684] presents two main drawbacks. First, it can be challenging to effectively partition all the VPN routes supported by the data center among multiple top-level route reflectors. Second, virtual PEs have to receive RT membership Network Layer Reachability Information (NLRI) for all the route targets supported by the data center, which leads to unnecessary consumption of CPU and RAM resources on the virtual PEs.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
Bottom-level route reflectors, also known as route brokers, are modeled on message queuing systems such as RabbitMQ. These route brokers maintain route target membership information for their IBGP peers and reflect VPN routes among them as needed. In effect, route brokers act as the message brokers or exchanges of a message queuing system. Top-level route reflectors, referred to as route collection servers, and virtual PEs, known as route broker clients, both act as message publishers (or producers) and as subscribers (or consumers) within the same message queuing system.¶
Route collection servers advertise route target membership information based on the preconfigured block of route targets they have. Consequently, route brokers are aware of the VPNs partitioned to each server.¶
Route brokers advertise default route target membership information to their clients so that they receive VPN routes from those clients.¶
Route broker clients advertise their route target membership information based on a dynamically configured block of route targets. When a route broker receives this advertisement, it forwards the route target membership information to the corresponding route collection servers that are preconfigured to cover the advertised route targets. This action occurs only if the broker has not previously sent that route target membership information towards the corresponding route collection servers.¶
The advertisement of route target membership information is based on Route Target Outbound Route Filtering (ORF) as defined in [I-D.xu-idr-route-target-orf].¶
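The "forward only once" behavior for client membership advertisements can be sketched as follows. This models the bookkeeping only, not the Route Target ORF encoding; the `RouteBroker` class, its method name, and the representation of each server's block as an explicit set of route targets are assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple


class RouteBroker:
    """Tracks route target membership and forwards it upstream at most once."""

    def __init__(self, server_blocks: Dict[str, Set[str]]) -> None:
        # server name -> route targets that server advertised as its block
        self.server_blocks = server_blocks
        # route target -> clients that advertised membership for it
        self.client_members: Dict[str, Set[str]] = {}
        # (server, route target) pairs already forwarded upstream
        self.forwarded: Set[Tuple[str, str]] = set()

    def on_client_membership(self, client: str, route_target: str) -> List[str]:
        """Record the client's membership; return the servers to notify now."""
        self.client_members.setdefault(route_target, set()).add(client)
        notify = []
        for server, block in sorted(self.server_blocks.items()):
            key = (server, route_target)
            if route_target in block and key not in self.forwarded:
                self.forwarded.add(key)
                notify.append(server)
        return notify


broker = RouteBroker({"rcs-1": {"64512:1", "64512:2"}, "rcs-2": {"64512:3"}})
print(broker.on_client_membership("pe-a", "64512:1"))  # first interest in this RT
print(broker.on_client_membership("pe-b", "64512:1"))  # already forwarded upstream
```

The second call returns an empty list because the broker has already signaled interest in that route target to the covering server, while both clients remain recorded as members.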
When a route broker receives a route update message from a route collection server containing VPN routes for a specific VPN, and if those routes in the update message are selected as the best routes, the route broker will store them in its local Routing Information Base (RIB) and then reflect these routes to its clients associated with that VPN. Additionally, the cluster ID of the route broker SHOULD be prepended when reflecting the VPN routes.¶
When a route broker receives a route update message from one of its clients containing VPN routes for a specific VPN, and if those routes are selected as the best routes, the route broker will store them in its local RIB and then reflect them to its other IBGP peers, including the corresponding route collection servers and the other route broker clients associated with the same VPN. Additionally, the cluster ID of the route broker SHOULD be prepended when reflecting the VPN routes.¶
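The two reflection rules above can be summarized in a small sketch. It assumes best-path selection has already accepted the route, models each route with a single route target, and adds the usual route reflection loop check on the cluster list from [RFC4456], which this text does not spell out. The `VpnRoute` and `Broker` types are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class VpnRoute:
    prefix: str                         # RD-qualified VPN prefix
    route_target: str                   # export route target of the route
    cluster_list: Tuple[str, ...] = ()


@dataclass
class Broker:
    cluster_id: str
    servers: Dict[str, Set[str]]        # server -> route targets it covers
    clients: Dict[str, Set[str]]        # client -> route targets it joined
    rib: Dict[str, Tuple[VpnRoute, str]] = field(default_factory=dict)

    def reflect(self, route: VpnRoute, sender: str) -> List[Tuple[str, VpnRoute]]:
        """Store an accepted route and return (peer, route) pairs to send."""
        if self.cluster_id in route.cluster_list:
            return []  # loop check on the cluster list (assumed, per RFC 4456)
        self.rib[route.prefix] = (route, sender)
        out = VpnRoute(route.prefix, route.route_target,
                       (self.cluster_id,) + route.cluster_list)
        # Clients associated with the VPN, excluding the sender itself.
        targets = [c for c, rts in sorted(self.clients.items())
                   if route.route_target in rts and c != sender]
        if sender in self.clients:
            # Client-learned routes also go to the covering collection servers.
            targets += [s for s, rts in sorted(self.servers.items())
                        if route.route_target in rts]
        return [(peer, out) for peer in targets]


broker = Broker(
    cluster_id="10.0.0.1",
    servers={"rcs-1": {"64512:1"}},
    clients={"pe-a": {"64512:1"}, "pe-b": {"64512:1"}, "pe-c": {"64512:2"}},
)
for peer, sent in broker.reflect(VpnRoute("64512:100:10.1.0.0/24", "64512:1"), "pe-a"):
    print(peer, sent.cluster_list)
```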
When a route broker receives an implicit request for VPN routes associated with one or more VPNs (through the advertisement of route target membership information) from a route broker client, the route broker SHOULD respond by providing the relevant VPN routes stored in its local RIB to that client.¶
When a route broker receives an implicit request for all the VPN routes of one or more VPNs (through the advertisement of route target membership information) from a route collection server, the route broker SHOULD respond by providing to that route collection server the relevant VPN routes in its local RIB that were learned from its clients.¶
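The difference between the two implicit-request cases is which stored routes are eligible for the response. A minimal sketch, assuming the broker tags each RIB entry with the kind of peer it was learned from (the `StoredRoute` type and the `learned_from` tag are hypothetical):

```python
from typing import List, NamedTuple, Set


class StoredRoute(NamedTuple):
    prefix: str
    route_target: str
    learned_from: str  # "client" or "server" (hypothetical tag)


def routes_for_request(rib: List[StoredRoute], requested_rts: Set[str],
                       requester_is_server: bool) -> List[StoredRoute]:
    """Select the stored routes to send in response to an RT membership advert."""
    matches = [r for r in rib if r.route_target in requested_rts]
    if requester_is_server:
        # A route collection server only receives client-learned routes.
        matches = [r for r in matches if r.learned_from == "client"]
    return matches


RIB = [
    StoredRoute("64512:100:10.1.0.0/24", "64512:1", "client"),
    StoredRoute("64512:200:10.2.0.0/24", "64512:1", "server"),
    StoredRoute("64512:300:10.3.0.0/24", "64512:2", "client"),
]

print(routes_for_request(RIB, {"64512:1"}, requester_is_server=False))
print(routes_for_request(RIB, {"64512:1"}, requester_is_server=True))
```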
To simplify control over VPN route distribution, each VPN SHOULD be assigned a globally unique export route target value.¶
Data center SDN environments need multiple paths to be advertised for a given VPN prefix, so each virtual PE SHOULD be assigned a distinct route distinguisher (RD).¶
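The reason distinct RDs preserve multiple paths is that a reflector advertises one best path per VPN NLRI, and the RD is part of that NLRI. The sketch below illustrates this with a table keyed by (RD, prefix); keeping the first advertisement is a stand-in for real best-path selection, and all names and values are assumptions.

```python
from typing import Dict, List, Tuple

# (route distinguisher, IPv4 prefix, next hop)
Advertisement = Tuple[str, str, str]


def reflected_paths(adverts: List[Advertisement]) -> Dict[Tuple[str, str], str]:
    """Return the paths a reflector would pass on: one per (RD, prefix) key."""
    table: Dict[Tuple[str, str], str] = {}
    for rd, prefix, next_hop in adverts:
        # Keep the first path per key as a stand-in for best-path selection.
        table.setdefault((rd, prefix), next_hop)
    return table


shared_rd = [("64512:1", "10.1.0.0/24", "pe-a"), ("64512:1", "10.1.0.0/24", "pe-b")]
unique_rd = [("64512:1", "10.1.0.0/24", "pe-a"), ("64512:2", "10.1.0.0/24", "pe-b")]

print(len(reflected_paths(shared_rd)))  # one path survives reflection
print(len(reflected_paths(unique_rd)))  # both paths survive reflection
```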
Virtual PEs SHOULD NOT establish BGP sessions with more than one cluster of route brokers configured with the same cluster ID.¶
This document has no IANA actions.¶
The authors would like to thank Robert Raszuk for valuable comments on this document.¶