Internet-Draft BGP Route Broker November 2024
Xu, et al. Expires 5 May 2025
Workgroup: Network Working Group
Internet-Draft: draft-xu-idr-bgp-route-broker-05
Published: November 2024
Intended Status: Standards Track
Expires: 5 May 2025
Authors:
X. Xu
China Mobile
S. Hegde
Juniper
S. Sangli
Juniper
S. Zhang
Huawei
J. Dong
Huawei

BGP Route Broker for Hyperscale SDN

Abstract

This document describes an optimized mechanism for BGP route reflection, known as BGP route broker. It aims to utilize the BGP-based IP VPN as an overlay routing protocol in a scalable manner, specifically for hyperscale data center network virtualization environments, commonly referred to as Software-Defined Network (SDN) environments.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 5 May 2025.

1. Problem Statement

BGP/MPLS IP VPN has been successfully deployed in global service provider networks for two decades, proving its scalability in large-scale environments. Here, BGP/MPLS IP VPN refers to both BGP/MPLS IPv4 VPN [RFC4364] and BGP/MPLS IPv6 VPN [RFC4659]. In addition, the BGP/MPLS IP VPN-based data center network virtualization approach described in [RFC7814], especially the virtual PE model described in [I-D.ietf-bess-virtual-pe], has been widely deployed in small to medium-sized data centers for network virtualization, also known as Software-Defined Networking (SDN). Examples include, but are not limited to, Tungsten Fabric (formerly known as OpenContrail).

Hyperscale cloud data centers typically contain tens of thousands of servers that are virtualized as Virtual Machines (VMs) or containers. From the perspective of network virtualization, and assuming a virtual PE model is used, this results in at least tens of thousands of virtual Provider Edges (PEs), millions of Virtual Private Networks (VPNs), and tens of millions of VPN routes. This scale poses a significant challenge to the BGP session capacity and the VPN routing table size of any given BGP router.

The route reflection (RR) mechanism is essential for addressing BGP scaling issues. In a one-level route reflector architecture, VPN routes within a data center can be distributed across multiple route reflectors. This is achieved by preconfiguring each route reflector with a designated block of route targets associated with specific VPNs. As a result, no single route reflector needs to manage all the VPN routes supported by the data center. For redundancy, multiple route reflectors should be configured with the same block of route targets to create a route reflector cluster.
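The partitioning described above can be pictured with a small sketch. It is illustrative only: the block boundaries, the AS number 64512, the cluster names, and the helper `reflectors_for` are assumptions made for this example and are not defined by this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RouteTargetBlock:
    """A contiguous range of route target values (hypothetical layout)."""
    asn: int
    first: int
    last: int  # inclusive

    def covers(self, rt: str) -> bool:
        # Route targets are written here as "ASN:value".
        asn, value = (int(part) for part in rt.split(":"))
        return asn == self.asn and self.first <= value <= self.last


# Hypothetical preconfiguration: each cluster has two route reflectors
# configured with the same block, for redundancy.
RR_CLUSTERS = {
    "cluster-a": (RouteTargetBlock(64512, 1, 500_000), ["rr-a1", "rr-a2"]),
    "cluster-b": (RouteTargetBlock(64512, 500_001, 1_000_000), ["rr-b1", "rr-b2"]),
}


def reflectors_for(rt: str) -> list[str]:
    """Return the route reflectors whose block covers the given route target."""
    for block, reflectors in RR_CLUSTERS.values():
        if block.covers(rt):
            return reflectors
    return []
```

With this layout, no reflector holds routes for a route target outside its own block, and every block is served by two reflectors.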

If each virtual PE is connected to at least one VPN associated with a specific route reflector, that route reflector would need to establish BGP sessions with all virtual PEs. This can lead to excessive BGP session demands on route reflectors. To address this scaling issue, a new level of route reflectors, referred to as bottom-level route reflectors, can be introduced between the existing top-level route reflectors and the virtual PEs. Each top-level route reflector would then establish BGP sessions with all the bottom-level route reflectors instead of with all the virtual PE routers. Additionally, the bottom-level route reflectors would only need to establish BGP sessions with a subset of the virtual PEs. This partitioning mechanism effectively resolves the scaling issue related to BGP session capacity.
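As a rough illustration of the session arithmetic, the sketch below compares the two designs. The figures (50,000 virtual PEs, 20 top-level and 100 bottom-level route reflectors) are assumed for the example, and the model simplifies by spreading virtual PEs evenly and giving each virtual PE a single bottom-level peer.

```python
def sessions_one_level(num_pes: int) -> int:
    """Worst case in a one-level design: every virtual PE peers with the RR."""
    return num_pes


def sessions_two_level(num_pes: int, num_top: int, num_bottom: int) -> tuple[int, int]:
    """Return (sessions per top-level RR, sessions per bottom-level RR).

    Assumes virtual PEs are spread evenly over the bottom-level RRs and
    each virtual PE peers with exactly one of them (no redundancy).
    """
    per_top = num_bottom
    pes_per_bottom = (num_pes + num_bottom - 1) // num_bottom  # ceiling division
    per_bottom = pes_per_bottom + num_top
    return per_top, per_bottom


one_level = sessions_one_level(50_000)
two_level = sessions_two_level(50_000, num_top=20, num_bottom=100)
```

Under these assumptions a top-level route reflector drops from 50,000 sessions to 100, and each bottom-level route reflector carries 520.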

In a two-level route reflector (RR) hierarchy within hyperscale data centers, using the Route Target Constraint (RTC) mechanism [RFC4684] presents two main drawbacks. First, it can be challenging to effectively partition all the VPN routes supported by the data center among multiple top-level route reflectors. Second, virtual PEs have to receive RT membership Network Layer Reachability Information (NLRI) for all the route targets supported by the data center, which leads to unnecessary consumption of CPU and RAM resources on the virtual PEs.

1.1. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

2. Solution Overview

Bottom-level route reflectors, also known as route brokers, are designed along the lines of message queuing mechanisms such as RabbitMQ. These route brokers maintain route target membership information for their IBGP peers and reflect VPN routes among them as needed. Essentially, route brokers act as the message brokers or exchanges within the message queuing system. In contrast, top-level route reflectors, referred to as route collection servers, and virtual PEs, known as route broker clients, function as both message publishers (or producers) and subscribers (or consumers) within the same message queuing system.
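The message queuing analogy can be sketched as a toy in-memory exchange in which route targets play the role of topics. This is not a BGP implementation; the class name `ToyRouteBroker`, its methods, and the peer names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class ToyRouteBroker:
    """Toy exchange: peers subscribe to route targets and publish routes."""

    def __init__(self):
        # route target -> names of peers (servers or clients) subscribed to it
        self.subscribers = defaultdict(set)

    def subscribe(self, peer, route_target):
        """Record that a peer wants routes carrying this route target."""
        self.subscribers[route_target].add(peer)

    def publish(self, sender, route_target, route):
        """Return the sorted (peer, route) deliveries, excluding the sender."""
        return sorted(
            (peer, route)
            for peer in self.subscribers[route_target]
            if peer != sender
        )


broker = ToyRouteBroker()
broker.subscribe("server-1", "64512:100")
broker.subscribe("vpe-1", "64512:100")
broker.subscribe("vpe-2", "64512:100")
deliveries = broker.publish("vpe-1", "64512:100", "10.0.0.0/24")
```

Here both the route collection server and the virtual PEs subscribe and publish through the same broker, which mirrors the dual publisher and subscriber role described above.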

3. Route-Target Membership Advertisement Process

Route collection servers advertise route target membership information based on the preconfigured block of route targets they have. Consequently, route brokers are aware of the VPNs partitioned to each server.

Route brokers advertise default route target membership information to their clients so as to collect VPN routes from their clients.

Route broker clients advertise their route target membership information based on a dynamically configured block of route targets. When a route broker receives this advertisement, it forwards the route target membership information to the corresponding route collection servers that are preconfigured to cover the advertised route targets. This action occurs only if the broker has not previously sent that route target membership information towards the corresponding route collection servers.
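The forwarding rule in the previous paragraph, including its "only if not previously sent" condition, might be sketched as follows. The class `MembershipForwarder`, its fields, and the peer names are hypothetical; the sketch models state only and does not encode any BGP message.

```python
class MembershipForwarder:
    """Tracks which route target memberships a broker has sent upstream."""

    def __init__(self, server_blocks):
        # server name -> set of route targets that server advertised
        # (derived from its preconfigured block; hypothetical representation)
        self.server_blocks = server_blocks
        self.client_interest = {}    # route target -> set of client names
        self.sent_upstream = set()   # (server, route target) already forwarded

    def on_client_membership(self, client, route_target):
        """Record the client's interest; return the servers to notify now."""
        self.client_interest.setdefault(route_target, set()).add(client)
        to_notify = []
        for server, block in sorted(self.server_blocks.items()):
            key = (server, route_target)
            if route_target in block and key not in self.sent_upstream:
                self.sent_upstream.add(key)
                to_notify.append(server)
        return to_notify


forwarder = MembershipForwarder({
    "server-1": {"64512:100"},
    "server-2": {"64512:200"},
})
first = forwarder.on_client_membership("vpe-1", "64512:100")
repeat = forwarder.on_client_membership("vpe-2", "64512:100")
other = forwarder.on_client_membership("vpe-2", "64512:200")
```

The second client's interest in the same route target produces no upstream message, because the broker has already forwarded that membership to the covering server.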

The advertisement of route target membership information is based on Route Target Outbound Route Filtering (ORF) as defined in [I-D.xu-idr-route-target-orf].

4. Route Distribution Process

When a route broker receives a route update message from a route collection server containing VPN routes for a specific VPN, and if those routes in the update message are selected as the best routes, the route broker will store them in its local Routing Information Base (RIB) and then reflect these routes to its clients associated with that VPN. Additionally, the cluster ID of the route broker SHOULD be prepended when reflecting the VPN routes.

When a route broker receives a route update message from one of its clients containing VPN routes for a specific VPN, and if those routes are selected as the best routes, the route broker will store them in its local RIB and then reflect them to its other IBGP peers, including the corresponding route collection servers and the other route broker clients associated with the same VPN. Additionally, the cluster ID of the route broker SHOULD be prepended when reflecting the VPN routes.
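The two reflection rules above can be sketched as follows, assuming best-route selection has already taken place. The types `VpnRoute` and `ToyReflector` are hypothetical, and the check that drops a route already carrying the broker's own cluster ID is borrowed from conventional route reflection as an assumption; this section does not spell it out.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VpnRoute:
    prefix: str              # written as "RD:prefix"
    route_target: str
    cluster_list: tuple = ()


class ToyReflector:
    def __init__(self, cluster_id, servers, clients):
        self.cluster_id = cluster_id
        self.servers = servers   # server name -> set of route targets
        self.clients = clients   # client name -> set of route targets
        self.rib = {}

    def reflect(self, sender, route):
        """Store an already-selected best route; return {peer: route to send}."""
        if self.cluster_id in route.cluster_list:
            return {}  # assumed loop check, as in conventional route reflection
        self.rib[route.prefix] = route
        out = replace(route, cluster_list=(self.cluster_id,) + route.cluster_list)
        if sender in self.servers:
            candidates = self.clients                       # server -> clients
        else:
            candidates = {**self.servers, **self.clients}   # client -> servers and other clients
        return {
            peer: out
            for peer, targets in candidates.items()
            if peer != sender and route.route_target in targets
        }


reflector = ToyReflector(
    "1.1.1.1",
    servers={"server-1": {"64512:100"}},
    clients={"vpe-1": {"64512:100"}, "vpe-2": {"64512:100"}, "vpe-3": {"64512:200"}},
)
from_client = reflector.reflect("vpe-1", VpnRoute("64512:1:10.0.0.0/24", "64512:100"))
from_server = reflector.reflect("server-1", VpnRoute("64512:9:10.9.0.0/24", "64512:100"))
```

A route from a client reaches the covering server and the other interested client, while a route from a server reaches only the interested clients; in both cases the broker's cluster ID is placed at the front of the cluster list.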

When a route broker receives an implicit request for VPN routes associated with one or more VPNs (through the advertisement of route target membership information) from a route broker client, the route broker SHOULD respond by providing the relevant VPN routes stored in its local RIB to that client.

When a route broker receives an implicit request for all the VPN routes of one or more VPNs (through the advertisement of route target membership information) from a route collection server, the route broker SHOULD respond by providing to that route collection server the relevant VPN routes in its local RIB that were learnt from its clients.
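The difference between the two implicit-request cases can be sketched as a filter over the broker's RIB. The `RibEntry` type, the `learned_from` field, and the function name are hypothetical; the sketch assumes the broker records whether each route was learnt from a client or from a route collection server.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RibEntry:
    prefix: str
    route_target: str
    learned_from: str  # "client" or "server"


def routes_for_request(rib, route_target, requester_role):
    """Select the RIB entries to send when a peer signals interest in a route target."""
    matching = [entry for entry in rib if entry.route_target == route_target]
    if requester_role == "server":
        # A route collection server only receives routes learnt from clients.
        matching = [entry for entry in matching if entry.learned_from == "client"]
    return matching


rib = [
    RibEntry("64512:1:10.0.0.0/24", "64512:100", "client"),
    RibEntry("64512:2:10.0.1.0/24", "64512:100", "server"),
    RibEntry("64512:3:10.0.2.0/24", "64512:200", "client"),
]
to_client = routes_for_request(rib, "64512:100", "client")
to_server = routes_for_request(rib, "64512:100", "server")
```

A client that signals interest in a route target receives every matching route in the RIB, whereas a route collection server receives only the subset that originated from the broker's clients.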

5. Deployment Considerations

To simplify control over VPN route distribution, each VPN SHOULD be assigned a globally unique export route target value.

In data center SDN environments, the advertisement of multiple paths for a given VPN prefix is needed, so virtual PEs SHOULD be assigned different route distinguishers (RDs).
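The reason for distinct RDs can be illustrated with a simplified table keyed by (RD, prefix), which is how a VPN route is identified. The sketch assumes a reflector that retains a single path per key and models this by letting a later advertisement overwrite an earlier one rather than by running best-path selection; the RD values and next-hop names are made up for the example.

```python
def vpn_table(advertisements):
    """Build a table keyed by (RD, prefix); one path is kept per key."""
    table = {}
    for rd, prefix, next_hop in advertisements:
        table[(rd, prefix)] = next_hop  # simplification: the last one wins
    return table


# Two virtual PEs advertise the same prefix with a shared RD.
shared_rd = vpn_table([
    ("64512:1", "10.0.0.0/24", "vpe-1"),
    ("64512:1", "10.0.0.0/24", "vpe-2"),
])

# The same two advertisements with a different RD on each virtual PE.
per_pe_rd = vpn_table([
    ("10.1.1.1:1", "10.0.0.0/24", "vpe-1"),
    ("10.1.1.2:1", "10.0.0.0/24", "vpe-2"),
])
```

With a shared RD the two advertisements collapse into one entry, while per-PE RDs keep both paths visible to the peers that import them.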

Virtual PEs SHOULD NOT establish BGP sessions with more than one cluster of route brokers configured with the same cluster ID.

6. IANA Considerations

This document has no IANA actions.

7. Security Considerations

TBD.

8. Acknowledgements

The authors would like to thank Robert Raszuk for valuable comments on this document.

9. References

9.1. Normative References

[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, <https://www.rfc-editor.org/info/rfc2119>.
[RFC4364]
Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, <https://www.rfc-editor.org/info/rfc4364>.
[RFC4659]
De Clercq, J., Ooms, D., Carugi, M., and F. Le Faucheur, "BGP-MPLS IP Virtual Private Network (VPN) Extension for IPv6 VPN", RFC 4659, DOI 10.17487/RFC4659, <https://www.rfc-editor.org/info/rfc4659>.
[RFC4684]
Marques, P., Bonica, R., Fang, L., Martini, L., Raszuk, R., Patel, K., and J. Guichard, "Constrained Route Distribution for Border Gateway Protocol/MultiProtocol Label Switching (BGP/MPLS) Internet Protocol (IP) Virtual Private Networks (VPNs)", RFC 4684, DOI 10.17487/RFC4684, <https://www.rfc-editor.org/info/rfc4684>.
[RFC5291]
Chen, E. and Y. Rekhter, "Outbound Route Filtering Capability for BGP-4", RFC 5291, DOI 10.17487/RFC5291, <https://www.rfc-editor.org/info/rfc5291>.
[RFC7814]
Xu, X., Jacquenet, C., Raszuk, R., Boyes, T., and B. Fee, "Virtual Subnet: A BGP/MPLS IP VPN-Based Subnet Extension Solution", RFC 7814, DOI 10.17487/RFC7814, <https://www.rfc-editor.org/info/rfc7814>.
[RFC8174]
Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, <https://www.rfc-editor.org/info/rfc8174>.

9.2. Informative References

[I-D.ietf-bess-virtual-pe]
Fang, L., Fernando, R., Napierala, M., Bitar, N. N., and B. Rijsman, "BGP/MPLS VPN Virtual PE", Work in Progress, Internet-Draft, draft-ietf-bess-virtual-pe-00, <https://datatracker.ietf.org/doc/html/draft-ietf-bess-virtual-pe-00>.
[I-D.xu-idr-route-target-orf]
Xu, X., Hegde, S., Sangli, S. R., Zhuang, S., and J. Dong, "Route Target ORF", Work in Progress, Internet-Draft, draft-xu-idr-route-target-orf-01, <https://datatracker.ietf.org/doc/html/draft-xu-idr-route-target-orf-01>.

Authors' Addresses

Xiaohu Xu
China Mobile
Shraddha Hegde
Juniper
Srihari Sangli
Juniper
Shunwan
Huawei
Jie Dong
Huawei