BGP is THE protocol of the Internet: it is used to exchange routing information between BGP systems across Internet domains. It comes in two flavours:
External BGP (eBGP): Network Layer Reachability Information (NLRI) is exchanged between network domains called Autonomous Systems (AS), which are usually administratively independent. This is BGP inter-domain routing. As an example, let us assume a BGP speaker from AS2200 (RENATER) advertises NLRI to AS20965 (GÉANT R&E). From that point, AS20965 knows how to reach any network advertised by AS2200, based on the NLRI information.
Internal BGP (iBGP): NLRI is propagated between BGP speakers inside the same domain. This is BGP intra-domain routing. As an example, assume an AS2200 border router in Paris is connected to the GÉANT network and receives NLRI from AS20965. It will then propagate this information internally, advertising the GÉANT NLRI via iBGP sessions to the other BGP speakers inside the AS2200 network domain.
iBGP requires a full mesh between all BGP speakers inside a domain because of the iBGP loop-avoidance rule (a route learned from one iBGP peer is not re-advertised to another iBGP peer), thus requiring n*(n-1)/2 sessions to be implemented. BGP route reflection removes the full-mesh requirement: a BGP edge router now only needs one BGP session towards the RR, reducing network equipment workload.
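A quick back-of-the-envelope comparison makes the scaling benefit concrete (a sketch; the two-RR assumption simply mirrors the RR1/RR2 cluster built later in this article):

```python
def ibgp_full_mesh_sessions(n: int) -> int:
    """iBGP sessions needed for a full mesh of n speakers: n*(n-1)/2."""
    return n * (n - 1) // 2

def rr_sessions(n_clients: int, n_rr: int = 2) -> int:
    """Sessions with route reflection: each client peers with every RR,
    plus the full mesh among the RRs themselves."""
    return n_clients * n_rr + ibgp_full_mesh_sessions(n_rr)

for n in (10, 50, 100):
    # n speakers total: full mesh vs (n - 2) clients behind 2 RRs
    print(n, ibgp_full_mesh_sessions(n), rr_sessions(n - 2))
```

With 100 speakers, the full mesh needs 4950 sessions, while the reflected design needs well under 200.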
In this article we will describe how to build a carrier-grade route reflector cluster composed of RR1 and RR2, in order to reach the 99.999% availability expected of a Telecom Internet Service Provider:
Consider the network architecture of the fictitious service provider below: route reflectors RR1 and RR2 are dual-homed to core P routers.
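As a reminder of what the 99.999% ("five nines") availability target means in practice, here is the arithmetic:

```python
def max_downtime_minutes_per_year(availability_pct: float) -> float:
    """Maximum tolerated downtime per year for a given availability target."""
    minutes_per_year = 365 * 24 * 60  # 525600
    return (1 - availability_pct / 100) * minutes_per_year

# Five nines leaves roughly 5.26 minutes of downtime per year,
# which is why the RR cluster itself must be fully redundant.
print(f"{max_downtime_minutes_per_year(99.999):.2f} minutes/year")
```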
[ #001 ] - Cookbook
BGP RR hardware bill of materials
SR655 1 x EPYC 7302P, 64GB RAM, 2G CONTROLLER CACHE FLASH, 4x10G ports + SFP+ and 4x1G ports, 3 SSD 480GB MAINSTREAM, XCLARITY ENTERPRISE.
SR655 AMD EPYC 7302P (16C 2.8GHz 128MB Cache/155W) 64GB (2x32GB, 2Rx4 3200MHz RDIMM), No Backplane, SATA, 1x750W, Toolless Rails
ThinkSystem 2x32GB TruDDR4 3200MHz (2Rx4 1.2V) RDIMM-A
ThinkSystem SR655 2.5 SATA/SAS 8-Bay Backplane Kit
ThinkSystem RAID 930-8i 2GB Flash PCIe 12Gb Adapter
ThinkSystem 2.5 5300 480GB Mainstream SATA 6Gb Hot Swap SSD
ThinkSystem SR655 x16/x8/x8 PCIe Riser1 FH Kit
ThinkSystem SR635/SR655 x8 PCIe Internal Riser Kit
ThinkSystem Broadcom 57454 10/25GbE SFP28 4-port OCP Ethernet Adapter
ThinkSystem Broadcom 5720 1GbE RJ45 2-Port PCIe Ethernet Adapter
SFP+ SR Transceiver
ThinkSystem 750W(230/115V) Platinum Hot-Swap Power Supply
2.8m, 10A/100-250V, C13 to IEC 320-C14 Rack Power Cable
ThinkSystem Toolless Slide Rail Kit with 2U CMA
ThinkSystem SR655 Fan Option Kit
ThinkSystem SR635/SR655 Supercap Installation Kit
BGP RR main requirements
An RR is a specific component inside a service provider environment:
- The BGP RR is not in the data path inside the backbone; this can be enforced by setting high IGP metrics on its links in the core backbone.
- BGP traffic does not require tremendous throughput, so there is no need for hardware-assisted NIC forwarding mechanisms such as DPDK.
- An NREN route reflector with 2 IPv4 and 2 IPv6 full views coming from 2 upstream providers handles a steady traffic rate of ~10 Mbps, so we can assume that a 10GE connection will be sufficient for the foreseeable future, all address families included.
- As of 2020/07/13, the Internet IPv4 routing table size is 839945 entries
- As of 2020/07/13, the Internet IPv6 routing table size is 91062 entries
Both tables cumulated, together with the other BGP address families, require a constant usage of ~4 GB of memory:
# show watchdog memory
- So in the configuration above, 64 GB of RAM is sufficient to hold the full IPv4 and IPv6 routing tables in memory (as well as the other BGP address-family tables). It also leaves ample headroom for network instability events, which involve more CPU/memory usage for convergence computation.
- We have no incentive to promote the server brand above. It just happens that this server was already purchased and its configuration matches the use-case requirements perfectly; again, this is pure coincidence.
- A 10GE port connection might be overkill, but in a service provider context this is the norm. It avoids having the adjacent core routers implement 1GE connectivity.
- PCIe Gen4 is available and provides a tremendous amount of bandwidth for disk R/W operations. Though useful for OS applications, the BGP RR setup will not take direct advantage of PCIe Gen4.
- Indeed, given the amount of RAM in this configuration, we will disable swap.
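The ~4 GB figure can be sanity-checked with a rough sizing sketch. The per-route memory cost below is an assumption (the real cost depends on the BGP implementation, the number of paths, and the attributes carried); only the table sizes and the two-upstream setup come from the article:

```python
# Table sizes as observed on 2020/07/13 (from the article).
IPV4_ROUTES = 839_945
IPV6_ROUTES = 91_062
VIEWS = 2                # two upstream providers => two full views per AF
BYTES_PER_ROUTE = 2_000  # ASSUMED average per stored path, attributes included

def rib_memory_gb(v4: int, v6: int, views: int, bytes_per_route: int) -> float:
    """Rough RIB memory footprint in GB for the given table sizes."""
    return (v4 + v6) * views * bytes_per_route / 1024**3

print(f"~{rib_memory_gb(IPV4_ROUTES, IPV6_ROUTES, VIEWS, BYTES_PER_ROUTE):.1f} GB")
```

Under these assumptions the estimate lands around 3.5 GB, in line with the ~4 GB observed, and comfortably within 64 GB of RAM.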
BGP RR distinct data path
- Connect the server to the core backbone routers with 2 optical SFP NICs (Broadcom 57454 10/25GbE SFP28 4-port OCP Ethernet Adapter), following distinct dark-fiber paths.
- The link between C1 - C2 provides an additional level of redundancy
BGP RR out of band management
- Connect the server with 1 RJ45 NIC (Broadcom 5720 1GbE RJ45 2-Port PCIe Ethernet Adapter) to the KVM or out-of-band management network
Do not forget ...
One point often overlooked is the environment. As said, the BGP RR is a central component in a service provider network. It must be deployed considering the following recommendations:
- Deploy the RR in a carrier hotel
- With sufficient cooling
- With sufficient power. Also make sure to have redundant power: use dual PSUs connected to different energy sources
- Rack the server properly and make sure it is installed without blocking airflow, as per the server vendor's advice
Install an OS supported by your company
- Use only a stable branch, also called an LTS operating system, such as Debian 10, Ubuntu 18.04 or Ubuntu 20.04
- Apply your IT security hardening patches and enter the server into your regular maintenance process
- In our case we will use Debian 10
BGP RR Life cycle management
It is important to note that from now on, the BGP RR is subject to your company's server hardware maintenance, while the software is not part of it:
- Server hardware maintenance is now applied to a network equipment
- The software is maintained by the freeRouter project members
For those who would like to rebuild these binaries, the compilation shell script can be found in the cloned freeRouter git repository at: ~/freeRouter/src/native/c.sh
No throughput required
- In this case the simple pcapInt packet forwarder is recommended
- In this setup all freeRouter functionalities are natively available
- freeRouter heavily uses threads, hence the 16 CPU cores will be fully exploited
freeRouter upgrades involve 3 aspects:
- Java runtime upgrades: pretty unusual, but as freeRouter is written in Java, you have to follow the Java software update recommendations
- The freeRouter control-plane software itself: it is essentially a rtr.jar file that has to be replaced by the latest version
- The freeRouter data-plane software (pcapInt): pcapInt upgrades are unusual but still have to be checked in the freeRouter release notes
We are now (at last) ready to configure freeRouter as a BGP route reflector!
freeRouter uses 2 configuration files in order to run; let's write these configuration files for RR1 in ~/freeRouter/etc
BGP RR interfaces
- eth1 is BGP port eth1: port 10011 is the freeRouter side, while 10012 is the pcapInt side bound to Linux interface NIC #1
- eth2 is BGP port eth2: port 10021 is the freeRouter side, while 10022 is the pcapInt side bound to Linux interface NIC #2
- For now, freeRouter will be accessible only via a telnet session on port 2323
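The port numbers above follow a simple pattern. The helper below generalizes it from the two NICs in this setup; treat it as an illustrative assumption for keeping your own numbering consistent, not as freeRouter's documented convention:

```python
def pcapint_ports(nic: int) -> tuple[int, int]:
    """Return (freerouter_port, pcapint_port) for NIC #nic,
    following the 100<n>1 / 100<n>2 pattern used in this article.
    ASSUMPTION: the pattern extends beyond the two NICs shown."""
    base = 10000 + nic * 10
    return base + 1, base + 2

# NIC #1 -> (10011, 10012), NIC #2 -> (10021, 10022), as in the article.
print(pcapint_ports(1), pcapint_ports(2))
```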
In this article you:
- had a brief introduction to the BGP protocol and the BGP route reflector rationale
- learned the design considerations related to a BGP RR setup
- got a typical BGP configuration example with a long list of AFI/SAFI enabled
- This configuration is not exhaustive: for example, BGP add-path is available but not configured
- verified BGP RR operation
RARE validated design: [ BGP RR #001 ] - key take-aways
- The BGP route reflector use case does not require a commercial vendor router; it can be handled perfectly by a software solution running on a server with enough RAM.
The example above shows a high-availability route reflector able to handle BGP signalling for a carrier-grade service provider, for all address families
- Redundant BGP route reflection is ensured by deploying 2 RRs (at minimum) belonging to the same BGP RR cluster
In addition to having several RRs for the whole domain, it is also common to see hierarchical RR designs. Some service providers deploy dedicated RRs for specific address families (L3VPN unicast, for example)
- RRs in the same cluster run plain iBGP sessions between each other
These RRs also share the same cluster ID, so that a route reflected by one RR is rejected by the other, preventing reflection loops and redundant advertisements and withdrawals
- RR should not be in the traffic datapath
This is the reason why we set a high IGP cost (4444 for IPv4 and 6666 for IPv6) in both directions on the RR interconnection ports
- RR design for a multi-service backbone
In the example, the RR clients are running only IPv4/IPv6, but the RR design above can empower a service provider backbone with additional services running on top of MPLS: L3VPN, 6VPE, VPLS, EVPN, etc.
- In the next article we will dissect the rr1 configuration
This will demonstrate some nice features offered by freeRouter, such as BGP templates and next-hop tracking, among a list of other features not mentioned here (like BGP add-path)
RR design test
You can test the design above in order to check RR and backbone router signalling.
- Set up the freeRouter environment as described above
- Get RARE code