This is a special blog series called "RARE software architecture". As its name implies, it deals with topics related to RARE/freeRouter software design choices.

Requirement

  • Basic Linux/Unix knowledge
  • Service provider networking knowledge

Overview

The objective of the RARE project is to provide a routing platform that offers solutions for multiple use cases in the R&E landscape. In the picture below, the different use cases are shown in purple:

As you can see, each use case runs on different hardware, potentially with a different dataplane. We started from a clean slate without much choice, especially regarding P4 programmability, so the first dataplane (or P4 target) we considered was BMv2. BMv2 is an excellent way to learn P4, and it is also the first target we use to program and validate new features. After 6 months of practising our "P4-fu" we had:

  • a P4lang repository for Ubuntu bionic and focal
  • a Debian 10 repository
  • our first RARE/FreeRouter prototype powered by a P4 BMv2 dataplane !

Our initial plan, considering FreeRouter's Java nature, was to write a Java P4Runtime gRPC client able to program the entries in the tables exposed by BMv2 via the P4Info file. However, this would have tightly coupled FreeRouter code to P4Runtime gRPC code. Although this was the more natural solution, going in that direction assumed that dataplanes other than BMv2 would be P4Runtime-compliant. It turns out that this is not the case. We then opted for a simple message API over a bi-directional raw UNIX socket. We will see what this means later in this blog.

Motivated by the successful experience with BMv2, we then decided to move forward and started to study TOFINO as a target. We were greedy and eager to apply our P4 code to multi-terabit traffic. After a few P4 program compilations, my personal first impression was ... mind blowing ! INTEL/BAREFOOT TOFINO effectively opened the door to multi-terabit packet processing... Just having the possibility to process traffic at these rates at our fingertips was exciting !

As a side note, the journey was not without suffering and pain... (smile) We had to port our BMv2 code, and porting it to TOFINO did not go through "comme une lettre à la poste" (a French expression for something that goes smoothly)... It is not that TOFINO programming is gratuitously painful. It is just that it is p4c-tofino's job to make sure that our packets are processed at silicon lightning speed. Imagine you are asked to convey parcels by driving from Paris to Amsterdam in a car that has an infinitely large trunk, an infinite gas tank and no particular speed limit along the road. And then you are asked to do the same trip, but with a real car that has a trunk of fixed size and a 50-litre gas tank, and of course you have to obey the speed signs along the road.

In the first case, you would load as many parcels as you like, you won't even bother looking at your gas gauge, and maybe you'd set the speed to 200 km/h. The second case forces you to think carefully about how many parcels you can put in your trunk, check whether one full tank is sufficient for the trip and, of course, obey the speed signs.

If you allow me this comparison, this is where BMv2 and TOFINO programming differ.

But this pain was not in vain, it was for the greater good... You can't imagine the joy when you see the TOFINO compiler displaying the word DONE ! For the veterans who can remember, it is the same feeling as when you managed to compile your first program in the Ada language. The compiler is so strict that compiling an Ada program is in itself a feat. No wonder this language is used in space rockets (Ariane).

Back to our dataplane interface story: even though TOFINO and BMv2 share some roots, BMv2 has P4Runtime as its northbound interface, while INTEL/BAREFOOT pushed their own gRPC counterpart for the TOFINO platform with P4_16: BfRuntime.

Our bet paid off: the FreeRouter message API was unchanged, and without much effort we could add a new dataplane "wingman" to the FreeRouter control plane.

To recap:

  • For BMv2: our interface issues P4Runtime RPC calls. This program is called forwarder.py
  • For TOFINO: our interface issues BfRuntime RPC calls. This program is called, without much originality, bf_forwarder.py

At that point we were starting to have a decent LSR/LER router for CORE and Aggregation use cases.

But we still had nothing to propose at the EDGE/AGGREGATION layer. Deploying P4 hardware might be far too expensive in specific contexts such as small R&E institutions like primary schools or small R&E labs. For that purpose, we started to study new targets such as VMWARE XDP and a very promising project: T4P4S ELTE. While we could not use XDP without a lot of P4 code rewriting and compromise, T4P4S ELTE was, from our perspective, very promising. But due to a compilation issue, we could not move forward.

FPGA was also a solution that we considered, but we had no access to any FPGA hardware that was P4 compliant.

As a result, we were a little bit bitter and started to read about the DPDK library and to play with the DPDK examples... These examples were tremendously useful as they sparked some DPDK development within the RARE team. Step by step, Csaba, the FreeRouter lead developer, came up with this GENIUS idea: why don't we just emulate the RARE P4 dataplane program ? We can still revert to T4P4S ELTE when it is ready.

P4emu/P4dpdk was then born ! 

To conclude this short story, RARE/FreeRouter now has 3 completely different dataplanes (in order of appearance):

  • BMv2
  • TOFINO
  • DPDK


Unique RARE/FreeRouter feature

However, please note that the FreeRouter message API is common to the three dataplanes listed above. You'll see further on how this structure makes the solution open, modular and interchangeable.

Article objective

In this article, let's present the RARE/FreeRouter platform structure and focus on the interface(s) between the FreeRouter control plane and the various dataplanes.

Diagram

[ #001 ] - Modular design

In this design, FreeRouter focuses on running control plane processes, such as routing protocols (IGPs, BGP). There are other control plane processes, but let's just focus on these. At some point, all IGPs/EGPs converge and have to create an entry in a routing table. For IPv4, the entry is created in an IPv4 forwarding table; similarly, an IPv6 route entry is created in an IPv6 forwarding table. From FreeRouter's point of view, these entry creations are triggered by calling one Java function twice, which generates 2 API messages: one for IPv4 and the other for IPv6.

Let's add an IPv4 route via the freeRouter CLI

route addition via freeRouter
conf t
ipv4 route v1 1.2.3.0 255.255.255.0 4.4.4.4
...

Upon entering the ipv4 route and pressing <enter>, you'll see the following message appear:

message API: route4_add
...
rx: ['route4_add', '1.2.3.0/24', '13063', '4.4.4.4', '1', '\n']
...

Let's delete the route via FreeRouter CLI

route deletion via freeRouter
conf t
no ipv4 route v1 1.2.3.0 255.255.255.0 4.4.4.4
...
message API: route4_del
...
rx: ['route4_del', '1.2.3.0/24', '13063', '4.4.4.4', '1', '\n']
...
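Note how the CLI takes the route as a network plus a dotted netmask (1.2.3.0 255.255.255.0), while the message carries it in prefix notation (1.2.3.0/24). The short sketch below only illustrates that equivalence with Python's standard ipaddress module; the helper name to_prefix is made up for this example and is not part of freeRouter.

```python
import ipaddress


def to_prefix(network: str, netmask: str) -> str:
    """Convert a network address and a dotted netmask to prefix notation."""
    # ip_network() accepts the "address/netmask" form and prints it as CIDR.
    return str(ipaddress.ip_network(f"{network}/{netmask}"))


if __name__ == "__main__":
    print(to_prefix("1.2.3.0", "255.255.255.0"))
```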

Important note

In short, the message API is simply a collection of messages that trigger an entry ADD/DELETE/MODIFY in the corresponding dataplane table.

The documentation of this message API will be published soon, but for those who are curious and can't wait, you can read the forwarder.py, bf_forwarder.py or p4dpdk.bin source code.
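To make the idea more concrete, here is a minimal sketch of how a receiver could consume such messages. It assumes, based only on the rx traces above, that a message is a single text line of space-separated fields starting with the message name. Only the name and the prefix are interpreted; the other fields are left untouched because their meaning is not documented here. The handler table and the routes4 set are hypothetical stand-ins, not code taken from forwarder.py.

```python
from typing import Callable, Dict, List, Set

# Hypothetical in-memory stand-in for the dataplane IPv4 table.
routes4: Set[str] = set()


def parse_message(line: str) -> List[str]:
    """Split one message line into fields (assumed space-separated)."""
    return line.split()


def route4_add(args: List[str]) -> None:
    routes4.add(args[0])  # args[0] is the prefix, e.g. "1.2.3.0/24"


def route4_del(args: List[str]) -> None:
    routes4.discard(args[0])


HANDLERS: Dict[str, Callable[[List[str]], None]] = {
    "route4_add": route4_add,
    "route4_del": route4_del,
}


def handle(line: str) -> None:
    """Dispatch one message line to the matching handler."""
    fields = parse_message(line)
    if not fields:
        return  # ignore empty lines
    handler = HANDLERS.get(fields[0])
    if handler is None:
        raise ValueError(f"unknown message: {fields[0]}")
    handler(fields[1:])


if __name__ == "__main__":
    handle("route4_add 1.2.3.0/24 13063 4.4.4.4 1 \n")
    print(sorted(routes4))
```

A real interface would replace the two handlers with the calls of its target dataplane; the dispatch table is what stays the same.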

As said at the beginning of the article, the freeRouter control plane has to deal with dataplanes of different natures, and for now freeRouter has three of them. Each of these dataplanes has its own northbound interface: P4Runtime for BMv2, BfRuntime for TOFINO, or P4DPDK for systems compatible with DPDK and having a DPDK-compliant NIC.

For BMv2 we just had to write an interface that translates freeRouter API messages into P4Runtime gRPC calls. This interface is called forwarder.py.

For TOFINO we just had to write an interface that translates freeRouter API messages into BfRuntime gRPC calls. This interface is called bf_forwarder.py.

For DPDK we just had to write an interface that translates freeRouter API messages into DPDK primitives. This interface is included in the DPDK dataplane bundled with the freeRouter binaries: p4dpdk.bin

It is just as simple as that !

Discussion

This design is pretty unique because, if for any reason you would like to "hook" the freeRouter control plane to another dataplane such as:

  • FPGA
  • a dataplane powered by a kernel bypass technique such as RDMA
  • another NPU-based dataplane
  • etc.

This is possible !

You would "just" have to port your P4 code logic to the target dataplane and create an interface able to translate API messages from FreeRouter into messages that the target dataplane understands.
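As an illustration of what such a translating interface could look like, here is a rough sketch. Everything in it is an assumption made for the example: the class names, the idea that a message name splits into a table and an action (as route4_add and route4_del suggest), and the in-memory target, which merely stands in for real dataplane calls. It is not how forwarder.py or bf_forwarder.py are actually written.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class DataplaneAdapter(ABC):
    """Hypothetical base class: one subclass per target dataplane."""

    @abstractmethod
    def add_entry(self, table: str, key: str, params: List[str]) -> None:
        """Install an entry in the target dataplane."""

    @abstractmethod
    def del_entry(self, table: str, key: str) -> None:
        """Remove an entry from the target dataplane."""

    def on_message(self, line: str) -> None:
        """Translate one message line into a target-specific call."""
        fields = line.split()
        name, key, params = fields[0], fields[1], fields[2:]
        # Assumed naming scheme: "<table>_<action>", e.g. "route4_add".
        table, _, action = name.rpartition("_")
        if action == "add":
            self.add_entry(table, key, params)
        elif action == "del":
            self.del_entry(table, key)
        else:
            raise ValueError(f"unsupported message: {name}")


class InMemoryAdapter(DataplaneAdapter):
    """Toy target that keeps its tables in dictionaries."""

    def __init__(self) -> None:
        self.tables: Dict[str, Dict[str, List[str]]] = {}

    def add_entry(self, table: str, key: str, params: List[str]) -> None:
        self.tables.setdefault(table, {})[key] = params

    def del_entry(self, table: str, key: str) -> None:
        self.tables.get(table, {}).pop(key, None)


if __name__ == "__main__":
    target = InMemoryAdapter()
    target.on_message("route4_add 1.2.3.0/24 13063 4.4.4.4 1 \n")
    print(target.tables)
```

Supporting a new dataplane then amounts to writing one more subclass; the control plane side and the message parsing stay untouched.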

Be cautious with the word "just"

The word "just" can be misleading. Indeed, depending on the target dataplane, it can be a huge task. With DPDK, we were lucky to get enough material to move forward, and again p4dpdk.bin was a simple trial at the very beginning. But some other dataplanes may simply have to be ignored if we don't get enough material/support from NPU vendors.

One thing that we have not experienced yet, but that may become a reality one day:

What if you have your own control plane that you absolutely want to keep, but would like to re-use the BMv2/TOFINO or DPDK RARE dataplane ?

Well this is possible !

A long time ago I met Thomas MANGIN (yet another cool and nice French guy (smile) ), who is the author of ExaBGP. I did not talk to him about this and I don't want to give him bad ideas, but what if he wanted to hook a TOFINO P4 dataplane to ExaBGP ?

Well, he would actually just have to teach ExaBGP to handle entry ADD/DELETE/MODIFY messages according to the message API above.

I also love the work DONE in the SONiC project, and I know that SONiC already has a P4 dataplane called switch.p4. I doubt it will ever be the case but, what if the SONiC project wanted to re-use the RARE dataplane, especially for its Service Provider capabilities ?

OK, this sounds crazy, but the modular design we propose here is valid and can make the RARE dataplane available to other control planes.

Of course, we strongly suggest you stick with FreeRouter as, IMHO, you will realize that in the TELCO Service Provider space it has no match. You have the venerable IOS-XR and JUNOS, but these are not Open Source.


Conclusion

In this 1st article you:

  • had a 10K feet view of the RARE/FreeRouter modular design
  • saw that this design allows rapid dataplane addition without altering the FreeRouter code base whatsoever
  • learned that re-using the BMv2/TOFINO/P4DPDK dataplanes with another control plane has never been implemented, but is possible !

Message API documentation

For the time being, this message API is not yet publicly documented. However, it is available, buried inside the forwarder.py and bf_forwarder.py source code. This is work in progress, but if you feel an urgent need to use it, feel free to read the code.

PS: We will publish this document ASAP, but time plays against us ...




This is a special blog series called "RARE hardware platform". As its name implies, it deals with certified and tested platforms on which RARE/freeRouter can run out of the box.

Requirement

  • Basic Linux/Unix knowledge
  • Service provider networking knowledge

Overview

We will publish a series of articles related to the APS Networks® BF2556X-1T P4 switch. The key highlights of this box are:

  • It is a P4 TOFINO NPU based switch
  • This TOFINO version has 2 cores (i.e. 2 pipes) and can handle up to 2 Tbps
  • It offers multiple connection types and rates:
    • 48 x 25G SFP28 and 8 x 100G QSFP28
      • SFP28 ports [1 - 16] can be configured as 1G/10G/25G
      • SFP28 ports [17 - 48] can be configured as 10G/25G
      • QSFP28 ports [49 - 56]: each QSFP28 port can be configured in 1x100G, 2x50G, 4x25G, 1x40G or 4x10G mode
  • SyncE and 1588 support
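As a quick sanity check, the front-panel port counts listed above add up to the 2 Tbps figure. The few lines below simply redo that arithmetic, using the port counts and maximum speeds from the list; the function name is only illustrative.

```python
# (port count, maximum speed in Gbps) for each front-panel port type
PORT_GROUPS = {
    "SFP28": (48, 25),
    "QSFP28": (8, 100),
}


def total_gbps(groups: dict) -> int:
    """Sum the maximum capacity of all port groups, in Gbps."""
    return sum(count * speed for count, speed in groups.values())


if __name__ == "__main__":
    # 48 x 25G = 1200G and 8 x 100G = 800G
    print(total_gbps(PORT_GROUPS), "Gbps")
```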

Article objective

In this article, we will just give a basic introduction to the BF2556X-1T.

[ #001 ] - BF2556X-1T in a nutshell

Parcel

What's in the box

Included items

Quick Installation guide

Front panel

Back panel

APS Networks® BF2556X-1T Racked

APS Networks® BF2556X-1T alongside its P4 sibling: Edgecore WEDGE100BF32X

BF2556X-1T specification

The system uses the Barefoot BFN-T10-032D-020 (Tofino 2.0T) switch chip, which can support 20 x 100GE ports.

Major features are:

  • 2.0 Tbps bandwidth
  • One Barefoot BFN-T10-032D-020 (Tofino 2.0T) switch ASIC
    • Ethernet: supports 80 x 25G SERDES ports
    • Management SERDES: supports four 10G-KR ports
    • PCIe Gen3 x 4 lanes
  • Eight Marvell 98PX1024
    • Each chip supports 4 x 25G SERDES
  • Network Interface
    • 48 x 25G SFP28 and 8 x 100G QSFP28
    • SFP28 ports 1~16 can be configured as 1G/10G/25G.
    • SFP28 ports 17~48 can be configured as 10G/25G.
    • Each QSFP28 port can be configured in 1x100G, 2x50G, 4x25G, 1x40G or 4x10G mode.
  • CPU Module: Optional Module design for flexibility
    • Intel® Xeon® Processors D1527 (BDXDE)
  • BMC: Base Board Management Controller
    • BMC is a specialized service processor that monitors the physical state of a system.
    • ASPEED AST2520
  • Management Port:
    • 3 x RJ45 10/100/1000 Mbps OOBM (Out Of Band Management) ports
    • 1 x console RJ45
    • 1 x USB 3.0
  • FAN Tray:
    • Four 40mm x 56mm fan trays
    • Supports 3+1 redundancy
    • Supports front-to-back and back-to-front airflow
  • PSU:
    • 1+1 redundant PSU
    • Each PSU supplies 850W to the system
    • 12V standby power for system management chips
    • Supports DC power supply
CPU specification
lscpu
Architecture:        x86_64                                                         
CPU op-mode(s):      32-bit, 64-bit                                                 
Byte Order:          Little Endian                                                  
CPU(s):              16                                                             
On-line CPU(s) list: 0-15                                                           
Thread(s) per core:  2                                                              
Core(s) per socket:  8                                                              
Socket(s):           1                                                              
NUMA node(s):        1                                                              
Vendor ID:           GenuineIntel                                                   
CPU family:          6                                                              
Model:               86                                                             
Model name:          Intel(R) Xeon(R) CPU D-1548 @ 2.00GHz                          
Stepping:            3                                                              
CPU MHz:             799.832                                                        
CPU max MHz:         2600.0000                                                      
CPU min MHz:         800.0000                                                       
BogoMIPS:            4000.16                                                        
Virtualization:      VT-x                                                           
L1d cache:           32K                                                            
L1i cache:           32K                                                            
L2 cache:            256K                                                           
L3 cache:            12288K                                                         
NUMA node0 CPU(s):   0-15                                                           
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts md_clear flush_l1d
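
The 16 logical CPUs reported above are simply sockets x cores per socket x threads per core. The sketch below parses a few lscpu lines (copied from the output above into a sample string) and checks that relation. The function names are made up for the example; on a real box you would feed it the actual lscpu output.

```python
SAMPLE = """\
CPU(s):              16
Thread(s) per core:  2
Core(s) per socket:  8
Socket(s):           1
"""


def parse_lscpu(text: str) -> dict:
    """Turn 'Key:   value' lines into a dictionary of strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def logical_cpus(fields: dict) -> int:
    """Compute sockets x cores per socket x threads per core."""
    return (
        int(fields["Socket(s)"])
        * int(fields["Core(s) per socket"])
        * int(fields["Thread(s) per core"])
    )


if __name__ == "__main__":
    info = parse_lscpu(SAMPLE)
    print(logical_cpus(info), "logical CPUs, lscpu reports", info["CPU(s)"])
```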

Discussion

The APS Networks® BF2556X-1T is a powerhouse:

  • 8 cores with 2 threads each speed up P4 program compilation (the BF2556X-1T has twice as many cores as the WEDGE100BF32X)
  • SyncE and 1588 support may be important for you if your P4 application requires precise time synchronisation
  • 1G/10G/25G/40G/50G/100G connectivity via SFP28 and QSFP28 makes the BF2556X-1T ready for multiple use cases:
    • In a P/PE architecture: an MPLS PE offering 1G/10G connectivity with uplinks toward the core
    • In a collapsed core: an MPLS PE router
    • A leaf or ToR switch/router
    • A BRAS/BNG router

Conclusion

In this 1st article you:

  • had a brief description of the APS Networks® BF2556X-1T hardware platform
  • saw that the hardware provides P4 connectivity at 1GE (16 x 1GE ports are available)
  • saw that, in addition to 1GE, it also provides 10/25/40/50/100G connectivity

RARE hardware platform: [ BF2556X-1T #001 ] - key take-away

  • From the RARE/FreeRouter point of view, the BF2556X-1T is a very good candidate for a PE (Provider Edge) router.

The 8 x 100G ports make it a strong candidate in a collapsed core architecture (P functions merged with PE functions). The box can also be used as a BGP route reflector as it boasts 32 GB of RAM (~10 full BGP feeds), but you would not make use of the available ports. It can be used to implement the BRAS/BNG use case, and would also be a good candidate as a ToR in a Data Center environment, with BGP/MPLS capability and the possibility to provide 1GE connections to existing servers purchased beforehand.

  • SyncE and 1588 support is a key feature if your application needs the precision provided by PTP

As we discover the box, we will explain in further articles how to benefit from this feature.

  • By design, RARE/freeRouter can coexist with virtualisation technology on the BF2556X-1T

We have just started our experience with this box. Further on, you'll find a series of articles dedicated to the BF2556X-1T covering:

  • How to proceed with the initial OS installation
  • APS Networks® BF2556X-1T software installation (TOFINO SDE and Gearbox)
  • Port operations on TOFINO ports: SFP28 ports 16-47 and QSFP28 ports 48-56
  • Port operations on GearBox ports: SFP28 ports 1-16 (1G/10G/25G)
  • How to benefit from SyncE and 1588 support
  • The actual RARE/freeRouter installation

The installation should be compliant with ISP/TELECOM standards: it should survive power outages, be easy to upgrade, and start automatically at boot time without any human intervention.



The 1st article presented the hardware platform and the rationale behind the choices. Let's dive into the subject now!

Requirement

  • Basic Linux/Unix knowledge
  • Service provider networking knowledge

Overview

Several choices were possible; we finally ended up following the KISS method. The operating system requirements are:

  • requirement #0: LTS operating system 
  • requirement #1: Benefit from LTS security patches
  • requirement #2: Must be able to run DPDK
  • requirement #3: (personal requirement) Must be familiar to me
  • requirement #4: Able to run Java software as freeRouter is written in Java
  • requirement #5: Small operating system software footprint
  • requirement #6: Support for IPv4/IPv6

The hardest path would be:

The objective is to have tight control of the software installed on the appliance. This guarantees the smallest footprint we hope to obtain. For those familiar with OpenWRT, we can reach a tiny image size. My OpenWRT image is 5Mb.

  • Use of NixOS or Nix package manager

This provides an incredible feature: commit/rollback functionality at the package management level!

Note

The features above are still under study within the RARE group. We will introduce these technologies once we feel more confident about how to integrate them into a streamlined deployment process.

Article objective

In this article we will go through the major steps in deploying Debian 10 stable aka Buster in order to prepare freeRouter installation.

Diagrams

[ #002 ] - Cookbook

Get debian 10 minimal ISO
wget http://ftp.nl.debian.org/debian/dists/buster/main/installer-amd64/current/images/

On macOS, burn the ISO using balenaEtcher

balenaEtcher can be downloaded here

Via the appliance BIOS settings:

  • activate console port redirection:

Serial port activation option

  • configure serial port settings

Now that you have activated console port:

  • plug in the USB key on which you previously burnt Debian 10
  • make sure you set the boot option to USB in the BIOS settings
  • reboot

You can now proceed to the next step: Debian 10 installation

We will assume that you have installed Debian 10 on the 256 GB SSD.

As a side note, during the installation process you'll be presented with the "Software selection" window. In this step we will:

  • unselect everything
  • select "SSH server"

Software selection

This guarantees the smallest Debian 10 software footprint. We will install the needed packages manually, on demand.

On a minimal installation, sudo is not installed, so all the commands will be run as root.

minimal Java installation
apt-get update
apt-get install default-jre-headless

The latest DPDK software is needed. We use the Debian 10 backports repository in order to get DPDK 19.11.2-1~bpo10+1

dpdk from debian 10 backports repository
echo "deb http://deb.debian.org/debian buster-backports main" | tee /etc/apt/sources.list.d/buster-backports.list
apt-get update
apt-get install dpdk dpdk-dev
Check DPDK version
dpkg -l | grep dpdk
ii  dpdk                                    19.11.2-1~bpo10+1            amd64        Data Plane Development Kit (runtime)
ii  dpdk-dev                                19.11.2-1~bpo10+1            amd64        Data Plane Development Kit (dev tools)
ii  libdpdk-dev:amd64                       19.11.2-1~bpo10+1            amd64        Data Plane Development Kit (basic development files)
additional 3rd party software used by freeRouter
apt-get update
apt-get install unzip net-tools libpcap-dev ethtool default-jre-headless psmisc tcpdump

In this setup we will create a freeRouter folder at the filesystem root directory

Create freeRouter folder at filesystem root directory
mkdir /rtr
get freeRouter control plane software
cd /rtr 
wget http://freerouter.nop.hu/rtr.jar
get freeRouter net-tools tarball
cd /rtr
wget http://freerouter.nop.hu/rtr.tar
tar xvf rtr.tar -C /rtr
rm rtr.tar

As freeRouter is handling the networking task, we have to disable the appliance networking. Forgetting to do so will result in conflicts and unpredictable behaviour. 

Disable networking from systemd perspective
systemctl set-default multi-user.target
rm /usr/lib/systemd/network/*
SVC="network-manager NetworkManager ModemManager systemd-network-generator systemd-networkd systemd-networkd-wait-online systemd-resolved hostapd wpa_supplicant"
systemctl disable $SVC
systemctl mask $SVC
freeRouter systemd startup script
cat /lib/systemd/system/rtr.service

[Unit]
Description=router processes
Wants=network.target
After=network-pre.target
Before=network.target

[Service]
Type=forking
ExecStart=/rtr/hwdet-all.sh

[Install]
WantedBy=multi-user.target
/rtr/hwdet-all.sh script
cat /rtr/hwdet-all.sh

#!/bin/sh

cd /rtr
echo 1 > /proc/sys/net/ipv6/conf/all/disable_ipv6
echo 1 > /proc/sys/net/ipv6/conf/default/disable_ipv6
echo 0 > /proc/sys/net/ipv6/conf/lo/disable_ipv6
ip link set lo up mtu 65535
ip addr add 127.0.0.1/8 dev lo
ip addr add ::1/128 dev lo

# DPDK
echo 96 > /proc/sys/vm/nr_hugepages
modprobe uio_pci_generic

dpdk-devbind.py -b uio_pci_generic 01:00.0 
dpdk-devbind.py -b uio_pci_generic 02:00.0 
dpdk-devbind.py -b uio_pci_generic 05:00.0 
dpdk-devbind.py -b uio_pci_generic 06:00.0 
dpdk-devbind.py -b uio_pci_generic 07:00.0 
dpdk-devbind.py -b uio_pci_generic 08:00.0 

#VETH for CPU_PORT and OOBM_PORT
ip link add veth0a type veth peer name veth0b

ip link set veth0a multicast on
ip link set veth0a allmulti on
ip link set veth0a promisc on
ip link set veth0a mtu 8192
ip link set veth0a up

ip link set veth0b multicast on
ip link set veth0b allmulti on
ip link set veth0b promisc on
ip link set veth0b mtu 8192
ip link set veth0b up

ethtool -K veth0a rx off
ethtool -K veth0a tx off
ethtool -K veth0a sg off
ethtool -K veth0a tso off
ethtool -K veth0a ufo off
ethtool -K veth0a gso off
ethtool -K veth0a gro off
ethtool -K veth0a lro off
ethtool -K veth0a rxvlan off
ethtool -K veth0a txvlan off
ethtool -K veth0a ntuple off
ethtool -K veth0a rxhash off
ethtool --set-eee veth0a eee off

ethtool -K veth0b rx off
ethtool -K veth0b tx off
ethtool -K veth0b sg off
ethtool -K veth0b tso off
ethtool -K veth0b ufo off
ethtool -K veth0b gso off
ethtool -K veth0b gro off
ethtool -K veth0b lro off
ethtool -K veth0b rxvlan off
ethtool -K veth0b txvlan off
ethtool -K veth0b ntuple off
ethtool -K veth0b rxhash off
ethtool --set-eee veth0b eee off

ip link add veth1a type veth peer name veth1b

ip link set veth1a multicast on
ip link set veth1a allmulti on
ip link set veth1a promisc on
ip link set veth1a mtu 1500
ip link set veth1a up

ip link set veth1b multicast on
ip link set veth1b allmulti on
ip link set veth1b promisc on
ip link set veth1b mtu 8192
ip link set veth1b up

ip link set wlan0 up

ethtool -K veth1a rx off
ethtool -K veth1a tx off
ethtool -K veth1a sg off
ethtool -K veth1a tso off
ethtool -K veth1a ufo off
ethtool -K veth1a gso off
ethtool -K veth1a gro off
ethtool -K veth1a lro off
ethtool -K veth1a rxvlan off
ethtool -K veth1a txvlan off
ethtool -K veth1a ntuple off
ethtool -K veth1a rxhash off
ethtool --set-eee veth1a eee off

ethtool -K veth1b rx off
ethtool -K veth1b tx off
ethtool -K veth1b sg off
ethtool -K veth1b tso off
ethtool -K veth1b ufo off
ethtool -K veth1b gso off
ethtool -K veth1b gro off
ethtool -K veth1b lro off
ethtool -K veth1b rxvlan off
ethtool -K veth1b txvlan off
ethtool -K veth1b ntuple off
ethtool -K veth1b rxhash off
ethtool --set-eee veth1b eee off

ip addr flush dev veth1a 
ip addr add 192.168.128.254/24 dev veth1a

#ADD DEFAULT ROUTE to OOBM SDN999
route add default gw 192.168.128.1

# START RTR !
start-stop-daemon -S -b -x /rtr/hwdet-main.sh
make hwdet-main.sh executable
chmod u+x /rtr/hwdet-main.sh

A bit of explanation

Disable IPv6
echo 1 > /proc/sys/net/ipv6/conf/all/disable_ipv6
echo 1 > /proc/sys/net/ipv6/conf/default/disable_ipv6
echo 0 > /proc/sys/net/ipv6/conf/lo/disable_ipv6
ip link set lo up mtu 65535

Note that no IPv6 operation will occur on the host itself; IPv6 will be handled at freeRouter level.

DPDK configuration
echo 96 > /proc/sys/vm/nr_hugepages
modprobe uio_pci_generic

dpdk-devbind.py -b uio_pci_generic 01:00.0 
dpdk-devbind.py -b uio_pci_generic 02:00.0 
dpdk-devbind.py -b uio_pci_generic 05:00.0 
dpdk-devbind.py -b uio_pci_generic 06:00.0 
dpdk-devbind.py -b uio_pci_generic 07:00.0 
dpdk-devbind.py -b uio_pci_generic 08:00.0 

In the stanza above, we configure DPDK (required)

  • Configure HugePages

In this case we use 96 hugepages. This value can be different if you are using a box with different characteristics (number of ports, memory, etc.). The objective is to configure a value that is neither too high (waste of resources) nor too small (otherwise p4dpdk won't run). In this case this leaves 10 free HugePages.

HugePages verification
grep HugePages_ /proc/meminfo
HugePages_Total:      96
HugePages_Free:       10
HugePages_Rsvd:        0
HugePages_Surp:        0 
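
The same check can be scripted. The sketch below parses the HugePages_ counters and derives how many pages are in use (total minus free). The sample text is copied from the output above; on a live system you would read /proc/meminfo instead. The function name is an assumption made for this example.

```python
SAMPLE = """\
HugePages_Total:      96
HugePages_Free:       10
HugePages_Rsvd:        0
HugePages_Surp:        0
"""


def parse_hugepages(meminfo: str) -> dict:
    """Extract the HugePages_* counters as integers."""
    counters = {}
    for line in meminfo.splitlines():
        if line.startswith("HugePages_"):
            key, _, value = line.partition(":")
            counters[key] = int(value)
    return counters


if __name__ == "__main__":
    # On a live system: counters = parse_hugepages(open("/proc/meminfo").read())
    counters = parse_hugepages(SAMPLE)
    in_use = counters["HugePages_Total"] - counters["HugePages_Free"]
    print(in_use, "hugepages in use,", counters["HugePages_Free"], "free")
```

If the free counter drops to 0 or p4dpdk fails to start, that is a hint to raise the value written to nr_hugepages.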
  • Activate UIO_PCI_GENERIC driver
  • Bind the interfaces to DPDK; DPDK will control them from now on. Keep in mind that they will then be invisible to the Linux kernel.

This command uses the device PCI ID. To check the device PCI IDs, just issue the command below:

List of PCI device IDs ready to be used (or not) by DPDK
 dpdk-devbind.py --status

Network devices using DPDK-compatible driver
============================================
0000:01:00.0 'I211 Gigabit Network Connection 1539' drv=uio_pci_generic unused=igb
0000:02:00.0 'I211 Gigabit Network Connection 1539' drv=uio_pci_generic unused=igb
0000:05:00.0 'I211 Gigabit Network Connection 1539' drv=uio_pci_generic unused=igb
0000:06:00.0 'I211 Gigabit Network Connection 1539' drv=uio_pci_generic unused=igb
0000:07:00.0 'I211 Gigabit Network Connection 1539' drv=uio_pci_generic unused=igb
0000:08:00.0 'I211 Gigabit Network Connection 1539' drv=uio_pci_generic unused=igb

Network devices using kernel driver
===================================
0000:09:00.0 'AR928X Wireless Network Adapter (PCI-Express) 002a' if=wlan0 drv=ath9k unused=uio_pci_generic 

No 'Baseband' devices detected
==============================

Other Crypto devices
====================
0000:00:1a.0 'Atom Processor Z36xxx/Z37xxx Series Trusted Execution Engine 0f18' unused=uio_pci_generic

No 'Eventdev' devices detected
==============================

No 'Mempool' devices detected
=============================

No 'Compress' devices detected
==============================

No 'Misc (rawdev)' devices detected
=================================== 
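
If you want to double check the binding from a script, the status listing can be parsed. The sketch below is only an illustration that relies on the line layout shown above (PCI ID, quoted device name, then key=value attributes); the regular expression and helper names are assumptions, and the sample lines are copied from the output above.

```python
import re

SAMPLE = """\
0000:01:00.0 'I211 Gigabit Network Connection 1539' drv=uio_pci_generic unused=igb
0000:02:00.0 'I211 Gigabit Network Connection 1539' drv=uio_pci_generic unused=igb
0000:09:00.0 'AR928X Wireless Network Adapter (PCI-Express) 002a' if=wlan0 drv=ath9k unused=uio_pci_generic
0000:00:1a.0 'Atom Processor Z36xxx/Z37xxx Series Trusted Execution Engine 0f18' unused=uio_pci_generic
"""

# PCI ID, quoted device name, then space-separated key=value attributes.
DEVICE_LINE = re.compile(
    r"^(?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f])\s+"
    r"'(?P<name>[^']*)'\s+(?P<attrs>.*)$"
)


def parse_devices(status: str) -> list:
    """Return one dictionary per device line found in the status text."""
    devices = []
    for line in status.splitlines():
        match = DEVICE_LINE.match(line.strip())
        if not match:
            continue  # headers, separators and blank lines
        attrs = dict(
            item.split("=", 1) for item in match.group("attrs").split() if "=" in item
        )
        devices.append({"pci": match.group("pci"), "name": match.group("name"), **attrs})
    return devices


def bound_to(devices: list, driver: str) -> list:
    """List the PCI IDs of devices currently using the given driver."""
    return [dev["pci"] for dev in devices if dev.get("drv") == driver]


if __name__ == "__main__":
    print(bound_to(parse_devices(SAMPLE), "uio_pci_generic"))
```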
  • Configure the appliance OOBM via veth pair (as all physical ports are handled by DPDK and will be invisible from the Linux kernel)
veth0a/veth0b for CPU_PORT
#VETH for CPU_PORT and OOBM_PORT
ip link add veth0a type veth peer name veth0b

ip link set veth0a multicast on
ip link set veth0a allmulti on
ip link set veth0a promisc on
ip link set veth0a mtu 8192
ip link set veth0a up

ip link set veth0b multicast on
ip link set veth0b allmulti on
ip link set veth0b promisc on
ip link set veth0b mtu 8192
ip link set veth0b up

ethtool -K veth0a rx off
ethtool -K veth0a tx off
ethtool -K veth0a sg off
ethtool -K veth0a tso off
ethtool -K veth0a ufo off
ethtool -K veth0a gso off
ethtool -K veth0a gro off
ethtool -K veth0a lro off
ethtool -K veth0a rxvlan off
ethtool -K veth0a txvlan off
ethtool -K veth0a ntuple off
ethtool -K veth0a rxhash off
ethtool --set-eee veth0a eee off

ethtool -K veth0b rx off
ethtool -K veth0b tx off
ethtool -K veth0b sg off
ethtool -K veth0b tso off
ethtool -K veth0b ufo off
ethtool -K veth0b gso off
ethtool -K veth0b gro off
ethtool -K veth0b lro off
ethtool -K veth0b rxvlan off
ethtool -K veth0b txvlan off
ethtool -K veth0b ntuple off
ethtool -K veth0b rxhash off
ethtool --set-eee veth0b eee off

The above section is pretty straightforward:

  • It creates the veth0a/veth0b pair. For those familiar with P4, this is the channel between the control plane (freeRouter) and the dataplane (p4dpdk) using CPU_PORT
  • It sets the multicast/allmulti/promisc flags and mtu=8192 on veth0a/veth0b
  • It disables offload features (checksum, segmentation, VLAN, etc.) on veth0a/veth0b
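
Since the same commands are repeated for every interface, they can also be generated. The sketch below is not part of hwdet-all.sh; it only builds, as strings, the command list shown above for one veth pair, so that the three steps are easier to see. The function name and its parameters are assumptions made for this example, and nothing is executed.

```python
# ethtool features switched off in the script, in the same order
OFFLOADS = [
    "rx", "tx", "sg", "tso", "ufo", "gso", "gro", "lro",
    "rxvlan", "txvlan", "ntuple", "rxhash",
]


def veth_pair_commands(a: str, b: str, mtu_a: int = 8192, mtu_b: int = 8192) -> list:
    """Build the shell commands that create and tune one veth pair."""
    # Step 1: create the pair.
    cmds = [f"ip link add {a} type veth peer name {b}"]
    # Step 2: set flags and MTU, then bring each end up.
    for dev, mtu in ((a, mtu_a), (b, mtu_b)):
        cmds += [
            f"ip link set {dev} multicast on",
            f"ip link set {dev} allmulti on",
            f"ip link set {dev} promisc on",
            f"ip link set {dev} mtu {mtu}",
            f"ip link set {dev} up",
        ]
    # Step 3: disable offload features on each end.
    for dev in (a, b):
        cmds += [f"ethtool -K {dev} {feature} off" for feature in OFFLOADS]
        cmds.append(f"ethtool --set-eee {dev} eee off")
    return cmds


if __name__ == "__main__":
    for command in veth_pair_commands("veth0a", "veth0b"):
        print(command)
```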

We do the same thing for the Out Of Band management (linux access)

veth1a/veth1b for OOB management
ip link add veth1a type veth peer name veth1b

ip link set veth1a multicast on
ip link set veth1a allmulti on
ip link set veth1a promisc on
ip link set veth1a mtu 1500
ip link set veth1a up

ip link set veth1b multicast on
ip link set veth1b allmulti on
ip link set veth1b promisc on
ip link set veth1b mtu 8192
ip link set veth1b up

ip link set wlan0 up

ethtool -K veth1a rx off
ethtool -K veth1a tx off
ethtool -K veth1a sg off
ethtool -K veth1a tso off
ethtool -K veth1a ufo off
ethtool -K veth1a gso off
ethtool -K veth1a gro off
ethtool -K veth1a lro off
ethtool -K veth1a rxvlan off
ethtool -K veth1a txvlan off
ethtool -K veth1a ntuple off
ethtool -K veth1a rxhash off
ethtool --set-eee veth1a eee off

ethtool -K veth1b rx off
ethtool -K veth1b tx off
ethtool -K veth1b sg off
ethtool -K veth1b tso off
ethtool -K veth1b ufo off
ethtool -K veth1b gso off
ethtool -K veth1b gro off
ethtool -K veth1b lro off
ethtool -K veth1b rxvlan off
ethtool -K veth1b txvlan off
ethtool -K veth1b ntuple off
ethtool -K veth1b rxhash off
ethtool --set-eee veth1b eee off

ip addr flush dev veth1a 
ip addr add 192.168.128.254/24 dev veth1a

Add default route to SDN999 for OOBM return traffic (192.168.128.1 is freeRouter sdn999: we will see the full config later)

#ADD DEFAULT ROUTE to OOBM SDN999
route add default gw 192.168.128.1

Effectively start freeRouter main loop

Start freeRouter inside main loop
start-stop-daemon -S -b -x /rtr/hwdet-main.sh

This main loop is implemented by the hwdet-main.sh script below:

/rtr/hwdet-main.sh script
cat /rtr/hwdet-main.sh 

#!/bin/sh

while (true); do
  cd /rtr/
  stty raw < /dev/tty
  java -Xmx4g -jar /rtr/rtr.jar router /rtr/rtr-
  if [ $? -eq 4 ] ; then
    sync
    reboot -f
  fi
  stty cooked < /dev/tty
  sleep 1
done  

A bit of explanation

Requirement considerations:

  • The box should run 24x7
  • It must survive a power cut, i.e. the service should be restored each time power comes back, whatever the reason for the cut
  • If there is no power cut but freeRouter has crashed for any reason, it should be restarted

Let me reassure you: freeRouter usually doesn't crash. Most of the time it restarts because of a manual or, better, an automatic upgrade (smile) 

freeRouter infinite loop: freeRouter autoupgrade process restarts and self-restarts
while (true); do
  ...
done  
  • The appliance has 8 GB of RAM, which is enough for the JVM running freeRouter. (A full IPv4/IPv6 routing table at the control plane is possible at home!  ← ok this is useless but cool, no? :3 )
    • RAM allocation is for JVM and its tables
    • Additional RAM allocation is for p4dpdk and p4emu, as we have to store the table once for the native code too
    • Lastly the kernel also needs memory, so it's a good idea to leave some free RAM and not give everything to JVM.
Start freeRouter
java -Xmx4g -jar /rtr/rtr.jar router /rtr/rtr-
  • freeRouter "Cold reboot"  
Cold reboot
if [ $? -eq 4 ] ; then
  sync
  reboot -f
fi
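
To make the restart logic explicit, here is a minimal Python sketch of the same supervision loop. It is an illustration, not what the appliance runs: the function names and the max_runs parameter are hypothetical, the stty handling of the shell script is left out, and unlike the shell loop it stops iterating once the reboot has been requested.

```python
import subprocess
import time

REBOOT_EXIT_CODE = 4  # exit code that hwdet-main.sh answers with a reboot


def supervise(run_router, reboot, pause=1.0, max_runs=None):
    """Restart run_router() until it exits with REBOOT_EXIT_CODE.

    run_router: callable returning the router's exit code.
    reboot: callable invoked once when a cold reboot is requested.
    max_runs: optional cap on the number of runs (None = loop forever),
              a hypothetical knob added here only to make testing easy.
    Returns the number of times the router was started.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        runs += 1
        if run_router() == REBOOT_EXIT_CODE:
            reboot()
            break
        time.sleep(pause)  # same 1 second pause as in the shell loop
    return runs


def run_freerouter():
    # Same command line as in hwdet-main.sh.
    return subprocess.call(
        ["java", "-Xmx4g", "-jar", "/rtr/rtr.jar", "router", "/rtr/rtr-"])


def cold_reboot():
    # Flush disks, then force the reboot, as in the shell script.
    subprocess.call(["sync"])
    subprocess.call(["reboot", "-f"])
```

On the appliance this would be started with supervise(run_freerouter, cold_reboot). Passing the two actions as callables keeps the loop itself testable without Java or root privileges.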

Discussion

All these choices have been made to make the appliance as resilient as possible and to provide an enjoyable user experience. We will see in a later article a feature that I love: auto-upgrade. It keeps your appliance up to date over the network with the latest freeRouter train during low traffic periods. Of course, for an ISP P/PE core router we don't want this by default, but hey, why not? As long as all customers are dual-homed to 2 different PEs reachable via 2 direct core paths, this can be done during a low traffic period, after traffic has been drained away from the PE/P boxes to be upgraded by setting their metric to the maximum (use the IS-IS overload bit or OSPF max-metric router-lsa).

Conclusion

In this article, we got our hands dirty and manually installed freeRouter with a DPDK dataplane from a clean slate environment. This was done on purpose, as I'd like you to understand the whole installation process in detail. There is an automated installation alternative that also installs freeRouter. However, it installs freeRouter with the software backend. If your hardware (CPU+NIC) is compatible, you can just replace the software backend with the DPDK backend. At this point we have a genuine vanilla installation of freeRouter with a DPDK dataplane, on an appliance that can survive a harsh physical environment and power cuts. We now just have to create the 2 freeRouter configuration files:

freeRouter configuration files
ls -l rtr-*
-rw-r--r-- 1 root root  646 Jul 31 17:03 rtr-hw.txt
-rw-r--r-- 1 root root 9027 Aug 25 10:02 rtr-sw.txt


RARE validated design: [ SOHO #002 ] - key take-away

  • freeRouter installation is not complex. It boils down to installing a basic supported Linux OS, Java, some 3rd party software, and the freeRouter jar and binaries themselves
  • In the binary list you'll find a special one called p4dpdk. It is the freeRouter DPDK dataplane, which emulates the RARE P4 program written for BMv2 (it does not emulate BMv2 itself!)
  • Though this installation is manual for pedagogical purposes, it can be fully automated: just fire up a VM with a bunch of interfaces and test it!
  • The proposed installation is highly resilient and will ease upgrades of the appliance (we will see in a subsequent article what that means (wink) )

In the next article, we will configure the freeRouter appliance, start the router, and provide configuration in order to have effective basic ping reachability to the FTTH BROADBAND internal IP.

The "RARE/FreeRouter-101" series of articles is meant to help you quickly kickstart your very first RARE/freeRouter deployment and understand, via a series of tutorials, how it can be powered by various dataplanes. The 101 series also explained how RARE/freeRouter can be configured to integrate into an external network environment. 101 - [ #006 ] introduced an interesting solution for SOHO (small office/home office). In this "RARE validated design" series of articles, you'll see an innovative implementation of a SOHO routing platform. These articles will draw your attention to an exceptional SOHO router with features usually found only in commercial solutions for service provider environments.

Requirement

  • Basic Linux/Unix knowledge
  • Service provider networking knowledge

Overview

Back in 2004, I deployed an 8 Mbps ATM circuit that connected an airline company hub site. Traffic has grown tremendously since then! In 2020, what does SOHO (Small Office, Home Office) mean? In our use case we will consider a SOHO connected via a 1GE link. This is for example:

  • Primary schools, Secondary schools
  • Small R&E institution spoke sites
  • Home office (especially considering the COVID context)
  • Small company spoke agencies

Article objective

In this article we will describe how to build a carrier grade SOHO router (aka CPE) from an actual platform. In this example, let me share my personal story and introduce the SOHO hardware that I'm using at home. It is compliant with the requirements implied by the use cases listed above:

Requirements

  • requirement #0: n×1GE capable, ISP uplink is 1GE 
  • requirement #1: completely silent, the box can be moved to crowded room
  • requirement #2: small power consumption, as it is meant to run 24x7. (I'm paying the bill ! (smile) )
  • requirement #3: Run 64-bit linux 
  • requirement #4: native support of DPDK

Diagrams

[ #001 ] - Cookbook

Hardware specification

  • 6× Intel I211-AT Gigabit Ethernet, with Wake-on-LAN support
  • Support 1× mSATA SSD, 1x DDR3L 1.35V memory 1333/1600MHz, max to 8GB;
  • 1× VGA max resolution 1920x1080P
  • 1× COM RJ45 console
  • Support add WiFi module ( Mini PCI-E half height size )
  • Support automatically power on after power restore.
  • Ultra compact measured at 180×175×34mm;
  • Low power requirements save money and be more eco-friendly.
  • Fanless, passive cooling, noise-less

CPU specification

  • CPU identifier: J1900
  • # of cores: 4
  • # of Threads: 4
  • Processor Base Frequency: 2.00 GHz
  • Burst Frequency: 2.42 GHz
  • Cache: 2 MB L2 Cache
  • TDP: 10 W

freeRouter is heavily multithreaded, so 4 cores are appreciated. For a budget SOHO router, hardware VPN assistance is not required. If a VPN concentrator is needed, we can deploy a dedicated box in the SOHO environment with a CPU that supports AES-NI. freeRouter won't run as a VM here, so VT-x, VT-d and VT-c are not required.

SOHO usage

  • home office work
  • regular 720p/1080p/4K (and more) on-line VC via RENATER RENDEZ-VOUS or ZOOM
  • (intensive grown up kids) online gaming (2–3 persons can play an online game at the same time)
  • these kids+wife can multitask and watch 480p/720p Youtube videos at the same time (These are the digital natives ...)
  • streaming video from MyCanal (French Netflix competitor)
  • Operating system/school educational material  parallel downloads
  • Intensive social network usage via native mobile client having integrated video in the apps ...

Bandwidth check

All the usages above require a lot of bandwidth, as they can all occur in parallel. This is a Speedtest result during busy working hours:

So my ISP was not totally lying after all, though I could not reach the theoretical 1GE that the ISP advertisement boasts. (wink)

SOHO comments

Please note that this hardware has no optical/SFP port. Similar configurations with 1 optical uplink port do exist, in case you are also the service provider in your environment. This hardware is specific to the FTTH environment currently deployed in France.

Operating system specification

  • Debian 10 (aka Buster) 
  • netinstall is used
  • minimal vanilla installation

Requirements

  • requirement #0: LTS operating system 
  • requirement #1: Benefit from LTS security patches
  • requirement #2: Must be able to run dpdk
  • requirement #3: (personal requirement) Must be familiar to me
  • requirement #4: Able to run java software as freeRouter is written in Java
  • requirement #5: small operating system software footprint
  • requirement #6: Support for IPv4/IPv6

Additional nice-to-have features (not used here, as we neither run VMs nor require a high VPN traffic load)

  • Virtualisation support: check CPU support for VT-x (Intel) / AMD-V (AMD)
  • I/O MMU virtualisation (kernel bypass mechanism): check CPU support for VT-d (Intel) / AMD-Vi (AMD), needed by DPDK with the VFIO driver in order to ensure hardware NIC packet forwarding
  • Network virtualisation: check CPU support for VT-c (SR-IOV)
  • Hardware encryption: check CPU support for AES-NI (for tunnel mechanisms using AES such as OpenVPN; it is useless for other tunnel types such as WireGuard)
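
On Linux, some of these checks can be done by reading the "flags" line of /proc/cpuinfo. The sketch below is an illustration under assumptions: the function name cpu_features is hypothetical, and it only covers the features that appear as CPU flags (vmx for VT-x, svm for AMD-V, aes for AES-NI). VT-d/AMD-Vi and VT-c do not show up as CPU flags, so they are not covered here.

```python
# CPU flag names as they appear in /proc/cpuinfo on Linux.
FLAG_NAMES = {
    "VT-x": "vmx",     # Intel hardware virtualisation
    "AMD-V": "svm",    # AMD hardware virtualisation
    "AES-NI": "aes",   # AES instructions
}


def cpu_features(cpuinfo_text):
    """Map each feature name to True/False from /proc/cpuinfo content."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags.update(value.split())
    return {name: flag in flags for name, flag in FLAG_NAMES.items()}
```

On the appliance, the function would be fed with the content of /proc/cpuinfo, for example cpu_features(open("/proc/cpuinfo").read()).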

Discussion

Though my home traffic distribution is totally different from the traffic patterns of a school or another SOHO site, we can consider this hardware platform a viable choice.

Platform considerations:

  • each 1GE port is wired to an Intel I211-AT controller. DPDK takes advantage of the packet processing power burnt into this silicon in order to relieve the CPU.
  • WIFI is not mandatory and the included hardware is not bleeding edge, but considering the uplink bandwidth, 802.11ax is not necessary. At least for northbound traffic we are safe for the moment. If at some point east-west traffic, such as NAS to WIFI client, requires a 10G rate, it will be time to buy a new appliance. If a WIFI improvement is needed, an 802.11ac card can be purchased on a 15€ budget. For WIFI client to WIFI client traffic at 10GE rates, you can still purchase an 802.11ax mini PCIe card for around the same budget.

 freeRouter is supported on:

  • linux based system
  • android → yes, you can install freeRouter on your mobile phone and wander around your house, IPv4/IPv6 WIFI roaming will occur automagically!
  • freeRouter has a DPDK dataplane as well as a libpcap dataplane for older hardware
  • in this example I selected an appliance for convenience, but nothing prevents you from recycling an old laptop/desktop PC with multiple DPDK-capable NICs. We can run a small PE (provider edge) router with multiple 1GE/10GE NICs. Note that the appliance can act as a 6x1GE provider edge router. This is the edge of the MPLS Seamless architecture.

Operating system future considerations:

  • In an SP environment, the ideal situation is to have a custom operating system (we are studying the Yocto project in order to create this custom OS)
  • This custom OS will include only the strict minimum software, thus reducing the software footprint to its minimum
  • A very promising and unique feature is also provided by NixOS and the Nix package manager: it enables atomic commit/rollback at the package management level

The combination of Yocto + Nix can help develop your own specific DIY hardware (or for your company/organisation/institution) based on the popular concept that French ISPs love: "INTERNET BOX"

Conclusion

In this 1st article you:

  • had a brief description of a hardware platform suitable for SOHO
  • had a description of the SOHO use case in 2020
  • got a rationale on why this platform has been chosen
  • had a brief description of the selected operating system
  • got a rationale on why this OS has been chosen

RARE validated design: [ SOHO #001 ] - key take-away

  • RARE/FreeRouter is a strong candidate for SOHO with multiple dataplane support solution.

If you are a company, you can run RARE/freeRouter with a versatile P4 switch such as the APS Networks® BF2556X-1T or WEDGE. As a SOHO with a small budget you can run it with a DPDK dataplane, and on older hardware you can still run it with a pure software dataplane

  • RARE/freeRouter is the first element at the very edge of the MPLS seamless architecture

End to end MPLS is now possible for the Service provider at an affordable price

  • RARE/freeRouter design can coexist with Virtualisation technology

CPU extensions such as VT-x/AMD-V, VT-d/AMD-Vi and VT-c allow RARE/freeRouter to coexist with a small number of storage and compute nodes (such as micro-K8/docker).

In the next article we will start our journey in creating a carrier grade CPE using the platform above.

After having followed the P4Lang P4 for dummies [ #002 ] article, you should now have a working P4 development environment.

Requirement

  • Basic Linux/Unix knowledge
  • Service provider networking knowledge


Overview

Let's start writing, compiling and running our first P4 program.

Article objective

This 3rd article proposes to write your first P4 program, based on the my_program.p4 specification from P4Lang P4 for dummies [ #001 ].

Diagram: my_program.p4

[ #003 ] - Cookbook

P4 program specification

my_program.p4 packet processing logic: "all packets arriving at port 4 are switched/forwarded to port 8"

  • In this example, the switch has 8 ports
  • An ingress packet arrives at port 4
  • the ingress port is then checked
  • If it is port 4, then the packet is switched to port 8
  • my_program.p4 does not implement a default condition, so all the packets not arriving on port 4 are dropped
  • the ingress packets arrive with a header whose characteristics were set by the previous node
  • if needed, my_program.p4 is able to modify the egress packet header for further processing by the next network node (for example for in-band network telemetry)
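
Before writing any P4, the specification above can be captured in a few lines of plain Python. This is only a reference model of the intended behaviour (the function name forward is hypothetical), handy for writing down what we expect from the switch before testing it:

```python
PORT_4 = 4
PORT_8 = 8


def forward(ingress_port):
    """Return the egress port for a packet, or None when it is dropped."""
    if ingress_port == PORT_4:
        return PORT_8
    return None  # no default rule: everything else is dropped
```

With this model, forward(4) gives 8 and any other ingress port gives None.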

Let's first create the P4 program environment:

my_program.p4
mkdir -p ~/my_program/bin ~/my_program/p4src ~/my_program/p4rt_python ~/my_program/build  
Where
tree -d my_program/
my_program/         <------- top folder
├── bin             <------- executable binary folder
├── build           <------- containing p4 compilation artefacts
├── p4rt_python     <------- python/scapy folder containing test scripts
└── p4src           <------- containing p4 code
~/my_program/p4src/my_program.p4
/*
 * P4 language version: P4_16 
 */

/*
 * include P4 core library 
 */
#include <core.p4>

/* 
 * include P4 v1model library implemented by simple_switch 
 */
#include <v1model.p4>

#define PORT_4 4 
#define PORT_8 8 


/*
 * egress_spec port encoded using 9 bits
 */ 
typedef bit<9>  nexthop_id_t;

/*
 * metadata type  
 */
struct metadata_t {
   nexthop_id_t nexthop_id;
}

/*
 * Our P4 program header structure 
 */
struct headers {
}

/*
 * V1Model PARSER
 */
parser prs_main(packet_in packet,
                out headers hdr,
                inout metadata_t md,
                inout standard_metadata_t std_md) {

   state start {
      transition select(std_md.ingress_port) {
         PORT_4: prs_port_4;
         default: accept;
      }
   }

   state prs_port_4 {
      md.nexthop_id = PORT_8;
      transition accept;     
   }
}

/*
 * V1Model CHECKSUM VERIFICATION 
 */
control ctl_verify_checksum(inout headers hdr, inout metadata_t metadata) {
   apply {
   }
}


/*
 * V1Model INGRESS
 */
control ctl_ingress(inout headers hdr,
                  inout metadata_t md,
                  inout standard_metadata_t std_md) {

   apply {
      if (std_md.ingress_port == PORT_4) {
         std_md.egress_spec = md.nexthop_id;
      } 
   }
}


/*
 * V1Model EGRESS
 */

control ctl_egress(inout headers hdr,
                 inout metadata_t md,
                 inout standard_metadata_t std_md) {
   apply {
   }
}

/*
 * V1Model CHECKSUM COMPUTATION
 */
control ctl_compute_checksum(inout headers hdr, inout metadata_t md) {
   apply {
   }
}

/*
 * V1Model DEPARSER
 */
control ctl_deprs(packet_out packet, in headers hdr) {
    apply {
        /*
         * emit hdr
         */
        packet.emit(hdr);
    }
}


/*
 * V1Model P4 Switch defined in v1model.p4
 */
V1Switch(
prs_main(),
ctl_verify_checksum(),
ctl_ingress(),
ctl_egress(),
ctl_compute_checksum(),
ctl_deprs()
) main;
Compilation of my_program.p4 using P4lang p4c
p4c --std p4-16 --target bmv2 --arch v1model -I ./include -o ./build --p4runtime-files ./build/my_program.json ./p4src/my_program.p4

Verification

Compilation of my_program.p4 artefact in ./build
floui@ubi16:~/my_program$ ls -l build/
total 44
-rw-rw-r-- 1 floui floui  7532 Jul 24 14:23 my_program.json  <------ output used when launching bmv2
-rw-rw-r-- 1 floui floui 35462 Jul 24 14:23 my_program.p4ip  <------ other usage (not taken into account by the example)

Create veth pair before ...

Before launching our BMv2 virtual switch we need to create the veth pairs that will be bound to the P4 switch.

For that we will reuse bash scripts from Andy Fingerhut's public GitHub repository:

veth pairs setup
cd ~/my_program/bin
wget https://raw.githubusercontent.com/jafingerhut/p4-guide/master/bin/veth_setup.sh
wget https://raw.githubusercontent.com/jafingerhut/p4-guide/master/bin/veth_teardown.sh
chmod u+x ./veth_setup.sh
chmod u+x ./veth_teardown.sh
sudo ./veth_setup.sh

ip link | grep veth
4: veth1@veth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
6: veth3@veth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
7: veth2@veth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
8: veth5@veth4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
9: veth4@veth5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
10: veth7@veth6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
11: veth6@veth7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
12: veth9@veth8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
13: veth8@veth9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
14: veth11@veth10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
15: veth10@veth11: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
16: veth13@veth12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
17: veth12@veth13: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
18: veth15@veth14: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
19: veth14@veth15: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
20: veth17@veth16: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
21: veth16@veth17: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9500 qdisc noqueue state UP mode DEFAULT group default qlen 1000

We can now launch BMv2 simple_switch and bind 8 of the veth pairs we just created

start bmv2 simple_switch (load my_program.json)
sudo simple_switch --log-console -i 1@veth2 -i 2@veth4 -i 3@veth6 -i 4@veth8 -i 5@veth10 -i 6@veth12 -i 7@veth14 -i 8@veth16 ./build/my_program.json
Calling target program-options parser
[14:28:41.364] [bmv2] [D] [thread 15917] Set default default entry for table 'tbl_my_program76': my_program76 - 
Adding interface veth2 as port 1
[14:28:41.364] [bmv2] [D] [thread 15917] Adding interface veth2 as port 1
Adding interface veth4 as port 2
[14:28:41.415] [bmv2] [D] [thread 15917] Adding interface veth4 as port 2
Adding interface veth6 as port 3
[14:28:41.455] [bmv2] [D] [thread 15917] Adding interface veth6 as port 3
Adding interface veth8 as port 4
[14:28:41.503] [bmv2] [D] [thread 15917] Adding interface veth8 as port 4
Adding interface veth10 as port 5
[14:28:41.547] [bmv2] [D] [thread 15917] Adding interface veth10 as port 5
Adding interface veth12 as port 6
[14:28:41.587] [bmv2] [D] [thread 15917] Adding interface veth12 as port 6
Adding interface veth14 as port 7
[14:28:41.635] [bmv2] [D] [thread 15917] Adding interface veth14 as port 7
Adding interface veth16 as port 8
[14:28:41.683] [bmv2] [D] [thread 15917] Adding interface veth16 as port 8
[14:28:41.727] [bmv2] [I] [thread 15917] Starting Thrift server on port 9090
[14:28:41.728] [bmv2] [I] [thread 15917] Thrift server was started
...
tcpdump veth8 (port 4)
sudo tcpdump -i veth8
...
tcpdump veth16 (port 8)
sudo tcpdump -i veth16
...
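
The port numbering used in the simple_switch command above follows a simple rule: switch port n is attached to interface veth(2n), so the veth0/veth1 pair stays unused. As a small sketch (the function name simple_switch_argv is hypothetical), the same command line can be built programmatically, which helps if you want more or fewer ports:

```python
def simple_switch_argv(json_path, ports=8):
    """Build the simple_switch command line used in this article.

    Port n is attached to veth(2n), i.e. one end of the n-th veth pair
    created by veth_setup.sh (veth0/veth1 is left unused here).
    """
    argv = ["simple_switch", "--log-console"]
    for port in range(1, ports + 1):
        argv += ["-i", f"{port}@veth{2 * port}"]
    argv.append(json_path)
    return argv
```

Joining the result of simple_switch_argv("./build/my_program.json") with spaces gives the command used above, without the leading sudo.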

Now you need to find a way to:

  • send a packet to simple_switch@PORT_4 (veth8)
  • send another packet to simple_switch@PORT_1 (veth2)

We will use scapy for that:

scapy installation as root
pip3 install --pre scapy[complete]

Run scapy with sufficient privileges to send packets on specific interface

sudo scapy3
/usr/lib/python3/dist-packages/IPython/utils/module_paths.py:29: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
                                      
                     aSPY//YASa       
             apyyyyCY//////////YCa       |
            sY//////YSpcs  scpCY//Pp     | Welcome to Scapy
 ayp ayyyyyyySCP//Pp           syY//C    | Version 2.4.3~bionic
 AYAsAYYYYYYYY///Ps              cY//S   |
         pCCCCY//p          cSSps y//Y   | https://github.com/secdev/scapy
         SPPPP///a          pP///AC//Y   |
              A//A            cyP////C   | Have fun!
              p///Ac            sC///a   |
              P////YCpc           A//A   | Craft packets like I craft my beer.
       scccccp///pSP///p          p//Y   |               -- Jean De Clerck
      sY/////////y  caa           S//P   |
       cayCyayP//Ya              pY/Ya
        sY/PsY////YCc          aC//Yp 
         sc  sccaCY//PCypaapyCP//YSs  
                  spCPY//////YPSps    
                       ccaacs         
                                       using IPython 5.5.0
>>> 
From scapy prompt, send a packet to PORT_4 (veth8)
>>> sendp(IP(dst="1.2.3.4")/ICMP(),iface="veth8")
.
Sent 1 packets.
>>> 
Check tcpdump on veth8 (PORT_4)
floui@ubi16:~$ sudo tcpdump -i veth8
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on veth8, link-type EN10MB (Ethernet), capture size 262144 bytes
14:58:23.404299 00:00:40:01:9d:d2 (oui Unknown) > 45:00:00:1c:00:01 (oui Unknown), ethertype Unknown (0xc1e0), length 28: 
        0x0000:  1728 0102 0304 0800 f7ff 0000 0000       .(............ 
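
The capture above may look odd: MAC addresses such as 45:00:00:1c:00:01 and an "Unknown" ethertype. This is because the packet was sent with sendp(), which works at layer 2, without an Ether() header, so tcpdump decodes the first 14 bytes of the IPv4 header as if they were an Ethernet header. The sketch below rebuilds the same 28 bytes with the Python standard library only. The source address 193.224.23.40 is an assumption read back from the capture bytes (c1e0 1728), and the other field values (id 1, TTL 64) are the ones visible in the capture.

```python
import socket
import struct


def inet_checksum(data):
    """RFC 1071 Internet checksum over an even number of bytes."""
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_ip_icmp(src, dst):
    """Build a 28-byte IPv4 + ICMP echo-request, like IP(dst=...)/ICMP()."""
    icmp = struct.pack("!BBHHH", 8, 0, 0, 0, 0)  # type 8, code 0, cksum, id, seq
    icmp = icmp[:2] + struct.pack("!H", inet_checksum(icmp)) + icmp[4:]
    header = struct.pack("!BBHHHBBH4s4s",
                         0x45, 0, 20 + len(icmp),  # version/IHL, TOS, total length
                         1, 0,                     # identification, flags/fragment
                         64, 1, 0,                 # TTL, protocol ICMP, checksum
                         socket.inet_aton(src), socket.inet_aton(dst))
    header = header[:10] + struct.pack("!H", inet_checksum(header)) + header[12:]
    return header + icmp


def as_mac(raw):
    """Format raw bytes the way tcpdump prints a MAC address."""
    return ":".join(f"{byte:02x}" for byte in raw)


packet = build_ip_icmp("193.224.23.40", "1.2.3.4")
print("length     :", len(packet))              # 28
print("'dst MAC'  :", as_mac(packet[0:6]))      # 45:00:00:1c:00:01
print("'src MAC'  :", as_mac(packet[6:12]))     # 00:00:40:01:9d:d2
print("'ethertype': 0x" + packet[12:14].hex())  # 0xc1e0
print("remainder  :", packet[14:].hex())        # 1728010203040800f7ff00000000
```

The printed values match the tcpdump line: the "destination MAC" is the start of the IP header, the "source MAC" ends with the IP header checksum 9dd2, and the "ethertype" is the first half of the source IP address. This does not matter for my_program.p4, which forwards on the ingress port only and never parses the packet.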
Check tcpdump on veth16 (PORT_8)
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on veth16, link-type EN10MB (Ethernet), capture size 262144 bytes
14:58:23.406042 00:00:40:01:9d:d2 (oui Unknown) > 45:00:00:1c:00:01 (oui Unknown), ethertype Unknown (0xc1e0), length 28: 
        0x0000:  1728 0102 0304 0800 f7ff 0000 0000       .(............ 

Conclusion

Congratulations !

You have successfully written and compiled your program, and loaded it on a P4Lang BMv2 virtual switch! In addition, you checked that the logic of your program is implemented correctly by sending a packet to PORT_4 using the scapy python3 tool. You then checked with tcpdump that your packet ingressed the P4 switch via PORT_4 and egressed via PORT_8, as expected.

What happens to packets arriving on a port other than PORT_4?

Let's find out: let's send an ingress packet to PORT_1 (veth2) of the switch and see what happens.

From scapy prompt, send a packet to PORT_1 (veth2)
>>> sendp(IP(dst="1.2.3.4")/ICMP(),iface="veth2")
.
Sent 1 packets.
>>> 

In that case we don't know the egress port, so let's look at the simple_switch console.

simple_switch console
floui@ubi16:~/my_program$ sudo simple_switch --log-console -i 1@veth2 -i 2@veth4 -i 3@veth6 -i 4@veth8 -i 5@veth10 -i 6@veth12 -i 7@veth14 -i 8@veth16 ./build/my_program.json
Calling target program-options parser
[15:10:55.525] [bmv2] [D] [thread 16129] Set default default entry for table 'tbl_my_program76': my_program76 - 
Adding interface veth2 as port 1
[15:10:55.525] [bmv2] [D] [thread 16129] Adding interface veth2 as port 1
Adding interface veth4 as port 2
[15:10:55.555] [bmv2] [D] [thread 16129] Adding interface veth4 as port 2
Adding interface veth6 as port 3
[15:10:55.603] [bmv2] [D] [thread 16129] Adding interface veth6 as port 3
Adding interface veth8 as port 4
[15:10:55.651] [bmv2] [D] [thread 16129] Adding interface veth8 as port 4
Adding interface veth10 as port 5
[15:10:55.691] [bmv2] [D] [thread 16129] Adding interface veth10 as port 5
Adding interface veth12 as port 6
[15:10:55.739] [bmv2] [D] [thread 16129] Adding interface veth12 as port 6
Adding interface veth14 as port 7
[15:10:55.791] [bmv2] [D] [thread 16129] Adding interface veth14 as port 7
Adding interface veth16 as port 8
[15:10:55.839] [bmv2] [D] [thread 16129] Adding interface veth16 as port 8
[15:10:55.879] [bmv2] [I] [thread 16129] Starting Thrift server on port 9090
[15:10:55.880] [bmv2] [I] [thread 16129] Thrift server was started
[15:11:00.449] [bmv2] [D] [thread 16135] [0.0] [cxt 0] Processing packet received on port 1
[15:11:00.449] [bmv2] [D] [thread 16135] [0.0] [cxt 0] Parser 'parser': start
[15:11:00.449] [bmv2] [D] [thread 16135] [0.0] [cxt 0] Parser 'parser' entering state 'start'
[15:11:00.449] [bmv2] [D] [thread 16135] [0.0] [cxt 0] Parser state 'start': key is 0001
[15:11:00.449] [bmv2] [T] [thread 16135] [0.0] [cxt 0] Bytes parsed: 0
[15:11:00.449] [bmv2] [D] [thread 16135] [0.0] [cxt 0] Parser 'parser': end
[15:11:00.449] [bmv2] [D] [thread 16135] [0.0] [cxt 0] Pipeline 'ingress': start
[15:11:00.450] [bmv2] [T] [thread 16135] [0.0] [cxt 0] ./p4src/my_program.p4(75) Condition "std_md.ingress_port == 4" (node_2) is false
[15:11:00.450] [bmv2] [D] [thread 16135] [0.0] [cxt 0] Pipeline 'ingress': end
[15:11:00.450] [bmv2] [D] [thread 16135] [0.0] [cxt 0] Egress port is 0
[15:11:00.450] [bmv2] [D] [thread 16136] [0.0] [cxt 0] Pipeline 'egress': start
[15:11:00.450] [bmv2] [D] [thread 16136] [0.0] [cxt 0] Pipeline 'egress': end
[15:11:00.450] [bmv2] [D] [thread 16136] [0.0] [cxt 0] Deparser 'deparser': start
[15:11:00.450] [bmv2] [D] [thread 16136] [0.0] [cxt 0] Deparser 'deparser': end
[15:11:00.450] [bmv2] [D] [thread 16140] [0.0] [cxt 0] Transmitting packet of size 28 out of port 0

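In that case we see the line "Egress port is 0". The program never sets egress_spec for this packet, so it keeps its default value 0. No interface is bound to port 0 in our simple_switch command line, so the packet "transmitted" out of port 0 goes nowhere and is effectively dropped. (Port 0 is not BMv2's dedicated drop port, which is 511 by default.)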

Let's now resend a packet to PORT_4 and observe the simple_switch console log.

simple_switch console
sudo simple_switch --log-console -i 1@veth2 -i 2@veth4 -i 3@veth6 -i 4@veth8 -i 5@veth10 -i 6@veth12 -i 7@veth14 -i 8@veth16 ./build/my_program.json
Calling target program-options parser
[15:14:51.047] [bmv2] [D] [thread 16151] Set default default entry for table 'tbl_my_program76': my_program76 - 
Adding interface veth2 as port 1
[15:14:51.048] [bmv2] [D] [thread 16151] Adding interface veth2 as port 1
Adding interface veth4 as port 2
[15:14:51.099] [bmv2] [D] [thread 16151] Adding interface veth4 as port 2
Adding interface veth6 as port 3
[15:14:51.139] [bmv2] [D] [thread 16151] Adding interface veth6 as port 3
Adding interface veth8 as port 4
[15:14:51.175] [bmv2] [D] [thread 16151] Adding interface veth8 as port 4
Adding interface veth10 as port 5
[15:14:51.207] [bmv2] [D] [thread 16151] Adding interface veth10 as port 5
Adding interface veth12 as port 6
[15:14:51.239] [bmv2] [D] [thread 16151] Adding interface veth12 as port 6
Adding interface veth14 as port 7
[15:14:51.271] [bmv2] [D] [thread 16151] Adding interface veth14 as port 7
Adding interface veth16 as port 8
[15:14:51.319] [bmv2] [D] [thread 16151] Adding interface veth16 as port 8
[15:14:51.347] [bmv2] [I] [thread 16151] Starting Thrift server on port 9090
[15:14:51.348] [bmv2] [I] [thread 16151] Thrift server was started
[15:14:58.053] [bmv2] [D] [thread 16158] [0.0] [cxt 0] Processing packet received on port 4
[15:14:58.053] [bmv2] [D] [thread 16158] [0.0] [cxt 0] Parser 'parser': start
[15:14:58.053] [bmv2] [D] [thread 16158] [0.0] [cxt 0] Parser 'parser' entering state 'start'
[15:14:58.053] [bmv2] [D] [thread 16158] [0.0] [cxt 0] Parser state 'start': key is 0004
[15:14:58.053] [bmv2] [T] [thread 16158] [0.0] [cxt 0] Bytes parsed: 0
[15:14:58.053] [bmv2] [D] [thread 16158] [0.0] [cxt 0] Parser 'parser' entering state 'prs_port_4'
[15:14:58.053] [bmv2] [D] [thread 16158] [0.0] [cxt 0] Parser set: setting field 'scalars.userMetadata.nexthop_id' to 8
[15:14:58.053] [bmv2] [D] [thread 16158] [0.0] [cxt 0] Parser state 'prs_port_4' has no switch, going to default next state
[15:14:58.053] [bmv2] [T] [thread 16158] [0.0] [cxt 0] Bytes parsed: 0
[15:14:58.053] [bmv2] [D] [thread 16158] [0.0] [cxt 0] Parser 'parser': end
[15:14:58.053] [bmv2] [D] [thread 16158] [0.0] [cxt 0] Pipeline 'ingress': start
[15:14:58.054] [bmv2] [T] [thread 16158] [0.0] [cxt 0] ./p4src/my_program.p4(75) Condition "std_md.ingress_port == 4" (node_2) is true
[15:14:58.054] [bmv2] [T] [thread 16158] [0.0] [cxt 0] Applying table 'tbl_my_program76'
[15:14:58.054] [bmv2] [D] [thread 16158] [0.0] [cxt 0] Looking up key:

[15:14:58.054] [bmv2] [D] [thread 16158] [0.0] [cxt 0] Table 'tbl_my_program76': miss
[15:14:58.054] [bmv2] [D] [thread 16158] [0.0] [cxt 0] Action entry is my_program76 - 
[15:14:58.054] [bmv2] [T] [thread 16158] [0.0] [cxt 0] Action my_program76
[15:14:58.054] [bmv2] [T] [thread 16158] [0.0] [cxt 0] ./p4src/my_program.p4(76) Primitive std_md.egress_spec = md.nexthop_id
[15:14:58.054] [bmv2] [D] [thread 16158] [0.0] [cxt 0] Pipeline 'ingress': end
[15:14:58.054] [bmv2] [D] [thread 16158] [0.0] [cxt 0] Egress port is 8
[15:14:58.054] [bmv2] [D] [thread 16159] [0.0] [cxt 0] Pipeline 'egress': start
[15:14:58.054] [bmv2] [D] [thread 16159] [0.0] [cxt 0] Pipeline 'egress': end
[15:14:58.054] [bmv2] [D] [thread 16159] [0.0] [cxt 0] Deparser 'deparser': start
[15:14:58.054] [bmv2] [D] [thread 16159] [0.0] [cxt 0] Deparser 'deparser': end
[15:14:58.054] [bmv2] [D] [thread 16163] [0.0] [cxt 0] Transmitting packet of size 28 out of port 8

This clearly confirms what tcpdump showed: a packet ingressing on PORT_4 is switched to PORT_8
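
Reading these logs by eye is fine for two packets, but it quickly becomes tedious. As a minimal sketch, the two log lines we care about can be extracted with regular expressions. The patterns are taken from the console output shown above, the function name port_pairs is hypothetical, and the sketch assumes that packets are logged one after the other (not interleaved).

```python
import re

INGRESS_RE = re.compile(r"Processing packet received on port (\d+)")
EGRESS_RE = re.compile(r"Egress port is (\d+)")


def port_pairs(log_lines):
    """Yield (ingress_port, egress_port) for each packet found in the log."""
    ingress = None
    for line in log_lines:
        match = INGRESS_RE.search(line)
        if match:
            ingress = int(match.group(1))
            continue
        match = EGRESS_RE.search(line)
        if match and ingress is not None:
            yield ingress, int(match.group(1))
            ingress = None
```

Fed with the two console logs above, list(port_pairs(lines)) yields the pairs (1, 0) and (4, 8).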

Conclusion

In this article you:

  • wrote your first P4 program
  • used p4c to compile it
  • learned how to create virtual ethernet pairs and bind them to simple_switch
  • launched simple_switch and loaded your program on it
  • set up a test environment using scapy
  • and verified your program using a combination of scapy and tcpdump

P4Lang P4 for dummies [ #002 ] - key take-away

  • my_program.p4 is written following the V1Model definition, which defines:
    • a parsing stage
    • a checksum verification stage
    • an ingress packet processing control stage
    • an egress packet processing control stage
    • a checksum computation stage
    • a deparser stage
V1model PISA model
V1Switch( prs_main(), ctl_verify_checksum(), ctl_ingress(), ctl_egress(), ctl_compute_checksum(), ctl_deprs() ) main; 

It is described by the diagram below:

In a subsequent article we will dissect my_program.p4. As you could observe, P4 programming is quite intuitive: it is all about switching a packet based on the values of its ingress header fields and metadata (such as the packet's ingress port).






In P4Lang P4 for dummies [ #001 ], you learned that a behavioural language gives you access to dataplane programming.

Requirement

  • Basic Linux/Unix knowledge
  • Service provider networking knowledge


Overview

To start P4 programming, we will first set up a P4 development environment using the open source software of the P4Lang community.

Article objective

This article explains how to install:

  • P4Lang PI
  • P4Lang BMv2
  • P4Lang p4c

Operating system supported

  • Debian 10 (stable aka buster)
  • Ubuntu 18.04 (Bionic beaver)
  • Ubuntu 20.04 (Focal fossa)

Note

You can of course use the distribution of your choice, as long as the operating system you are using provides all the third party dependencies required by P4Lang software, mainly:

  • protobuf
  • grpc
  • thrift
  • nanomsg
  • nnpy

You can find the full list here in Launchpad.

Diagram: 

[ #002 ] - Cookbook

In our example we will use the same Debian stable image (buster), installed as a VirtualBox VM, and we add a bridged network interface attached to our laptop's RJ45 connection.

Debian 10: add the p4lang repository in /etc/apt/sources.list.d/p4.list
deb https://download.opensuse.org/repositories/home:/frederic-loui:/p4lang:/p4c:/master/Debian_10/ ./
add debian 10 repository key from download.opensuse.org
wget https://download.opensuse.org/repositories/home:/frederic-loui:/p4lang:/p4c:/master/Debian_10/Release.key
sudo apt-key add ./Release.key
install p4lang packages (just install p4c and it will install p4lang-pi and bmv2)
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install p4c

Note

Installing p4lang software with apt-get will download and install:

  • p4c
  • bmv2
  • p4lang-pi
Ubuntu 18.04: add the p4lang bionic 3rd party repository
sudo add-apt-repository ppa:frederic-loui/p4lang-3rd-party
sudo apt-get update
add p4lang bionic nightly build repository
sudo add-apt-repository ppa:frederic-loui/p4lang-master-bionic-nightly
sudo apt-get update
install p4lang packages (just install p4c and it will install p4lang-pi and bmv2)
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install p4c bmv2 p4lang-pi

Note

Installing p4lang software with apt-get will download and install:

  • p4lang-3rd-party (bionic)

alongside:

  • p4c
  • bmv2
  • p4lang-pi
Ubuntu 20.04: add the p4lang focal 3rd party repository
sudo add-apt-repository ppa:frederic-loui/p4lang-3rd-party-focal
sudo apt-get update
add the p4lang focal nightly build repository
sudo add-apt-repository ppa:frederic-loui/p4lang-master-focal-nightly
sudo apt-get update
install p4lang packages (just install p4c and it will install p4lang-pi and bmv2)
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install p4c bmv2 p4lang-pi

Note

Installing p4lang software with apt-get will download and install:

  • p4lang-3rd-party (focal)

alongside:

  • p4c
  • bmv2
  • p4lang-pi

Verification

check p4lang packages installation on Debian
dpkg -l | grep p4lang
ii  bmv2                                   20200615~d447b6a~release~nightly-0+57.1 amd64        p4lang behavioral-model
ii  p4c                                    20200628~7c03f854~release~nightly-0     amd64        p4c p4lang project compiler
ii  p4lang-pi                              20200601~822a0d1~release~nightly-0+39.1 amd64        Implementation framework of a P4Runtime server
check p4lang packages installation on Ubuntu 18.04 (same for 20.04)
dpkg -l | grep p4lang
ii  bmv2                                     1.13.0-202006160902-d447b6a~ubuntu18.04.1    amd64        p4lang behavioral-model
ii  p4c                                      1.1.0-rc1-202006191103-3917a1c~ubuntu18.04.1 amd64        p4c p4lang project compiler
ii  p4lang-3rd-party                         1.1~bionic-1                                 all          This package installs 3rd party software needed by p4lang software
ii  p4lang-pi                                0.8-202006020517-822a0d1~ubuntu18.04.1       amd64        Implementation framework of a P4Runtime server
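
If you want to script this verification, the following Python sketch shows one possible approach. It assumes the column layout of the `dpkg -l` output shown above (status, name, version, architecture, description); the function names are hypothetical, and the sample text is copied from the Debian listing with its column padding shortened.

```python
# Packages this article expects to find after installation.
REQUIRED = ("p4c", "bmv2", "p4lang-pi")

# Lines copied from the Debian 'dpkg -l | grep p4lang' output above.
SAMPLE_DPKG = """\
ii  bmv2       20200615~d447b6a~release~nightly-0+57.1 amd64  p4lang behavioral-model
ii  p4c        20200628~7c03f854~release~nightly-0     amd64  p4c p4lang project compiler
ii  p4lang-pi  20200601~822a0d1~release~nightly-0+39.1 amd64  Implementation framework of a P4Runtime server
"""


def installed_packages(dpkg_output):
    """Map package name to version for every line whose status is 'ii'."""
    packages = {}
    for line in dpkg_output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "ii":
            packages[fields[1]] = fields[2]
    return packages


def missing_packages(dpkg_output, required=REQUIRED):
    """Return the required packages that are not reported as installed."""
    installed = installed_packages(dpkg_output)
    return [name for name in required if name not in installed]


if __name__ == "__main__":
    print("missing:", missing_packages(SAMPLE_DPKG))
```

An empty list means that all three packages are installed.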
Clone RARE code from repository
cd ~/
git clone https://github.com/frederic-loui/RARE.git
compile RARE router.p4
cd ~/RARE/02-PE-labs/p4src
make build
mkdir -p ../build ../run/log
p4c --std p4-16 --target bmv2 --arch v1model \
        -I ./ -o ../build --p4runtime-files ../build/router.txt router.p4 
check RARE router.p4 compilation result:
ls -l ../build/
total 572
-rw-r--r-- 1 root root 448313 Jul 22 10:15 router.json
-rw-r--r-- 1 root root 100912 Jul 22 10:15 router.p4i
-rw-r--r-- 1 root root  32764 Jul 22 10:15 router.txt
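
To make the compiler invocation easier to reuse for other programs, here is a small Python sketch that rebuilds the p4c argument list printed by `make build` above. It only assembles the arguments and does not run the compiler. The function name `p4c_command` and its default values are assumptions made for illustration; the flags themselves are the ones shown in the make output.

```python
from pathlib import PurePosixPath


def p4c_command(program, build_dir="../build", include_dir="./"):
    """Return the p4c argument list for a BMv2/v1model build (nothing is executed)."""
    stem = PurePosixPath(program).stem           # "router.p4" -> "router"
    p4info = f"{build_dir}/{stem}.txt"           # P4Runtime description file
    return [
        "p4c",
        "--std", "p4-16",
        "--target", "bmv2",
        "--arch", "v1model",
        "-I", include_dir,
        "-o", build_dir,
        "--p4runtime-files", p4info,
        program,
    ]


if __name__ == "__main__":
    print(" ".join(p4c_command("router.p4")))
```

Passing the returned list to `subprocess.run(..., check=True)` from the p4src directory would be one way to launch the compilation from a script.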

Conclusion

In this article you learned how to set up a P4 development environment on:

  • Debian 10
  • Ubuntu 18.04
  • Ubuntu 20.04

You also tested the installation by compiling the RARE P4 code.


P4Lang P4 for dummy [ #002 ] - key take-away

  • Creating a P4Lang P4 development environment is easy
  • It uses P4Lang packages on Debian and Ubuntu
  • These packages are maintained by the RARE project and are built nightly from the official P4Lang GitHub repositories

In the next article we will:

  • compile my_program.p4
  • launch P4Lang virtual switch called simple_switch and load my_program.p4 on it
  • perform basic verification





While the "RARE/FreeRouter-101" series teaches you how to start using RARE/freeRouter, the "P4Lang P4 for dummies" series teaches you how to start programming in the P4Lang P4 language. As a reminder, a P4 dataplane is one type of dataplane that can be coupled to RARE/freeRouter, as described in articles 101-#003 and 101-#004. The final objective of this series is to help you compile the VERY FIRST RARE/freeRouter test case, which covers:

  • basic control plane communication between freeRouter and BMv2/simple_switch_grpc
  • and the simple packet_in/packet_out header used in this interface communication.

Requirement

  • Basic Linux/Unix knowledge
  • Service provider networking knowledge


Overview

P4 is a language for programming the data plane of network devices. From p4.org web site:

«P4 is a domain-specific programming language for specifying the behaviour of the dataplanes of network-forwarding elements. »

Article objective

This first article presents:

  • A brief introduction to the P4 language
  • A basic P4 development workflow
  • Some basic specificities of the P4 language

Note

This article is primarily a pure introduction to P4Lang P4. It is in no way an extensive description of the programming language, nor a P4 compilation guide.

Diagram: P4 development workflow

[ #001 ] - Cookbook: P4 development workflow

Based on what we mentioned, what does the "P4 Domain specific language" give you ? Concretely:

  1. You can write a program as you would in C or C++, but you have to follow the P4 language specification. (The current one is P4_16; there is also a previous P4_14 specification.)
  2. That program is compiled with a P4 compiler, in P4_16 or P4_14 mode (much as a C++ compiler targets C++14 or C++11).
  3. The resulting compilation artifacts can then be loaded onto a device implementing a P4 model, commonly called a P4 target, which is able to run P4 binaries. Here we will be using BMv2, a software P4 target intended for learning.


Take away

The specificities are:

  • This P4 program is YOUR program
  • This P4 program allows you to define YOUR OWN packet processing logic

In short, you can now program:

«how a packet that comes into your system is processed and goes out of your system»

Diagram packet processing description

The diagram above depicts 2 perspectives: 

  • P4 program development workflow
    • It starts by writing your P4 program using your favorite editor
    • compile your program with the P4 compiler of your target
    • load your program into the P4 target
  • my_program.p4 packet processing logic: "all packets arriving at port 4 are switched/forwarded to port 8"
    • In this example, the switch has 8 ports
    • An ingress packet arrives at port 4
    • The ingress port is then checked
    • If it is port 4, then the packet is switched to port 8
    • my_program.p4 does not implement a default condition, so all packets not arriving on port 4 are dropped
    • Ingress packets arrive with a header whose characteristics were set by the previous node
    • If needed, my_program.p4 can modify the egress packet header for further processing by the next network node (for example, in-band network telemetry)
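
To make the forwarding rule concrete, here is a tiny behavioural model of it written in Python. This is only an illustration of the logic described in the list above, not the P4 source of my_program.p4; the names `my_program_forward`, `DROP` and `PORTS` are hypothetical.

```python
DROP = None            # hypothetical marker meaning "the packet is dropped"
PORTS = range(1, 9)    # the example switch has 8 ports, numbered 1 to 8 here


def my_program_forward(ingress_port):
    """Return the egress port for a packet received on ingress_port.

    Mirrors the rule described above: port 4 is switched to port 8,
    and, since there is no default condition, anything else is dropped.
    """
    if ingress_port == 4:
        return 8
    return DROP


if __name__ == "__main__":
    for port in PORTS:
        result = my_program_forward(port)
        action = "dropped" if result is DROP else f"forwarded to port {result}"
        print(f"packet on port {port}: {action}")
```

In the real program the same decision is taken in the ingress control of the pipeline, using the ingress port metadata provided by the target.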

Router for Academia Research & Education (RARE) & P4

The RARE project objective is to provide a networking solution for Research & Education institution use cases. We have witnessed the birth of several software control planes such as GNU Zebra, Bird, exaBGP, etc. What they have in common is that they cannot (yet) be coupled easily with a hardware dataplane. Simply put, these software control planes cannot run, without significant specific development, on equipment able to forward traffic over n×100GE links at a high Mpps rate.

There have been attempts with DPDK and other kernel bypass mechanisms that enabled higher packet processing throughput, but this is not comparable to the packet processing power of commercial/vendor equipment.

P4:

  • opens the door to software AND hardware dataplane programmability
  • gives you the possibility to implement YOUR own packet processing algorithm 

RARE control plane: freeRouter

In the RARE project, we are using a software control plane called freeRouter:

  • It is an open source control plane
  • It has been deployed since 2014 and benefits from many hours of production use in various environments
  • Interworking has been extensively and continuously tested with major equipment vendors
  • Last but not least, freeRouter's maintainer is in the RARE team, which allowed rapid application development and prototyping of the communication between the control plane and the P4 dataplane.

P4 use cases are mostly inherently linked to the P4 target you plan to use in order to run your P4 program: 

A comprehensive list can be found here

  • P4Lang BMv2 V1Model target:

It is the P4Lang virtual model that emulates a PISA architecture. You can run it on a VM and start writing your first P4 program and load it on simple_switch and/or simple_switch_grpc (if you plan to use P4Runtime). While this is a great solution in order to learn P4 and sketch your packet processing algorithm, it is not recommended for production use.

  • INTEL/BAREFOOT TOFINO/TOFINO 2

This target also implements a PISA architecture and provides a virtual model so that you can validate your algorithm. Once validated on the virtual model, you can load your program into a hardware switch running an NPU called TOFINO, or its bigger brother TOFINO2. While TOFINO is able to handle 6.4 Tbps of traffic, TOFINO2 simply doubles this (12.8 Tbps). In addition, TOFINO2 offers more resources, such as bigger buffers, memory and TCAM, than its little brother.

These are the use cases enabled by the combination of P4 and RARE software:

  • Service Provider core router:

You can build a robust packet switching fabric at the scale of Telecom Service Provider able to switch packets at n×100GE

  • Service Provider edge router: 

You can build an edge router and interconnect it with the core routers above. These routers will terminate your backbone network services, like L2/L3 plain IP or VPN services (IPv4 - IPv6).

  • Datacenter ToR switch 

With the WEDGE100BF32X you can have 2x100GE uplinks toward 2 distinct "leaf switches", which leaves you 30x100GE server connections.

  • Datacenter Spine/Leaf switch 

The WEDGE100BF32X is also a good candidate in the DC as a core/spine switch. You can create a fabric able to switch traffic at 6.4 Tbps.

  • Internet Exchange

In this case, the WEDGE100BF32X is a 100GE peer aggregator, or is simply integrated into the IXP distributed core fabric.

  • MAN/CPE router

The STORDIS BF2556X-1T, with its flexible connectivity options, is a good candidate for regional network implementation. It has 8x100GE ports: 2 of them can be used as uplinks toward the main transit provider and 2 others can provide EAST/WEST connections via 2 different fiber routes, which leaves 4x100GE ports in case you need to increase capacity. The STORDIS also has 16x[1/10/25] GbE ports and 32x[10/25] GbE ports, which makes it possible to interconnect users over various access port bandwidths.

Conclusion

In this article you:

  • had a brief introduction to the P4Lang P4 language
  • were given a 10 thousand feet view of the P4 development workflow
  • were shown a list of P4 targets and the use cases enabled by these targets

P4Lang P4 for dummies [ #001 ] - key take-away

THE exciting INNOVATION provided by P4 boils down to this: a community language that opens the door to the system's dataplane for you. Until now, dataplane programming was reserved to commercial vendors. Some of these dataplanes, like the well-known CEF (Cisco Express Forwarding), are specific to Cisco equipment. Juniper has its own dataplane (not sure about the name), implemented by its Forwarding Plane component (see, for example, the vMX architecture).

P4 language inherent characteristics:

  • Behavioral programming language
  • Language with constraints 
  • Limited number of variable types
  • With fixed size
  • P4 is not a general-purpose language like C, C++ or Java: you cannot write arbitrary software with it

It is therefore a simple language that network managers may find easier to tame than pure software developers would. Indeed, writing a P4 program is all about defining the behavior of a network packet processing algorithm based on intrinsic variables encoded into the packet header.




The "RARE/FreeRouter-101" series of articles is meant to help you quickly kickstart your very first RARE/freeRouter deployment and to understand, via a series of tutorials, how it can be powered by various dataplanes. The 101 series also explained how RARE/freeRouter can be configured in order to be integrated into an external network environment. However, even if 101- [ #006 ] is a robust and interesting solution for SOHO, you'll see many more interesting use cases in the "RARE validated design" series of articles. These articles will draw your attention to mind blowing use cases that are usually implemented only by commercial solutions in service provider environments.

Requirement

  • Basic Linux/Unix knowledge
  • Service provider networking knowledge

Overview

BGP is THE protocol of the Internet: it is used to exchange routing information between BGP systems, within and between Internet domains. It comes in two flavours:

External BGP (eBGP): Network Layer Reachability Information (NLRI) is exchanged between network domains called Autonomous Systems, which are usually administratively independent. This is BGP inter-domain routing. As an example, let us assume a BGP speaker from AS2200 (RENATER) advertising NLRI to AS20965 (GÉANT R&E). From that point, AS20965 knows how to reach any network advertised by AS2200 based on this NLRI.

Internal BGP (iBGP): NLRI is propagated between BGP speakers inside the same domain. This is BGP intra-domain routing. As an example, assume a border router of AS2200 in Paris is connected to the GÉANT network and gets NLRI from AS20965. It will then propagate this information internally, advertising GÉANT NLRI via iBGP sessions to the other BGP speakers inside the AS2200 network domain.

iBGP requires a full mesh of sessions between all BGP speakers inside a domain because of its loop avoidance rule: a route learned from one iBGP peer is not re-advertised to another iBGP peer. A domain with n speakers therefore requires n*(n-1)/2 sessions. BGP route reflection removes this full mesh requirement: a BGP edge router now needs only 1 BGP session toward the RR, thus reducing the workload of the network equipment.
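
The scaling difference is easy to quantify. The Python sketch below compares the two designs; the function names are hypothetical, and the route reflector formula assumes that every client peers with every reflector and that the reflectors are fully meshed among themselves, which is one common design and not the only possible one.

```python
def full_mesh_sessions(speakers):
    """Number of iBGP sessions in a full mesh of 'speakers' routers: n*(n-1)/2."""
    return speakers * (speakers - 1) // 2


def route_reflector_sessions(clients, reflectors=1):
    """Sessions when each client peers with each reflector (assumed design).

    The reflectors themselves are assumed to be fully meshed.
    """
    return clients * reflectors + full_mesh_sessions(reflectors)


if __name__ == "__main__":
    # 8 edge routers plus 2 route reflectors, i.e. 10 BGP speakers in total.
    print("full mesh:", full_mesh_sessions(10))                  # 45
    print("with 2 RRs:", route_reflector_sessions(8, 2))         # 17
```

With 8 clients and 2 reflectors, each edge router maintains 2 sessions instead of 9, and the total drops from 45 to 17.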

Article objective

In this article we will describe how to build a carrier grade route reflector cluster composed of RR1 and RR2, in order to reach the 99.999% availability expected from a Telecom Internet Service Provider.

Let's consider the network architecture of the fictitious service provider below, in which route reflectors RR1 and RR2 are dual-homed to core P routers.
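
To give an order of magnitude for the availability target, the short Python sketch below converts an availability percentage into allowed downtime per year, and shows how redundancy helps. The function names are hypothetical, and the redundancy formula assumes that the two route reflectors fail independently, which is an idealisation.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525600 minutes in a non-leap year


def allowed_downtime_minutes(availability_percent):
    """Yearly downtime budget, in minutes, for a given availability target."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


def parallel_availability(single_availability, count):
    """Availability of 'count' redundant units, assuming independent failures."""
    return 1 - (1 - single_availability) ** count


if __name__ == "__main__":
    print(f"99.999% -> {allowed_downtime_minutes(99.999):.3f} minutes/year")
    print(f"two units at 99.9% -> {parallel_availability(0.999, 2):.6f}")
```

A 99.999% target leaves roughly 5.26 minutes of downtime per year, which is why the design relies on two route reflectors rather than one.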

Diagram

[ #001 ] - Cookbook

BGP RR main requirements

SR655 1 x EPYC 7302P, 64GB RAM, 2G CONTROLLER CACHE FLASH, 4x10G ports + SFP+ and 4x1G ports, 3 SSD 480GB MAINSTREAM, XCLARITY ENTERPRISE.

SR655 AMD EPYC 7302P (16C 2.8GHz 128MB Cache/155W) 32GB (2x32GB, 2Rx4 3200MHz RDIMM), No Backplane, SATA, 1x750W, Tooless Rails
ThinkSystem 2x32GB TruDDR4 3200MHz (2Rx4 1.2V) RDIMM-A
ThinkSystem SR655 2.5 SATA/SAS 8-Bay Backplane Kit
ThinkSystem RAID 930-8i 2GB Flash PCIe 12Gb Adapter
ThinkSystem 2.5 5300 480GB Mainstream SATA 6Gb Hot Swap SSD
ThinkSystem SR655 x16/x8/x8 PCIe Riser1 FH Kit
ThinkSystem SR635/SR655 x8 PCIe Internal Riser Kit
ThinkSystem Broadcom 57454 10/25GbE SFP28 4-port OCP Ethernet Adapter
ThinkSystem Broadcom 5720 1GbE RJ45 2-Port PCIe Ethernet Adapter
SFP+ SR Transceiver
ThinkSystem 750W(230/115V) Platinum Hot-Swap Power Supply
2.8m, 10A/100-250V, C13 to IEC 320-C14 Rack Power Cable
ThinkSystem Toolless Slide Rail Kit with 2U CMA
ThinkSystem SR655 Fan Option Kit
ThinkSystem SR635/SR655 Supercap Installation Kit

BGP RR main requirements

RR is a specific component inside a service provider environment:

  • The BGP RR is not in the data path inside the backbone; this can be adjusted by setting high IGP metrics inside the core backbone.
  • BGP traffic does not require a tremendous throughput, so there is no need for a NIC hardware-assisted forwarding mechanism such as DPDK.
  • An NREN route reflector with 2xIPv4 and 2xIPv6 full views coming from 2 upstream providers requires a steady traffic rate of ~10 Mbps, so we can assume that a 10GE connection will be sufficient for the next decades, all address families included.
  • As of 2020/07/13, the Internet IPv4 routing table size is 839945 entries
  • As of 2020/07/13, the Internet IPv6 routing table size is 91062 entries

Both tables, together with the other BGP address families, need a constant ~4 GB of memory:

# show watchdog memory

  • So in the configuration above, 64 GB of RAM is sufficient to hold the full IPv4 and IPv6 routing tables in memory (as well as the tables of the other BGP address families). It also leaves ample headroom for network instability events, which involve more CPU and memory usage for convergence computation.

Disclaimer

  • We have no incentive in proposing a server from the above brand. It just happens that this server was already bought and that its configuration perfectly matches the use case requirements; again, this is pure coincidence
  • A 10GE port connection might be overkill, but in a Service Provider context this is the norm. It spares adjacent core routers from having to implement 1GE connectivity
  • PCIe Gen4 is available and thus provides a tremendous amount of bandwidth for disk R/W operations. Though useful for the OS and applications, the BGP RR setup won't take direct advantage of PCIe Gen4.
  • Indeed, considering the amount of RAM in this configuration, we will disable SWAP operations.


BGP RR distinct data path

  • Connect the server with 2 NICs using optical SFPs (Broadcom 57454 10/25GbE SFP28 4-port OCP Ethernet Adapter) to core backbone routers following distinct dark fiber paths.
  • The link between C1 - C2 provides an additional level of redundancy

BGP RR out of band management

  • Connect the server with 1 NIC using RJ45 (Broadcom 5720 1GbE RJ45 2-Port PCIe Ethernet Adapter) to the KVM or out-of-band management network

Do not forget ...

One often overlooked point is the environment. As said, the BGP RR is a central component in a service provider network. It must be deployed considering the following recommendations:

  • Deploy the RR in a carrier hotel
  • With sufficient cooling
  • With sufficient power. Also make sure to have redundant power, using dual PSUs connected to different energy sources
  • Rack the server properly and make sure it is installed without blocking airflow, as per the server vendor's advice


Install OS supported in your company

  • Use only a stable (LTS) operating system release such as Debian 10, Ubuntu 18.04 or Ubuntu 20.04
  • Apply your IT department's strip-down (hardening) and security patch procedures, and include the server in your server maintenance process
  • In our case we will use Debian 10

BGP RR Life cycle management

It is important to note that the BGP RR is now subject to your company's server hardware maintenance, and that the software is not part of it.

  • Server hardware maintenance is now applied to a network equipment
  • The software is maintained by freeRouter project members
Create the freeRouter environment and get rtr.jar
mkdir -p ~/freeRouter/bin ~/freeRouter/lib ~/freeRouter/etc ~/freeRouter/log
cd ~/freeRouter/lib
wget http://freerouter.nop.hu/rtr.jar
Check the freeRouter directory layout
╭─[11:11:54]floui@debian ~ 
╰─➤ tree freeRouter
freeRouter
├── bin   # binary files      
├── etc   # configuration files      
├── lib   # library files      
└── log   # log files      

get freeRouter net-tools tarball
wget freerouter.nop.hu/rtr.tar
Extract the net-tools binaries into ~/freeRouter/bin
tar xvf rtr.tar -C ~/freeRouter/bin/

If you would like to rebuild these binaries, you can find the compilation shell script in the cloned freeRouter git repository at: ~/freeRouter/src/native/c.sh

No throughput required

  • In this case simple pcapInt packet forwarding is recommended
  • In this setup all freeRouter functionalities are natively available
  • freeRouter makes heavy use of threads, hence the 16 CPU cores will be fully exploited

freeRouter upgrade

A freeRouter upgrade involves 3 aspects:

  • Java itself: it is pretty unusual for a router, but as freeRouter runs on Java, you have to follow Java software update recommendations
  • The freeRouter control plane software itself: it is essentially an rtr.jar file that has to be replaced by the latest version
  • The freeRouter dataplane software, pcapInt: pcapInt upgrades are unusual, but the freeRouter release notes still have to be checked for them

We are (at last) now ready to configure freeRouter as a BGP route reflector !

FreeRouter uses 2 configuration files in order to run. Let's write these configuration files for RR1 in ~/freeRouter/etc

freeRouter hardware file: rr1-hw.txt
int eth1 eth 0000.1111.0001 127.0.0.1 10011 127.0.0.1 10012
int eth2 eth 0000.2222.0002 127.0.0.1 10021 127.0.0.1 10022
tcp2vrf 2323 v1 23

BGP RR interfaces

  • eth1 is the first RR interface: port 10011 is the freeRouter side of the socket, while port 10012 is the pcapInt side, bound to the Linux interface of NIC #1
  • eth2 is the second RR interface: port 10021 is the freeRouter side of the socket, while port 10022 is the pcapInt side, bound to the Linux interface of NIC #2
  • For now freeRouter will be accessible only via a telnet session on port 2323
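
The relation between an `int` line of the hardware file and the matching pcapInt invocation used later in this article can be sketched in Python as follows. This is an illustration only: the helper names and dictionary keys are hypothetical, the field order is inferred from the hardware file above, and the Linux interface name (enp0s9) is taken from the launch command shown later.

```python
HW_LINE = "int eth1 eth 0000.1111.0001 127.0.0.1 10011 127.0.0.1 10012"


def parse_int_line(line):
    """Split a hardware-file 'int' line into named fields (assumed layout)."""
    fields = line.split()
    if len(fields) != 8 or fields[0] != "int":
        raise ValueError(f"unexpected hardware line: {line!r}")
    _, name, kind, mac, rtr_addr, rtr_port, pcap_addr, pcap_port = fields
    return {
        "name": name,
        "type": kind,
        "mac": mac,
        "rtr_addr": rtr_addr,        # freeRouter side of the socket
        "rtr_port": int(rtr_port),
        "pcap_addr": pcap_addr,      # pcapInt side of the socket
        "pcap_port": int(pcap_port),
    }


def pcapint_command(entry, linux_iface):
    """Build the pcapInt argument list: local (pcapInt) side first, then freeRouter side."""
    return [
        "./pcapInt.bin", linux_iface,
        str(entry["pcap_port"]), entry["pcap_addr"],
        str(entry["rtr_port"]), entry["rtr_addr"],
    ]


if __name__ == "__main__":
    eth1 = parse_int_line(HW_LINE)
    print(" ".join(pcapint_command(eth1, "enp0s9")))
```

Note how the port pair is simply swapped: what the hardware file declares as the remote port (10012) becomes the local port pcapInt binds to.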
freeRouter software configuration file: rr1-sw.txt
hostname rr1
buggy
!
!
access-list ACL-IPv4-RR-CLIENT
 sequence 10 permit all 1.1.1.1 255.255.255.255 all any all
 sequence 20 permit all 2.2.2.2 255.255.255.255 all any all
 sequence 30 permit all 3.3.3.3 255.255.255.255 all any all
 sequence 40 permit all 4.4.4.4 255.255.255.255 all any all
 sequence 50 permit all 5.5.5.5 255.255.255.255 all any all
 sequence 60 permit all 6.6.6.6 255.255.255.255 all any all
 sequence 70 permit all 7.7.7.7 255.255.255.255 all any all
 sequence 80 permit all 8.8.8.8 255.255.255.255 all any all
 exit
!
access-list ACL-IPv6-RR-CLIENT
 sequence 10 deny all fd00::a ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff all any all
 sequence 20 deny all fd00::b ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff all any all
 sequence 30 permit all fd00:: ffff:: all any all
 exit
!
prefix-list PFX-IPv4-NHT
 sequence 10 permit 1.1.1.1/32 ge 32 le 32
 sequence 20 permit 2.2.2.2/32 ge 32 le 32
 sequence 30 permit 3.3.3.3/32 ge 32 le 32
 sequence 40 permit 4.4.4.4/32 ge 32 le 32
 sequence 50 permit 5.5.5.5/32 ge 32 le 32
 sequence 60 permit 6.6.6.6/32 ge 32 le 32
 sequence 70 permit 7.7.7.7/32 ge 32 le 32
 sequence 80 permit 8.8.8.8/32 ge 32 le 32
 sequence 100 permit 10.10.10.10/32 ge 32 le 32
 sequence 110 permit 11.11.11.11/32 ge 32 le 32
 exit
!
prefix-list PFX-IPv6-NHT
 sequence 10 permit fd00::/32 ge 128 le 128
 exit
!
route-policy NHT
 sequence 10 if distance 110
 sequence 20   pass
 sequence 30 else
 sequence 40   drop
 sequence 50 enif
 exit
!
vrf definition v1
 rd 1:1
 exit
!
router ospf4 1
 vrf v1
 router-id 4.4.4.10
 traffeng-id 0.0.0.0
 area 0 enable
 redistribute connected
 exit
!
router ospf6 1
 vrf v1
 router-id 6.6.6.10
 traffeng-id ::
 area 0 enable
 redistribute connected
 exit
!
interface loopback1
 no description
 vrf forwarding v1
 ipv4 address 10.10.10.10 255.255.255.255
 ipv6 address fd00::a ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff
 router ospf4 1 enable
 router ospf4 1 area 0
 router ospf4 1 passive
 router ospf6 1 enable
 router ospf6 1 area 0
 router ospf6 1 passive
 no shutdown
 no log-link-change
 exit
!
interface ethernet1
 no description
 vrf forwarding v1
 ipv4 address 10.1.10.10 255.255.255.0
 ipv6 address fd00:cafe::1:10:10 ffff:ffff:ffff:ffff:ffff:ffff:ffff::
 router ospf4 1 enable
 router ospf4 1 area 0
 router ospf4 1 cost 4444
 router ospf6 1 enable
 router ospf6 1 area 0
 router ospf6 1 cost 6666
 no shutdown
 no log-link-change
 exit
!
interface ethernet2
 no description
 vrf forwarding v1
 ipv4 address 10.4.10.10 255.255.255.0
 ipv6 address fd00:cafe::4:10:10 ffff:ffff:ffff:ffff:ffff:ffff:ffff::
 router ospf4 1 enable
 router ospf4 1 area 0
 router ospf4 1 cost 4444
 router ospf6 1 enable
 router ospf6 1 area 0
 router ospf6 1 cost 6666
 no shutdown
 no log-link-change
 exit
!
router bgp4 65535
 vrf v1
 local-as 65535
 router-id 10.10.10.10
 address-family unicast multicast other flowspec vpnuni vpnmlt vpnflw ovpnuni ovpnmlt ovpnflw vpls mspw evpn mdt srte mvpn omvpn
 nexthop route-policy NHT
 nexthop prefix-list PFX-IPv4-NHT
 template bgp4 remote-as 65535
 template bgp4 description rr clients
 template bgp4 local-as 65535
 template bgp4 address-family unicast multicast other flowspec vpnuni vpnmlt vpnflw ovpnuni ovpnmlt ovpnflw vpls mspw evpn mdt srte mvpn omvpn
 template bgp4 distance 255
 template bgp4 connection-mode active
 template bgp4 compression both
 template bgp4 update-source loopback1
 template bgp4 hostname
 template bgp4 aigp
 template bgp4 traffeng
 template bgp4 pmsitun
 template bgp4 tunenc
 template bgp4 attribset
 template bgp4 segrout
 template bgp4 bier
 template bgp4 route-reflector-client
 template bgp4 next-hop-unchanged
 template bgp4 send-community all
 listen ACL-IPv4-RR-CLIENT bgp4
 exit
!
router bgp6 65535
 vrf v1
 local-as 65535
 router-id 10.10.10.10
 address-family unicast multicast other flowspec vpnuni vpnmlt vpnflw ovpnuni ovpnmlt ovpnflw vpls mspw evpn mdt srte mvpn omvpn
 nexthop route-policy NHT
 nexthop prefix-list PFX-IPv6-NHT
 template bgp6 remote-as 65535
 template bgp6 description rr clients
 template bgp6 local-as 65535
 template bgp6 address-family unicast multicast other flowspec vpnuni vpnmlt vpnflw ovpnuni ovpnmlt ovpnflw vpls mspw evpn mdt srte mvpn omvpn
 template bgp6 distance 255
 template bgp6 connection-mode active
 template bgp6 compression both
 template bgp6 update-source loopback1
 template bgp6 hostname
 template bgp6 aigp
 template bgp6 traffeng
 template bgp6 pmsitun
 template bgp6 tunenc
 template bgp6 attribset
 template bgp6 segrout
 template bgp6 bier
 template bgp6 route-reflector-client
 template bgp6 next-hop-unchanged
 template bgp6 send-community all
 listen ACL-IPv6-RR-CLIENT bgp6
 exit
!
!
!
!
!
!
!
!
!
!
!
!
!
!
server telnet tel
 security protocol telnet
 no exec authorization
 no login authentication
 vrf v1
 exit
!
!
end
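
The client access-list in this configuration is repetitive, so in a larger network you may prefer to generate it. The Python sketch below reproduces the ACL-IPv4-RR-CLIENT block from a list of client loopbacks; the function name `acl_lines` is hypothetical and the line format is simply copied from the configuration above.

```python
def acl_lines(name, client_loopbacks, step=10):
    """Return the access-list block permitting each client loopback as a /32."""
    lines = [f"access-list {name}"]
    for index, address in enumerate(client_loopbacks, start=1):
        lines.append(
            f" sequence {index * step} permit all {address} 255.255.255.255 all any all"
        )
    lines.append(" exit")
    return lines


if __name__ == "__main__":
    # Loopbacks 1.1.1.1 to 8.8.8.8, as in the configuration above.
    clients = [f"{i}.{i}.{i}.{i}" for i in range(1, 9)]
    print("\n".join(acl_lines("ACL-IPv4-RR-CLIENT", clients)))
```

The generated text can be pasted into the software configuration file, keeping the list of route reflector clients in a single place.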
freeRouter launch with supplied rr1-hw.txt and rr1-sw.txt with a console prompt
╭─[6:06:13]floui@debian ~/freeRouter  
╰─➤  java -jar lib/rtr.jar routersc etc/rr1-hw.txt etc/rr1-sw.txt                                                                                      
info cfg.cfgInit.doInit:cfgInit.java:556 booting
info cfg.cfgInit.doInit:cfgInit.java:680 initializing hardware
info cfg.cfgInit.doInit:cfgInit.java:687 applying defaults
info cfg.cfgInit.doInit:cfgInit.java:695 applying configuration
info cfg.cfgInit.doInit:cfgInit.java:721 done
welcome
line ready
rr1#                   
Launch pcapInt in order to bind the socket for interface enp0s9
╭─[6:06:13]floui@debian[1]  ~/freeRouter/bin  
╰─➤  sudo ./pcapInt.bin enp0s9 10012 127.0.0.1 10011 127.0.0.1                                                                                                       
binded to local port 127.0.0.1 10012.
will send to 127.0.0.1 10011.
pcap version: libpcap version 1.8.1
opening interface enp0s9 with pcap1.x api
serving others
> 
Launch pcapInt in order to bind the socket for interface enp0s10
╭─[6:06:13]floui@debian[1]  ~/freeRouter/bin  
╰─➤  sudo ./pcapInt.bin enp0s10 10022 127.0.0.1 10021 127.0.0.1                                                                                                      
binded to local port 127.0.0.1 10022.
will send to 127.0.0.1 10021.
pcap version: libpcap version 1.8.1
opening interface enp0s10 with pcap1.x api
serving others
> 

Verification

rr1 telnet access via port 10010
╭─[1:09:28]floui@debian ~  
╰─➤  telnet localhost 10010
Trying ::1...
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
welcome
line ready
rr1#                   
From rr1 perspective:
rr1# sh ipv4 route v1                                                          
typ  prefix          metric    iface      hop        time
O    1.1.1.1/32      110/4444  ethernet1  10.1.10.1  00:05:05
O    2.2.2.2/32      110/4445  ethernet1  10.1.10.1  00:04:50
O    3.3.3.3/32      110/4445  ethernet2  10.4.10.4  00:04:32
O    4.4.4.4/32      110/4444  ethernet2  10.4.10.4  00:04:18
O    5.5.5.5/32      110/4445  ethernet1  10.1.10.1  00:04:00
O    6.6.6.6/32      110/4445  ethernet1  10.1.10.1  00:03:42
O    7.7.7.7/32      110/4446  ethernet1  10.1.10.1  00:03:28
O    8.8.8.8/32      110/4445  ethernet2  10.4.10.4  00:02:59
O    10.1.2.0/24     110/4444  ethernet1  10.1.10.1  00:22:47
O    10.1.4.0/24     110/4444  ethernet2  10.4.10.4  00:22:47
O    10.1.5.0/24     110/4444  ethernet1  10.1.10.1  00:22:47
O    10.1.6.0/24     110/4444  ethernet1  10.1.10.1  00:22:47
C    10.1.10.0/24    0/0       ethernet1  null       00:22:49
LOC  10.1.10.10/32   0/1       ethernet1  null       00:22:49
O    10.2.3.0/24     110/4445  ethernet2  10.4.10.4  00:22:35
O    10.2.6.0/24     110/4445  ethernet1  10.1.10.1  00:22:47
O    10.2.7.0/24     110/4445  ethernet1  10.1.10.1  00:22:38
O    10.2.11.0/24    110/4445  ethernet1  10.1.10.1  00:22:38
O    10.3.4.0/24     110/4444  ethernet2  10.4.10.4  00:22:47
O    10.3.7.0/24     110/4445  ethernet2  10.4.10.4  00:22:35
O    10.3.8.0/24     110/4445  ethernet2  10.4.10.4  00:22:32
O    10.3.11.0/24    110/4445  ethernet2  10.4.10.4  00:22:35
O    10.4.5.0/24     110/4444  ethernet2  10.4.10.4  00:22:47
O    10.4.8.0/24     110/4444  ethernet2  10.4.10.4  00:22:47
C    10.4.10.0/24    0/0       ethernet2  null       00:22:49
LOC  10.4.10.10/32   0/1       ethernet2  null       00:22:49
C    10.10.10.10/32  0/0       loopback1  null       00:22:49
O    11.11.11.11/32  110/8889  ethernet1  10.1.10.1  00:06:43

rr1# sh ipv4 ospf 1 topo 0                                                     
node      reach  via        ifc        met   hop  conn  sr  br  neighbors
4.4.4.1   true   10.1.10.1  ethernet1  4444  1    5     0   0   4.4.4.2=1=10.1.2.1 4.4.4.4=1=10.1.4.1 4.4.4.5=1=10.1.5.1 4.4.4.6=1=10.1.6.1 4.4.4.10=4444=10.1.10.1
4.4.4.2   true   10.1.10.1  ethernet1  4445  2    5     0   0   4.4.4.1=1=10.1.2.2 4.4.4.3=1=10.2.3.2 4.4.4.7=1=10.2.7.2 4.4.4.6=1=10.2.6.2 4.4.4.11=4444=10.2.11.2
4.4.4.3   true   10.4.10.4  ethernet2  4445  2    4     0   0   4.4.4.2=1=10.2.3.3 4.4.4.4=1=10.3.4.3 4.4.4.8=1=10.3.8.3 4.4.4.7=1=10.3.7.3
4.4.4.4   true   10.4.10.4  ethernet2  4444  1    5     0   0   4.4.4.3=1=10.3.4.4 4.4.4.8=1=10.4.8.4 4.4.4.5=1=10.4.5.4 4.4.4.1=1=10.1.4.4 4.4.4.10=4444=10.4.10.4
4.4.4.5   true   10.1.10.1  ethernet1  4445  2    2     0   0   4.4.4.1=1=10.1.5.5 4.4.4.4=1=10.4.5.5
4.4.4.6   true   10.1.10.1  ethernet1  4445  2    2     0   0   4.4.4.1=1=10.1.6.6 4.4.4.2=1=10.2.6.6
4.4.4.7   true   10.1.10.1  ethernet1  4446  3    2     0   0   4.4.4.2=1=10.2.7.7 4.4.4.3=1=10.3.7.7
4.4.4.8   true   10.4.10.4  ethernet2  4445  2    2     0   0   4.4.4.3=1=10.3.8.8 4.4.4.4=1=10.4.8.8
4.4.4.10  true   null       null       0     0    2     0   0   4.4.4.1=4444=10.1.10.10 4.4.4.4=4444=10.4.10.10
4.4.4.11  true   10.1.10.1  ethernet1  8889  3    1     0   0   4.4.4.2=4444=10.2.11.11

rr1# sh ipv6 route v1                                                          
typ  prefix                  metric     iface      hop                time
O    fd00::1/128             110/6666   ethernet1  fd00:cafe::1:10:1  00:06:01
O    fd00::2/128             110/6667   ethernet1  fd00:cafe::1:10:1  00:05:46
O    fd00::3/128             110/6667   ethernet2  fd00:cafe::4:10:4  00:05:28
O    fd00::4/128             110/6666   ethernet2  fd00:cafe::4:10:4  00:05:14
O    fd00::5/128             110/6667   ethernet1  fd00:cafe::1:10:1  00:04:56
O    fd00::6/128             110/6667   ethernet1  fd00:cafe::1:10:1  00:04:38
O    fd00::7/128             110/6668   ethernet1  fd00:cafe::1:10:1  00:04:24
O    fd00::8/128             110/6667   ethernet2  fd00:cafe::4:10:4  00:03:56
C    fd00::a/128             0/0        loopback1  null               00:23:45
O    fd00::b/128             110/13333  ethernet1  fd00:cafe::1:10:1  00:07:40
O    fd00:cafe::1:2:0/112    110/6666   ethernet1  fd00:cafe::1:10:1  00:23:43
O    fd00:cafe::1:4:0/112    110/6666   ethernet2  fd00:cafe::4:10:4  00:23:43
O    fd00:cafe::1:5:0/112    110/6666   ethernet1  fd00:cafe::1:10:1  00:23:43
O    fd00:cafe::1:6:0/112    110/6666   ethernet1  fd00:cafe::1:10:1  00:23:43
C    fd00:cafe::1:10:0/112   0/0        ethernet1  null               00:23:45
LOC  fd00:cafe::1:10:10/128  0/1        ethernet1  null               00:23:45
O    fd00:cafe::2:3:0/112    110/6667   ethernet1  fd00:cafe::1:10:1  00:23:32
O    fd00:cafe::2:6:0/112    110/6667   ethernet1  fd00:cafe::1:10:1  00:23:32
O    fd00:cafe::2:7:0/112    110/6667   ethernet1  fd00:cafe::1:10:1  00:23:32
O    fd00:cafe::2:11:0/112   110/6667   ethernet1  fd00:cafe::1:10:1  00:23:32
O    fd00:cafe::3:4:0/112    110/6666   ethernet2  fd00:cafe::4:10:4  00:23:43
O    fd00:cafe::3:7:0/112    110/6667   ethernet2  fd00:cafe::4:10:4  00:23:32
O    fd00:cafe::3:8:0/112    110/6667   ethernet2  fd00:cafe::4:10:4  00:23:32
O    fd00:cafe::3:11:0/112   110/6667   ethernet2  fd00:cafe::4:10:4  00:23:32
O    fd00:cafe::4:5:0/112    110/6666   ethernet2  fd00:cafe::4:10:4  00:23:43
O    fd00:cafe::4:8:0/112    110/6666   ethernet2  fd00:cafe::4:10:4  00:23:43
C    fd00:cafe::4:10:0/112   0/0        ethernet2  null               00:23:45
LOC  fd00:cafe::4:10:10/128  0/1        ethernet2  null               00:23:45

rr1# sh ipv6 ospf 1 topo 0                                                     
node               reach  via                ifc        met    hop  conn  sr  br  neighbors
6.6.6.1/00000000   true   fd00:cafe::1:10:1  ethernet1  6666   1    5     0   0   6.6.6.2/00000000=1=10012 6.6.6.4/00000000=1=10015 6.6.6.5/00000000=1=10012 6.6.6.6/00000000=1=10012 6.6.6.10/00000000=6666=10012
6.6.6.2/00000000   true   fd00:cafe::1:10:1  ethernet1  6667   2    5     0   0   6.6.6.1/00000000=1=10012 6.6.6.3/00000000=1=10012 6.6.6.7/00000000=1=10012 6.6.6.6/00000000=1=10013 6.6.6.11/00000000=6666=10012
6.6.6.3/00000000   true   fd00:cafe::4:10:4  ethernet2  6667   2    4     0   0   6.6.6.2/00000000=1=10013 6.6.6.4/00000000=1=10012 6.6.6.8/00000000=1=10012 6.6.6.7/00000000=1=10013
6.6.6.4/00000000   true   fd00:cafe::4:10:4  ethernet2  6666   1    5     0   0   6.6.6.3/00000000=1=10013 6.6.6.8/00000000=1=10013 6.6.6.5/00000000=1=10013 6.6.6.1/00000000=1=10013 6.6.6.10/00000000=6666=10013
6.6.6.5/00000000   true   fd00:cafe::1:10:1  ethernet1  6667   2    2     0   0   6.6.6.1/00000000=1=10014 6.6.6.4/00000000=1=10014
6.6.6.6/00000000   true   fd00:cafe::1:10:1  ethernet1  6667   2    2     0   0   6.6.6.1/00000000=1=10015 6.6.6.2/00000000=1=10015
6.6.6.7/00000000   true   fd00:cafe::1:10:1  ethernet1  6668   3    2     0   0   6.6.6.2/00000000=1=10014 6.6.6.3/00000000=1=10015
6.6.6.8/00000000   true   fd00:cafe::4:10:4  ethernet2  6667   2    2     0   0   6.6.6.3/00000000=1=10014 6.6.6.4/00000000=1=10013
6.6.6.10/00000000  true   null               null       0      0    2     0   0   6.6.6.1/00000000=6666=10016 6.6.6.4/00000000=6666=10016
6.6.6.11/00000000  true   fd00:cafe::1:10:1  ethernet1  13333  3    1     0   0   6.6.6.2/00000000=6666=10016
Check reachability from one RR client (c5 for example)
c5#sh ipv4 route v1                                                            
typ  prefix          metric    iface      hop       time
O    1.1.1.1/32      110/1     ethernet1  10.1.5.1  00:07:22
O    2.2.2.2/32      110/2     ethernet1  10.1.5.1  00:07:07
O    3.3.3.3/32      110/2     ethernet2  10.4.5.4  00:06:49
O    4.4.4.4/32      110/1     ethernet2  10.4.5.4  00:06:35
C    5.5.5.5/32      0/0       loopback1  null      00:25:07
O    6.6.6.6/32      110/2     ethernet1  10.1.5.1  00:06:00
O    7.7.7.7/32      110/3     ethernet1  10.1.5.1  00:05:46
O    8.8.8.8/32      110/2     ethernet2  10.4.5.4  00:05:17
O    10.1.2.0/24     110/1     ethernet1  10.1.5.1  00:25:06
O    10.1.4.0/24     110/1     ethernet2  10.4.5.4  00:25:05
C    10.1.5.0/24     0/0       ethernet1  null      00:25:07
LOC  10.1.5.5/32     0/1       ethernet1  null      00:25:07
O    10.1.6.0/24     110/1     ethernet1  10.1.5.1  00:25:06
O    10.1.10.0/24    110/1     ethernet1  10.1.5.1  00:25:06
O    10.2.3.0/24     110/2     ethernet2  10.4.5.4  00:24:53
O    10.2.6.0/24     110/2     ethernet1  10.1.5.1  00:25:05
O    10.2.7.0/24     110/2     ethernet1  10.1.5.1  00:24:56
O    10.2.11.0/24    110/2     ethernet1  10.1.5.1  00:24:56
O    10.3.4.0/24     110/1     ethernet2  10.4.5.4  00:25:05
O    10.3.7.0/24     110/2     ethernet2  10.4.5.4  00:24:53
O    10.3.8.0/24     110/2     ethernet2  10.4.5.4  00:24:50
O    10.3.11.0/24    110/2     ethernet2  10.4.5.4  00:24:53
C    10.4.5.0/24     0/0       ethernet2  null      00:25:07
LOC  10.4.5.5/32     0/1       ethernet2  null      00:25:07
O    10.4.8.0/24     110/1     ethernet2  10.4.5.4  00:25:05
O    10.4.10.0/24    110/1     ethernet2  10.4.5.4  00:25:05
O    10.10.10.10/32  110/4445  ethernet1  10.1.5.1  00:11:05
O    11.11.11.11/32  110/4446  ethernet1  10.1.5.1  00:09:01

c5#sh ipv4 ospf 1 topo 0                                                       
node      reach  via       ifc        met   hop  conn  sr  br  neighbors
4.4.4.1   true   10.1.5.1  ethernet1  1     1    5     0   0   4.4.4.2=1=10.1.2.1 4.4.4.4=1=10.1.4.1 4.4.4.5=1=10.1.5.1 4.4.4.6=1=10.1.6.1 4.4.4.10=4444=10.1.10.1
4.4.4.2   true   10.1.5.1  ethernet1  2     2    5     0   0   4.4.4.1=1=10.1.2.2 4.4.4.3=1=10.2.3.2 4.4.4.7=1=10.2.7.2 4.4.4.6=1=10.2.6.2 4.4.4.11=4444=10.2.11.2
4.4.4.3   true   10.4.5.4  ethernet2  2     2    4     0   0   4.4.4.2=1=10.2.3.3 4.4.4.4=1=10.3.4.3 4.4.4.8=1=10.3.8.3 4.4.4.7=1=10.3.7.3
4.4.4.4   true   10.4.5.4  ethernet2  1     1    5     0   0   4.4.4.3=1=10.3.4.4 4.4.4.8=1=10.4.8.4 4.4.4.5=1=10.4.5.4 4.4.4.1=1=10.1.4.4 4.4.4.10=4444=10.4.10.4
4.4.4.5   true   null      null       0     0    2     0   0   4.4.4.1=1=10.1.5.5 4.4.4.4=1=10.4.5.5
4.4.4.6   true   10.1.5.1  ethernet1  2     2    2     0   0   4.4.4.1=1=10.1.6.6 4.4.4.2=1=10.2.6.6
4.4.4.7   true   10.1.5.1  ethernet1  3     3    2     0   0   4.4.4.2=1=10.2.7.7 4.4.4.3=1=10.3.7.7
4.4.4.8   true   10.4.5.4  ethernet2  2     2    2     0   0   4.4.4.3=1=10.3.8.8 4.4.4.4=1=10.4.8.8
4.4.4.10  true   10.1.5.1  ethernet1  4445  2    2     0   0   4.4.4.1=4444=10.1.10.10 4.4.4.4=4444=10.4.10.10
4.4.4.11  true   10.1.5.1  ethernet1  4446  3    1     0   0   4.4.4.2=4444=10.2.11.11

c5#sh ipv6 route v1                                                            
typ  prefix                 metric    iface      hop               time
O    fd00::1/128            110/1     ethernet1  fd00:cafe::1:5:1  00:08:06
O    fd00::2/128            110/2     ethernet1  fd00:cafe::1:5:1  00:07:51
O    fd00::3/128            110/2     ethernet2  fd00:cafe::4:5:4  00:07:33
O    fd00::4/128            110/1     ethernet2  fd00:cafe::4:5:4  00:07:19
C    fd00::5/128            0/0       loopback1  null              00:25:51
O    fd00::6/128            110/2     ethernet1  fd00:cafe::1:5:1  00:06:43
O    fd00::7/128            110/3     ethernet1  fd00:cafe::1:5:1  00:06:29
O    fd00::8/128            110/2     ethernet2  fd00:cafe::4:5:4  00:06:01
O    fd00::a/128            110/6667  ethernet1  fd00:cafe::1:5:1  00:11:45
O    fd00::b/128            110/6668  ethernet1  fd00:cafe::1:5:1  00:09:45
O    fd00:cafe::1:2:0/112   110/1     ethernet1  fd00:cafe::1:5:1  00:25:49
O    fd00:cafe::1:4:0/112   110/1     ethernet2  fd00:cafe::4:5:4  00:25:49
C    fd00:cafe::1:5:0/112   0/0       ethernet1  null              00:25:51
LOC  fd00:cafe::1:5:5/128   0/1       ethernet1  null              00:25:51
O    fd00:cafe::1:6:0/112   110/1     ethernet1  fd00:cafe::1:5:1  00:25:49
O    fd00:cafe::1:10:0/112  110/1     ethernet1  fd00:cafe::1:5:1  00:25:49
O    fd00:cafe::2:3:0/112   110/2     ethernet1  fd00:cafe::1:5:1  00:25:37
O    fd00:cafe::2:6:0/112   110/2     ethernet1  fd00:cafe::1:5:1  00:25:37
O    fd00:cafe::2:7:0/112   110/2     ethernet1  fd00:cafe::1:5:1  00:25:37
O    fd00:cafe::2:11:0/112  110/2     ethernet1  fd00:cafe::1:5:1  00:25:37
O    fd00:cafe::3:4:0/112   110/1     ethernet2  fd00:cafe::4:5:4  00:25:49
O    fd00:cafe::3:7:0/112   110/2     ethernet2  fd00:cafe::4:5:4  00:25:37
O    fd00:cafe::3:8:0/112   110/2     ethernet2  fd00:cafe::4:5:4  00:25:37
O    fd00:cafe::3:11:0/112  110/2     ethernet2  fd00:cafe::4:5:4  00:25:37
C    fd00:cafe::4:5:0/112   0/0       ethernet2  null              00:25:51
LOC  fd00:cafe::4:5:5/128   0/1       ethernet2  null              00:25:51
O    fd00:cafe::4:8:0/112   110/1     ethernet2  fd00:cafe::4:5:4  00:25:49
O    fd00:cafe::4:10:0/112  110/1     ethernet2  fd00:cafe::4:5:4  00:25:49

c5#sh ipv6 ospf 1 topo 0                                                       
node               reach  via               ifc        met   hop  conn  sr  br  neighbors
6.6.6.1/00000000   true   fd00:cafe::1:5:1  ethernet1  1     1    5     0   0   6.6.6.2/00000000=1=10012 6.6.6.4/00000000=1=10015 6.6.6.5/00000000=1=10012 6.6.6.6/00000000=1=10012 6.6.6.10/00000000=6666=10012
6.6.6.2/00000000   true   fd00:cafe::1:5:1  ethernet1  2     2    5     0   0   6.6.6.1/00000000=1=10012 6.6.6.3/00000000=1=10012 6.6.6.7/00000000=1=10012 6.6.6.6/00000000=1=10013 6.6.6.11/00000000=6666=10012
6.6.6.3/00000000   true   fd00:cafe::4:5:4  ethernet2  2     2    4     0   0   6.6.6.2/00000000=1=10013 6.6.6.4/00000000=1=10012 6.6.6.8/00000000=1=10012 6.6.6.7/00000000=1=10013
6.6.6.4/00000000   true   fd00:cafe::4:5:4  ethernet2  1     1    5     0   0   6.6.6.3/00000000=1=10013 6.6.6.8/00000000=1=10013 6.6.6.5/00000000=1=10013 6.6.6.1/00000000=1=10013 6.6.6.10/00000000=6666=10013
6.6.6.5/00000000   true   null              null       0     0    2     0   0   6.6.6.1/00000000=1=10014 6.6.6.4/00000000=1=10014
6.6.6.6/00000000   true   fd00:cafe::1:5:1  ethernet1  2     2    2     0   0   6.6.6.1/00000000=1=10015 6.6.6.2/00000000=1=10015
6.6.6.7/00000000   true   fd00:cafe::1:5:1  ethernet1  3     3    2     0   0   6.6.6.2/00000000=1=10014 6.6.6.3/00000000=1=10015
6.6.6.8/00000000   true   fd00:cafe::4:5:4  ethernet2  2     2    2     0   0   6.6.6.3/00000000=1=10014 6.6.6.4/00000000=1=10013
6.6.6.10/00000000  true   fd00:cafe::1:5:1  ethernet1  6667  2    2     0   0   6.6.6.1/00000000=6666=10016 6.6.6.4/00000000=6666=10016
6.6.6.11/00000000  true   fd00:cafe::1:5:1  ethernet1  6668  3    1     0   0   6.6.6.2/00000000=6666=10016
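
The OSPF metrics displayed above can be cross-checked offline. The snippet below is a minimal Python sketch, not freeRouter code: it rebuilds the IPv4 topology from the "neighbors" column of the sh ipv4 ospf 1 topo 0 output (node N stands for router ID 4.4.4.N, cost 1 on backbone links, cost 4444 on the links towards rr1 and rr2) and runs a plain Dijkstra on it. All function and variable names are illustrative assumptions.

```python
import heapq

# Adjacencies copied from the "neighbors" column above; node N is router ID 4.4.4.N
# (1..8 are c1..c8, 10 is rr1, 11 is rr2). Costs are symmetric in this lab.
LINKS = [
    (1, 2, 1), (1, 4, 1), (1, 5, 1), (1, 6, 1),
    (2, 3, 1), (2, 6, 1), (2, 7, 1),
    (3, 4, 1), (3, 7, 1), (3, 8, 1),
    (4, 5, 1), (4, 8, 1),
    (1, 10, 4444), (4, 10, 4444),  # rr1 uplinks
    (2, 11, 4444),                 # rr2 adjacency visible in the output
]


def build_graph(links):
    """Build an undirected weighted adjacency map from (a, b, cost) tuples."""
    graph = {}
    for a, b, cost in links:
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    return graph


def shortest_metrics(graph, source, excluded=()):
    """Dijkstra: lowest cumulated cost from source to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neigh, cost in graph[node].items():
            if neigh in excluded:
                continue
            nd = d + cost
            if nd < dist.get(neigh, float("inf")):
                dist[neigh] = nd
                heapq.heappush(heap, (nd, neigh))
    return dist


GRAPH = build_graph(LINKS)
from_c5 = shortest_metrics(GRAPH, 5)
from_rr1 = shortest_metrics(GRAPH, 10)
no_rr = shortest_metrics(GRAPH, 5, excluded=(10, 11))

print(from_c5[10], from_c5[11], from_rr1[11])  # 4445 4446 8889
# Removing both RRs leaves every client-to-client metric unchanged
print(all(from_c5[n] == no_rr[n] for n in range(1, 9)))  # True
```

The three printed metrics match the "met" column for 4.4.4.10 and 4.4.4.11 on c5 and for 4.4.4.11 on rr1. The second check illustrates that no shortest path between clients crosses a route reflector.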
Ping rr1 from c5
c5#ping 10.10.10.10 /vrf v1                                                    
pinging 10.10.10.10, src=null, vrf=v1, cnt=5, len=64, tim=1000, ttl=255, tos=0, sweep=false
!!!!!
result=100%, recv/sent/lost=5/5/0, rtt min/avg/max/total=0/0/1/4
c5#ping fd00::a /vrf v1                                                        
pinging fd00::a, src=null, vrf=v1, cnt=5, len=64, tim=1000, ttl=255, tos=0, sweep=false
!!!!!
result=100%, recv/sent/lost=5/5/0, rtt min/avg/max/total=0/0/1/4
c5#                                                                                                                                                 
BGP summary
rr1#sh ipv4 bgp 65535 sum                                                      
as     learn  done  ready  neighbor  uptime
65535  0      0     true   1.1.1.1   16:22:28
65535  0      0     true   2.2.2.2   16:17:26
65535  0      0     true   3.3.3.3   16:16:44
65535  0      0     true   4.4.4.4   16:16:01
65535  0      0     true   5.5.5.5   16:15:32
65535  0      0     true   6.6.6.6   16:14:56
65535  0      0     true   7.7.7.7   16:14:30
65535  0      0     true   8.8.8.8   16:13:37

rr1#sh ipv6 bgp 65535 sum                                                      
as     learn  done  ready  neighbor  uptime
65535  0      0     true   fd00::1   16:20:41
65535  0      0     true   fd00::2   16:18:27
65535  0      0     true   fd00::3   16:17:32
65535  0      0     true   fd00::4   16:16:59
65535  0      0     true   fd00::5   16:16:22
65535  0      0     true   fd00::6   16:15:57
65535  0      0     true   fd00::7   16:15:15
65535  0      0     true   fd00::8   16:14:45

From rr1 check c1 BGP status (pay attention to type = routeReflectorClient)
rr1#show ipv4 bgp 65535 neighbor 1.1.1.1 status                                
peer = 1.1.1.1
reachable state = true
reachable changed = 16:24:12
reachable changes = 1
fallover = null
update group = 0
type = routeReflectorClient
safi =  unicast multicast other flowspec vpnuni vpnmlt vpnflw ovpnuni ovpnmlt ovpnflw vpls mspw evpn mdt srte mvpn omvpn
local = 10.10.10.10
router id = 1.1.1.1
uptime = 16:24:12
hold time = 00:03:00
keepalive time = 00:01:00
32bit as = true
refresh = true, rx=0, tx=0
description = rr clients
hostname = null
compression = rx=true, tx=false
graceful = 
addpath rx = 
addpath tx = 
unicast advertised = 0 of 0, list = 0, accepted = 0 of 0
multicast advertised = 0 of 0, list = 0, accepted = 0 of 0
other advertised = 0 of 0, list = 0, accepted = 0 of 0
flowspec advertised = 0 of 0, list = 0, accepted = 0 of 0
vpnuni advertised = 0 of 0, list = 0, accepted = 0 of 0
vpnmlt advertised = 0 of 0, list = 0, accepted = 0 of 0
vpnflw advertised = 0 of 0, list = 0, accepted = 0 of 0
ovpnuni advertised = 0 of 0, list = 0, accepted = 0 of 0
ovpnmlt advertised = 0 of 0, list = 0, accepted = 0 of 0
ovpnflw advertised = 0 of 0, list = 0, accepted = 0 of 0
vpls advertised = 0 of 0, list = 0, accepted = 0 of 0
mspw advertised = 0 of 0, list = 0, accepted = 0 of 0
evpn advertised = 0 of 0, list = 0, accepted = 0 of 0
mdt advertised = 0 of 0, list = 0, accepted = 0 of 0
srte advertised = 0 of 0, list = 0, accepted = 0 of 0
mvpn advertised = 0 of 0, list = 0, accepted = 0 of 0
omvpn advertised = 0 of 0, list = 0, accepted = 0 of 0
version = 14 of 14, needfull=0, buffull=0
full = 9, 2020-07-27 16:32:29, 16:15:21 ago, 0 ms
incr = 2, 2020-07-28 08:13:10, 00:34:40 ago, 0 ms
connection = tx=173(987) rx=158(986) drp=0(0)
uncompressed = tx=0(0) rx=0(0) drp=0(0)
buffer = max=65536 rx=0 tx=65536

rr1#show ipv6 bgp 65535 neighbor fd00::1 status                                
peer = fd00::1
reachable state = true
reachable changed = 16:22:33
reachable changes = 1
fallover = null
update group = 0
type = routeReflectorClient
safi =  unicast multicast other flowspec vpnuni vpnmlt vpnflw ovpnuni ovpnmlt ovpnflw vpls mspw evpn mdt srte mvpn omvpn
local = fd00::a
router id = 1.1.1.1
uptime = 16:22:33
hold time = 00:03:00
keepalive time = 00:01:00
32bit as = true
refresh = true, rx=0, tx=0
description = rr clients
hostname = null
compression = rx=true, tx=false
graceful = 
addpath rx = 
addpath tx = 
unicast advertised = 0 of 0, list = 0, accepted = 0 of 0
multicast advertised = 0 of 0, list = 0, accepted = 0 of 0
other advertised = 0 of 0, list = 0, accepted = 0 of 0
flowspec advertised = 0 of 0, list = 0, accepted = 0 of 0
vpnuni advertised = 0 of 0, list = 0, accepted = 0 of 0
vpnmlt advertised = 0 of 0, list = 0, accepted = 0 of 0
vpnflw advertised = 0 of 0, list = 0, accepted = 0 of 0
ovpnuni advertised = 0 of 0, list = 0, accepted = 0 of 0
ovpnmlt advertised = 0 of 0, list = 0, accepted = 0 of 0
ovpnflw advertised = 0 of 0, list = 0, accepted = 0 of 0
vpls advertised = 0 of 0, list = 0, accepted = 0 of 0
mspw advertised = 0 of 0, list = 0, accepted = 0 of 0
evpn advertised = 0 of 0, list = 0, accepted = 0 of 0
mdt advertised = 0 of 0, list = 0, accepted = 0 of 0
srte advertised = 0 of 0, list = 0, accepted = 0 of 0
mvpn advertised = 0 of 0, list = 0, accepted = 0 of 0
omvpn advertised = 0 of 0, list = 0, accepted = 0 of 0
version = 14 of 14, needfull=0, buffull=0
full = 9, 2020-07-27 16:32:15, 16:16:37 ago, 0 ms
incr = 2, 2020-07-28 08:13:15, 00:35:38 ago, 0 ms
connection = tx=173(985) rx=158(984) drp=0(0)
uncompressed = tx=0(0) rx=0(0) drp=0(0)
buffer = max=65536 rx=0 tx=65536                                                                                                                                          

Conclusion

In this article you:

  • had a brief introduction to the BGP protocol and the rationale behind BGP route reflectors
  • learned the design considerations related to a BGP RR setup
  • got a typical BGP configuration example with a long list of AFI/SAFI enabled
  • This configuration is not exhaustive: BGP add-path, for example, is available but not configured
  • verified BGP RR operation
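
The route reflector rationale recalled above can be put into numbers. The following Python sketch is only an illustration: it compares the number of iBGP sessions needed by a full mesh with the number needed when every client peers with each RR. It assumes the RRs also keep an iBGP session between themselves, and the function names are made up for this example.

```python
def full_mesh_sessions(routers):
    """iBGP full mesh: every router peers with every other router."""
    return routers * (routers - 1) // 2


def route_reflector_sessions(clients, reflectors):
    """Each client peers with every RR; the RRs are meshed together (assumption)."""
    return clients * reflectors + full_mesh_sessions(reflectors)


# The lab above: 8 clients (c1..c8) and 2 route reflectors (rr1, rr2)
print(full_mesh_sessions(8))             # 28
print(route_reflector_sessions(8, 2))    # 17

# The gap widens quickly with the size of the backbone
print(full_mesh_sessions(100))           # 4950
print(route_reflector_sessions(100, 2))  # 201
```

With 8 clients the saving is modest, but the full mesh grows quadratically while the RR design grows linearly with the number of clients.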

RARE validated design: [ BGP RR #001 ] - key take-away

  • The BGP Route Reflector use case does not require a commercial vendor router: it can be handled perfectly by a software solution running on a server with enough RAM.

The example above is a highly available Route Reflector design that is able to handle BGP signalling for a large carrier Service Provider, for all address families.

  • Redundant BGP Route Reflection is ensured by deploying 2 RRs (at minimum) belonging to the same BGP RR cluster

In addition to having several RRs for the whole domain, it is also common to see hierarchical RR designs. Some Service Providers deploy dedicated RRs for specific address families (L3VPN unicast for example).

  • RR in the same cluster run basic iBGP session

These RRs also share the same cluster ID, so that a route reflected by one RR is ignored by the other RR when it finds its own cluster ID in the CLUSTER_LIST attribute of the advertisement.

  • RR should not be in the traffic datapath

This is the reason why we set a high cost (4444 for IPv4 and 6666 for IPv6) in both directions on the RR interconnection ports.

  • RR design for a multi-service backbone

In the example, the RR clients run only IPv4/IPv6, but the RR design above can empower a Service Provider backbone with additional services running on top of MPLS: L3VPN, 6VPE, VPLS, EVPN, etc.

  • In the next article we will dissect the rr1 configuration

This will demonstrate some nice features offered by freeRouter, such as BGP templates and next-hop tracking, among other features not mentioned here (like BGP add-path).
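
To illustrate why two RRs of the same cluster share one cluster ID, here is a small Python sketch of the loop prevention rule defined for BGP route reflection (RFC 4456): a reflecting RR prepends its cluster ID to CLUSTER_LIST, and an RR that finds its own cluster ID in a received CLUSTER_LIST ignores the advertisement. This is a toy model, not freeRouter code; the Route class, the reflect function, the cluster ID value and the prefix are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Route:
    prefix: str
    originator_id: Optional[str] = None
    cluster_list: List[str] = field(default_factory=list)


def reflect(route, cluster_id, learned_from):
    """Model of an RR handling a received route.

    Returns the reflected copy, or None when the route is ignored.
    """
    # Loop prevention: the route already went through our own cluster.
    if cluster_id in route.cluster_list:
        return None
    return Route(
        prefix=route.prefix,
        # ORIGINATOR_ID is set once, by the first RR reflecting the route.
        originator_id=route.originator_id or learned_from,
        # The reflecting RR prepends its cluster ID.
        cluster_list=[cluster_id] + route.cluster_list,
    )


CLUSTER_ID = "0.0.0.1"  # hypothetical value shared by rr1 and rr2

from_c1 = Route("192.0.2.0/24")                                     # advertised by c1
via_rr1 = reflect(from_c1, CLUSTER_ID, learned_from="1.1.1.1")      # rr1 reflects it
via_rr2 = reflect(via_rr1, CLUSTER_ID, learned_from="10.10.10.10")  # rr2 ignores it

print(via_rr1.cluster_list, via_rr1.originator_id)  # ['0.0.0.1'] 1.1.1.1
print(via_rr2)                                      # None
```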


RR design test

You can test the design above in order to check RR and backbone router signalling.

  • Set up the freeRouter environment as described above
  • Get RARE code
Clone RARE code from repository
 git clone https://github.com/frederic-loui/RARE.git
Launch the Service Provider example (diagram above)
cd RARE/00-unit-labs/0101-rare-validated-design-bgp/
make
Access routers using the following command:
c1: telnet localhost 10001 
c2: telnet localhost 10002 
c3: telnet localhost 10003 
c4: telnet localhost 10004 
c5: telnet localhost 10005 
c6: telnet localhost 10006 
c7: telnet localhost 10007 
c8: telnet localhost 10008 
rr1: telnet localhost 10010 
rr2: telnet localhost 10011 
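
Once the lab is started, a quick way to see which router consoles are reachable is to probe the telnet ports listed above. The Python sketch below only relies on the port numbers given in this article (c1..c8 on 10001..10008, rr1 on 10010, rr2 on 10011); the helper names are assumptions and the script is not part of the RARE repository.

```python
import socket

# Console ports as listed above
CONSOLE_PORTS = {f"c{i}": 10000 + i for i in range(1, 9)}
CONSOLE_PORTS.update({"rr1": 10010, "rr2": 10011})


def is_listening(port, host="127.0.0.1", timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, port in CONSOLE_PORTS.items():
        state = "up" if is_listening(port) else "down"
        print(f"{name:4} telnet localhost {port}: {state}")
```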
Stop the Service Provider example (diagram above)
cd RARE/00-unit-labs/0101-rare-validated-design-bgp/
make clean

In article #005 you learned how RARE/freeRouter is controlling a P4Emu/pcap dataplane. We also demonstrated that this setup could be integrated into real networks.

Requirement

  • Basic Linux/Unix knowledge
  • Basic networking knowledge

Overview

Though P4Emu/pcap can be used for SOHO and can handle nx1GE of traffic, this comes at a high CPU load cost and thus a higher power consumption. 

"Why write yet another software dataplane as freeRouter has already a working native software dataplane ?"

The partial answer to the question raised in the previous article was:

"decoupling control plane from the dataplane"

We learned that P4Emu:

  • is able to understand the VERY same control messages from freeRouter as a P4 dataplane does
  • is able to switch packets, emulating router.p4, using libpcap as its packet forwarding backend.

However, even though libpcap is a performant packet processing library, the kernel is still heavily solicited: the higher the traffic rate, the higher the CPU workload.

Article objective

In this article we will reuse the freeRouter setup deployed in #005 and replace the P4Emu/pcap dataplane with the P4Emu/dpdk dataplane. 

Source Wikipedia: https://en.wikipedia.org/wiki/Data_Plane_Development_Kit

The Data Plane Development Kit (DPDK) is an Open source software project managed by the Linux Foundation. It provides a set of data plane libraries and network interface controller polling-mode drivers for offloading TCP packet processing from the operating system kernel to processes running in user space. This offloading achieves higher computing efficiency and higher packet throughput than is possible using the interrupt-driven processing provided in the kernel.


It is important to note that, despite what its name may suggest, P4Emu/dpdk is not emulating V1Model. P4Emu emulates the router.p4 packet processing logic and uses a packet forwarding library to transmit packets from a specific ingress port to the right egress port, as defined by freeRouter control plane messages. However, in this precise case, packet processing is offloaded from the kernel to user space. As a consequence, with a dpdk compatible NIC and driver, it is possible to reach tremendous traffic rates. DPDK is not available on all hardware, please refer to the DPDK HCL.


Diagram

[ #006 ] - Cookbook

In our example we will use Ubuntu focal as we need dpdk 19.11.1 (the latest version at the time of writing is 20.05.0), and we add a bridged network interface to our laptop RJ45 connection.

Install dpdk and dpdk-dev
apt-get update
apt-get upgrade
apt-get install dpdk dpdk-dev --no-install-recommends
flush enp0s3 so that it can be controlled by dpdk
ip addr flush enp0s3

Add out of band management enp0s8 with Virtualbox

You can add a second Host-only interface (enp0s8) in VirtualBox in order to stay connected to the ubuntu focal VM guest, as you might lose the connection when you flush enp0s3.
Set up dpdk and the veth pair for control plane / dataplane communication via pcapInt
#!/bin/bash
echo 1 > /proc/sys/net/ipv6/conf/all/disable_ipv6
echo 64 > /proc/sys/vm/nr_hugepages
modprobe uio_pci_generic
dpdk-devbind.py -b uio_pci_generic 00:03.0
ip link add veth0a type veth peer name veth0b
ip link set veth0a up
ip link set veth0b up
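
The script above reserves 64 hugepages. To see how much memory that represents and whether the pages were really allocated, the counters in /proc/meminfo can be inspected. The Python sketch below parses that content; the function name is an assumption, and the sample text assumes the 2048 kB hugepage size that is common on x86_64 (your system may differ).

```python
def hugepage_summary(meminfo_text):
    """Extract hugepage counters from the content of /proc/meminfo."""
    wanted = {"HugePages_Total", "HugePages_Free", "Hugepagesize"}
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if key in wanted:
            values[key] = int(rest.split()[0])  # drop the trailing "kB" unit
    total = values.get("HugePages_Total", 0)
    size_kb = values.get("Hugepagesize", 0)
    values["reserved_kB"] = total * size_kb
    return values


# Sample content, assuming 64 pages of 2048 kB each
SAMPLE = """\
HugePages_Total:      64
HugePages_Free:       64
Hugepagesize:       2048 kB
"""

summary = hugepage_summary(SAMPLE)
print(summary["reserved_kB"] // 1024, "MB reserved")  # 128 MB reserved
```

On the VM itself, the same function can be fed with the real file content, for example hugepage_summary(open("/proc/meminfo").read()).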
Check that dpdk is able to control enp0s3
dpdk-devbind.py --status

Network devices using DPDK-compatible driver
============================================
0000:00:03.0 '82540EM Gigabit Ethernet Controller 100e' drv=uio_pci_generic unused=e1000,vfio-pci

Network devices using kernel driver
===================================
0000:00:08.0 '82540EM Gigabit Ethernet Controller 100e' if=enp0s8 drv=e1000 unused=vfio-pci,uio_pci_generic *Active*

No 'Baseband' devices detected
==============================

No 'Crypto' devices detected
============================

No 'Eventdev' devices detected
==============================

No 'Mempool' devices detected
=============================

No 'Compress' devices detected
==============================

No 'Misc (rawdev)' devices detected
===================================
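
The status output above can also be checked by a script, for instance before launching freeRouter. The Python sketch below parses the device lines of that output and lists the PCI devices bound to a DPDK-compatible driver. It is a hedged example: the parser only understands the line layout shown above, and the function names and the list of driver names are assumptions.

```python
import re

# One device line: PCI address, quoted device name, then key=value attributes
DEVICE_LINE = re.compile(
    r"^([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) '([^']*)' (.*)$"
)
DPDK_DRIVERS = {"uio_pci_generic", "igb_uio", "vfio-pci"}


def parse_devbind_status(text):
    """Return one dict per device line found in the status output."""
    devices = []
    for line in text.splitlines():
        match = DEVICE_LINE.match(line.strip())
        if not match:
            continue  # section titles, separators, "No ... detected" lines
        pci, name, raw_attrs = match.groups()
        attrs = dict(item.split("=", 1) for item in raw_attrs.split() if "=" in item)
        devices.append({
            "pci": pci,
            "name": name,
            "driver": attrs.get("drv"),
            "iface": attrs.get("if"),
        })
    return devices


def dpdk_bound(devices):
    """PCI addresses of the devices handled by a DPDK-compatible driver."""
    return [d["pci"] for d in devices if d["driver"] in DPDK_DRIVERS]


# The two device lines from the output above
SAMPLE = """\
0000:00:03.0 '82540EM Gigabit Ethernet Controller 100e' drv=uio_pci_generic unused=e1000,vfio-pci
0000:00:08.0 '82540EM Gigabit Ethernet Controller 100e' if=enp0s8 drv=e1000 unused=vfio-pci,uio_pci_generic *Active*
"""

devices = parse_devbind_status(SAMPLE)
print(dpdk_bound(devices))  # ['0000:00:03.0']
```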
Create the freeRouter directory structure and download rtr.jar
mkdir -p ~/freeRouter/bin ~/freeRouter/lib ~/freeRouter/etc ~/freeRouter/log
cd ~/freeRouter/lib
wget http://freerouter.nop.hu/rtr.jar
Check the directory layout
tree freeRouter
freeRouter
├── bin   # binary files      
├── etc   # configuration files      
├── lib   # library files      
└── log   # log files      

Get the freeRouter net-tools tarball
wget http://www.freertr.net/rtr-`uname -m`.tar -O rtr.tar
Extract the tarball into ~/freeRouter/bin
tar xvf rtr.tar -C ~/freeRouter/bin/

For those who would like to rebuild these binaries, the compilation shell script is available in the cloned freeRouter git repository: ~/freeRouter/src/native/c.sh

FreeRouter uses 2 configuration files in order to run. Let's write these configuration files for R1 in ~/freeRouter/etc

freeRouter hardware configuration file: dpdk-focal-1-hw.txt
hwid hp
! cpu_port
int eth0 eth - 127.0.0.1 20001 127.0.0.1 20002
! freerouter control port for message
tcp2vrf 9080 v1 9080
! freerouter cli
tcp2vrf 2323 v1 23
! launch a process called "veth0" that actually link to veth0b
! cmd: ip link add veth0a type veth peer name veth0b
proc veth0 /root/freertr/bin/pcapInt.bin veth0a 20002 127.0.0.1 20001 127.0.0.1
proc p4emu /root/freertr/bin/p4dpdk.bin --vdev=net_af_packet0,iface=veth0b 127.0.0.1 9080 1
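
In this hardware file, the UDP ports of the "int eth0" line and of the pcapInt process must mirror each other: what freeRouter uses as local port is the remote port of pcapInt, and vice versa. The Python sketch below checks this on the two lines above. The pcapInt argument order (interface, local port, local address, remote port, remote address) is inferred from the pcapInt output shown in this blog; the field order of the "int" line is an assumption, as are the function names.

```python
def parse_int_line(line):
    # Assumed layout: int NAME eth MAC LOCAL_ADDR LOCAL_PORT REMOTE_ADDR REMOTE_PORT
    fields = line.split()
    return {
        "local": (fields[4], int(fields[5])),
        "remote": (fields[6], int(fields[7])),
    }


def parse_pcapint_line(line):
    # Layout: proc NAME PATH IFACE LOCAL_PORT LOCAL_ADDR REMOTE_PORT REMOTE_ADDR
    fields = line.split()
    return {
        "iface": fields[3],
        "local": (fields[5], int(fields[4])),
        "remote": (fields[7], int(fields[6])),
    }


def is_mirrored(int_line, proc_line):
    """True when each side sends to the address/port the other side listens on."""
    router = parse_int_line(int_line)
    pcapint = parse_pcapint_line(proc_line)
    return router["local"] == pcapint["remote"] and router["remote"] == pcapint["local"]


INT_LINE = "int eth0 eth - 127.0.0.1 20001 127.0.0.1 20002"
PROC_LINE = "proc veth0 /root/freertr/bin/pcapInt.bin veth0a 20002 127.0.0.1 20001 127.0.0.1"

print(is_mirrored(INT_LINE, PROC_LINE))  # True
```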

Note:

Let's spend some time on this hardware configuration file. As you might have noticed, there are additional interesting lines worth mentioning:

  • proc <process-name>

freeRouter can launch processes at startup. We use this feature here to start the control plane / dataplane communication over the veth pair (veth0a and veth0b) and also the P4Emu/dpdk packet processing backend, p4dpdk.bin.

  • proc p4emu /root/freertr/bin/p4dpdk.bin --vdev=net_af_packet0,iface=veth0b 127.0.0.1 9080 1

By default, dpdk interfaces have port_ids that are allocated sequentially, in the order of appearance in the dpdk-devbind.py --status output, which is usually sorted by pci_id. In the previous output, interface enp0s3 has port_id #0. In dpdk, veth0b (the CPU_PORT) always gets the last port_id, right after the dpdk data port_ids, so here it is 1. If, for example, we dedicated enp0s3, enp0s8, enp0s9 and enp0s10 in VirtualBox, the command would have been:

proc p4emu /root/freertr/bin/p4dpdk.bin --vdev=net_af_packet0,iface=veth0b 127.0.0.1 9080 4

enp0s3 would be: #0 with pci_id: 00:03.0

enp0s8 would be: #1 with pci_id: 00:08.0

enp0s9 would be: #2 with pci_id: 00:09.0

enp0s10 would be: #3 with pci_id: 00:0a.0
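
The numbering rule described above can be summarized in a few lines of Python. This is only a sketch of the rule as stated in this article (data ports numbered in pci_id order, CPU port taking the next number); it does not query dpdk, and the function names are assumptions.

```python
def pci_sort_key(pci_id):
    """Turn '[domain:]bus:device.function' (hexadecimal) into a sortable tuple."""
    parts = pci_id.split(":")
    domain = parts[0] if len(parts) == 3 else "0"
    bus = parts[-2]
    device, function = parts[-1].split(".")
    return (int(domain, 16), int(bus, 16), int(device, 16), int(function, 16))


def allocate_port_ids(pci_ids):
    """Return ({pci_id: port_id}, cpu_port_id) following the rule above."""
    ordered = sorted(pci_ids, key=pci_sort_key)
    data_ports = {pci: port for port, pci in enumerate(ordered)}
    cpu_port = len(data_ports)  # the CPU port comes right after the data ports
    return data_ports, cpu_port


# Single data port, as in this article: enp0s3 is #0, CPU port is 1
print(allocate_port_ids(["00:03.0"]))
# ({'00:03.0': 0}, 1)

# Four data ports: the last argument of p4dpdk.bin becomes 4
ports, cpu_port = allocate_port_ids(["00:0a.0", "00:03.0", "00:09.0", "00:08.0"])
print(ports["00:0a.0"], cpu_port)  # 3 4
```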

freeRouter software configuration file: dpdk-focal-1-sw.txt
hostname dpdk-freerouter
buggy
!
!
vrf definition v1
 rd 1:1
 exit
!
interface ethernet0
 description freerouter@P4_CPU_PORT[veth0a]
 no shutdown
 no log-link-change
 exit
!
interface sdn1
 description freerouter@P4_CPU_PORT[enp0s3]
 mtu 1500
 vrf forwarding v1
 ipv4 address 192.168.0.131 255.255.255.0
 ipv6 address 2a01:e0a:159:2850::666 ffff:ffff:ffff:ffff::
 ipv6 enable
 no shutdown
 no log-link-change
 exit
!
!
!
!
!
!
!
!
!
!
!
!
!
!
server telnet telnet
 security protocol telnet
 no exec authorization
 no login authentication
 vrf v1
 exit
!
server p4lang p4
 export-vrf v1 1
 export-port sdn1 0 0
 interconnect ethernet0
 vrf v1
 exit
!
!
end
Launch freeRouter with the supplied dpdk-focal-1-hw.txt and dpdk-focal-1-sw.txt, with a console prompt
java -jar lib/rtr.jar routersc dpdk-focal-1-hw.txt dpdk-focal-1-sw.txt
info cfg.cfgInit.doInit:cfgInit.java:556 booting
info cfg.cfgInit.doInit:cfgInit.java:680 initializing hardware
info cfg.cfgInit.executeHWcommands:cfgInit.java:469 2:! cpu_port
info cfg.cfgInit.executeHWcommands:cfgInit.java:469 4:! freerouter control port for message
info cfg.cfgInit.executeHWcommands:cfgInit.java:469 6:! freerouter cli
info cfg.cfgInit.executeHWcommands:cfgInit.java:469 8:! launch a process called "veth0" that actually link to veth0b
info cfg.cfgInit.executeHWcommands:cfgInit.java:469 9:! cmd: ip link add veth0a type veth peer name veth0b
info cfg.cfgInit.doInit:cfgInit.java:687 applying defaults
info cfg.cfgInit.doInit:cfgInit.java:695 applying configuration
info cfg.cfgInit.doInit:cfgInit.java:721 done
welcome
line ready
dpdk-freertr-1#

Verification

FreeRouter telnet access from Virtualbox VM guest via port 2323
root@focal-1:~# telnet 127.0.0.1 2323
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
welcome
line ready
dpdk-freerouter#  
freerouter running configuration
dpdk-freerouter#term len 0                                                      
dpdk-freerouter#sh run                                                          
hostname dpdk-freerouter
buggy
!
!
vrf definition v1
 rd 1:1
 exit
!
interface ethernet0
 description freerouter@P4_CPU_PORT[veth0a]
 no shutdown
 no log-link-change
 exit
!
interface sdn1
 description freerouter@P4_CPU_PORT[enp0s3]
 mtu 1500
 macaddr 0078.5223.343c
 lldp enable
 vrf forwarding v1
 ipv4 address 192.168.0.131 255.255.255.0
 ipv6 address 2a01:e0a:159:2850::666 ffff:ffff:ffff:ffff::
 ipv6 enable
 no shutdown
 no log-link-change
 exit
!
!
!
!
!
!
!
!
!
!
!
!
!
!
server telnet telnet
 security protocol telnet
 no exec authorization
 no login authentication
 vrf v1
 exit
!
server p4lang p4
 export-vrf v1 1
 export-port sdn1 0 0
 interconnect ethernet0
 vrf v1
 exit
!
!
end

Check control plane is communicating with P4Emu/dpdk dataplane
dpdk-freerouter#show interfaces summary                                         
interface  state  tx     rx       drop
ethernet0  up     43567  8727278  0
sdn1       up     42659  8675606  0
Ping IPv4 from freerouter -> LAN router gateway
dpdk-freerouter#ping 192.168.0.254 /vrf v1                                      
pinging 192.168.0.254, src=null, cnt=5, len=64, tim=1000, ttl=255, tos=0, sweep=false
!!!!!
result=100%, recv/sent/lost=5/5/0, rtt min/avg/max/total=1/1/1/6
Ping IPv4 from freerouter -> LAN server
dpdk-freerouter#ping 192.168.0.62 /vrf v1                                       
pinging 192.168.0.62, src=null, cnt=5, len=64, tim=1000, ttl=255, tos=0, sweep=false
.!!!!
result=80%, recv/sent/lost=4/5/1, rtt min/avg/max/total=1/1/2/1005

Please observe the 1st ICMP packet loss, which is due to ARP learning for the target (192.168.0.254 and 192.168.0.62 respectively). In the output above it is visible for 192.168.0.62.
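
The ".!!!!" pattern can be reproduced with a toy model. The Python sketch below is a deliberate simplification and not the freeRouter implementation: it assumes that a packet sent while the MAC address of the target is unknown is lost, and that the resolution it triggers completes before the next probe is sent.

```python
def ping(target, arp_cache, count=5):
    """Return the ping marks ('!' = reply, '.' = loss) for a toy ARP model."""
    marks = []
    for _ in range(count):
        if target in arp_cache:
            marks.append("!")
        else:
            marks.append(".")               # packet lost, ARP request triggered
            arp_cache[target] = "resolved"  # reply assumed to arrive before the next probe
    return "".join(marks)


cache = {}
print(ping("192.168.0.62", cache))  # .!!!!
print(ping("192.168.0.62", cache))  # !!!!!
```

Once the entry is in the cache, a second ping towards the same target no longer loses its first packet.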

IPv4 arp check
dpdk-freerouter#sh ipv4 arp sdn1                                                
mac             address        time      static
e03f.496d.1899  192.168.0.62   00:00:24  false    <----- Host server
0024.d4a0.0cd3  192.168.0.254  00:00:24  false    <----- LAN gateway
Ping IPv6 from freerouter -> LAN router
dpdk-freerouter#ping 2a01:e0a:159:2850::1 /vrf v1                               
pinging 2a01:e0a:159:2850::1, src=null, cnt=5, len=64, tim=1000, ttl=255, tos=0, sweep=false
.!!!!
result=80%, recv/sent/lost=4/5/1, rtt min/avg/max/total=1/1/2/1005
Ping IPv6 from freerouter -> Host server and SSH connection test
dpdk-freerouter#ping 2a01:e0a:159:2850:e23f:49ff:fe6d:1899 /vrf v1              
pinging 2a01:e0a:159:2850:e23f:49ff:fe6d:1899, src=null, cnt=5, len=64, tim=1000, ttl=255, tos=0, sweep=false
.!!!!
result=80%, recv/sent/lost=4/5/1, rtt min/avg/max/total=1/1/2/1006

Note the loss of the first ICMP packet in each test: it triggered IPv6 neighbor discovery for 2a01:e0a:159:2850::1 and 2a01:e0a:159:2850:e23f:49ff:fe6d:1899 respectively.

IPv6 neighbor discovery check
dpdk-freerouter#show ipv6 neighbors sdn1                                        
mac             address                                time      static  router
0024.d4a0.0cd3  2a01:e0a:159:2850::1                   00:00:39  false   false    <----- LAN gateway
e03f.496d.1899  2a01:e0a:159:2850:e23f:49ff:fe6d:1899  00:00:39  false   false    <----- Host server
0024.d4a0.0cd3  fe80::224:d4ff:fea0:cd3                00:00:39  false   false    <----- Link local LAN gateway EUI64 IPv6 address
e03f.496d.1899  fe80::e23f:49ff:fe6d:1899              00:00:39  false   false    <----- Link local host server IPv6 address
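The link-local entries in the neighbor table above can be cross-checked by hand: they are the modified EUI-64 addresses derived from the MAC addresses in the first column. Here is a minimal sketch of that derivation. The function name `eui64_link_local` is ours for illustration and is not part of freeRouter.

```python
import ipaddress


def eui64_link_local(mac: str) -> str:
    """Derive the modified EUI-64 link-local IPv6 address from a MAC address."""
    # Accept freeRouter dotted notation (0024.d4a0.0cd3) as well as colon notation.
    digits = mac.replace(".", "").replace(":", "").replace("-", "")
    if len(digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    octets = bytearray(bytes.fromhex(digits))
    octets[0] ^= 0x02  # flip the universal/local bit
    # Interface identifier: first 3 octets, ff:fe, last 3 octets.
    iid = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    prefix = bytes.fromhex("fe80") + bytes(6)  # fe80::/64
    return str(ipaddress.IPv6Address(prefix + iid))


if __name__ == "__main__":
    for mac in ("0024.d4a0.0cd3", "e03f.496d.1899"):
        print(mac, "->", eui64_link_local(mac))
```

Running it against the two MAC addresses learned on sdn1 should give back the two fe80:: addresses listed by `show ipv6 neighbors sdn1`.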
Initiate IPv4 ssh from freerouter -> LAN server
dpdk-freerouter#ssh 192.168.0.62 /vrf v1 /user my-nas                          
 - connecting to 192.168.0.62 22
password: *******
                
 - securing connection

Last login: Tue Jul  7 17:40:55 2020 from 2a01:e0a:159:2850::666
FreeBSD 11.3-RELEASE-p9 (FreeNAS.amd64) #0 r325575+588899735f7(HEAD): Mon Jun  1 15:04:31 EDT 2020

        FreeNAS (c) 2009-2020, The FreeNAS Development Team
        All rights reserved.
        FreeNAS is released under the modified BSD license.

        For more information, documentation, help or support, go here:
        http://freenas.org
Welcome to FreeNAS
MY-NAS%


Initiate IPv6 ssh from freerouter -> LAN server
dpdk-freerouter#ssh 2a01:e0a:159:2850:e23f:49ff:fe6d:1899 /vrf v1 /user my-nas 
 - connecting to 2a01:e0a:159:2850:e23f:49ff:fe6d:1899 22
password: *******
                
 - securing connection

Last login: Wed Jul  8 11:28:32 2020 from 192.168.0.131
FreeBSD 11.3-RELEASE-p9 (FreeNAS.amd64) #0 r325575+588899735f7(HEAD): Mon Jun  1 15:04:31 EDT 2020

        FreeNAS (c) 2009-2020, The FreeNAS Development Team
        All rights reserved.
        FreeNAS is released under the modified BSD license.

        For more information, documentation, help or support, go here:
        http://freenas.org
Welcome to FreeNAS
MY-NAS% 
freeRouter p4dpdk hardware statistics
dpdk-freerouter#sh int sdn1 hw
  hwcounters     - hardware counters
  hwdrhistory    - hardware historic drop byte counters
  hwdrphistory   - hardware historic drop packet counters
  hwhistory      - hardware historic byte counters
  hwnumhist      - hardware numeric historic byte counters
  hwnumphist     - hardware numeric historic packet counters
  hwphistory     - hardware historic packet counters
  hwrates        - hardware traffic rates
  hwrealtime     - hardware realtime counters
  hwrxhistory    - hardware historic rx byte counters
  hwrxphistory   - hardware historic rx packet counters
  hwtxhistory    - hardware historic tx byte counters
  hwtxphistory   - hardware historic tx packet counters

dpdk-freerouter#show interfaces sdn1 hwrates                                    
       packet         byte
time   tx  rx   drop  tx     rx      drop
1sec   5   20   0     1498   4668    0
1min   39  104  0     48056  56745   0
1hour  31  174  0     10162  137481  0

dpdk-freerouter#show interfaces sdn1 hwhistory                                  
        217k|                                                            
        195k|                            #                               
        173k|                            #                               
        151k|   #                        #                            #  
        130k| # #           #          # #                 #          #  
        108k| # #           #          # #        #        #   #      #  
         86k| # #           #  #     # ###   #    #        #  ##      ## 
         65k| # # #    # #  #  #     # ##### ## # #        #####    # ## 
         43k|## # #### # ####  ### # ########## ###### #   ##### # ##### 
         21k|## ###### # ##### ##### ########## ######################## 
           0|########################################################### 
         bps|0---------10--------20--------30--------40--------50-------- seconds

         43m|                                                            
         39m| *                                                          
         34m| *                                                          
         30m| *                                         *                
         26m| *                                         *                
         21m| *                                         *                
         17m| *                                         *                
         13m| *                                         *       *        
       8684k| *                                    *  * *  *    *        
       4342k| *       **                           *  * * ** * **      * 
           0|########################################################### 
         bps|0---------10--------20--------30--------40--------50-------- minutes

         70m|                                                            
         63m| * *                                                        
         56m| * *                                                        
         49m| * *                                                        
         42m| * *                                                        
         35m| * *                                                        
         28m| * *                                                        
         21m|** *                                                        
         14m|** *                                                        
       7017k|****                                                        
           0|##*#                                                        
         bps|0---------10--------20--------30--------40--------50-------- hours


Conclusion

In this article you:

  • had a demonstration of how to integrate freeRouter into a local area network (similar to article #002)
  • used a P4Emu/dpdk dataplane instead of P4Emu/pcap
  • saw that communication between the freeRouter control plane and P4Emu/dpdk is ensured by pcapInt via the veth pair [ veth0a - veth0b ]
  • saw that, in this example, freeRouter with P4Emu/dpdk has only 1 dataplane interface, bound to the enp0s3 VM interface, which is exposed to the local network as a bridged interface

[ #006 ] RARE/FreeRouter-101 - key take-away

  • FreeRouter uses a UNIX socket to carry the packets dedicated to control plane + dataplane communication.

This essential paradigm ensures communication between freeRouter and the P4Emu/dpdk dataplane. It is implemented by the pcapInt binary from freeRouter net-tools, which binds the freeRouter socket (veth0a@localhost:22001) to a virtual network interface (veth0b@localhost:22002) connected to CPU_PORT 1.

  • freeRouter is the control plane for P4Emu/dpdk dataplane

freeRouter performs all control plane route computation and sends write/modify/remove messages to the dataplane; P4 entries are created/modified/removed accordingly in the P4Emu/dpdk tables. Although the name is P4Emu, it does not emulate BMv2 V1Model.p4, but rather router.p4.

  • dpdk port_id allocation

dpdk port_id allocation follows the pci_id port naming convention, starting from id 0. p4dpdk.bin is invoked with the parameter: (number_of_dpdk_port - 1) + 1 <--- CPU_PORT

  • In this setup the combination of freeRouter/P4Emu/dpdk delivers a solution for small campus networks with 10GE links (100GE links to be validated)

dpdk removes the kernel intervention for each packet processed. In this configuration, packet processing is offloaded to user space, reducing kernel intervention to ~ 0%. Congratulations, you have a hardware NIC assisted forwarding system !
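Coming back to the dpdk port_id allocation rule in the take-aways above, the arithmetic can be sketched in a few lines. This is only an illustration of the rule as stated in this article; the function name `dpdk_port_layout` and the returned dictionary are our own and do not come from p4dpdk.bin.

```python
def dpdk_port_layout(number_of_dpdk_port: int) -> dict:
    """Return the dataplane port ids and the CPU_PORT id for a given port count."""
    if number_of_dpdk_port < 1:
        raise ValueError("at least one dpdk port is required")
    # dpdk ports are numbered from 0, following the pci_id ordering.
    dataplane_ports = list(range(number_of_dpdk_port))
    # Rule from the article: (number_of_dpdk_port - 1) + 1 is the CPU_PORT,
    # i.e. the id right after the last dpdk port.
    cpu_port = (number_of_dpdk_port - 1) + 1
    return {"dataplane_ports": dataplane_ports, "cpu_port": cpu_port}


if __name__ == "__main__":
    # Single dataplane interface, as in this article's setup.
    print(dpdk_port_layout(1))
```

With the single dataplane interface used in this article, the sketch yields port 0 for the dataplane and CPU_PORT 1, which matches the CPU_PORT mentioned in the first take-away.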

In a subsequent article we will see how this setup behaves with a DELL 640 server powered by Intel(R) Xeon(R) Gold 6138 CPU x 2 and equipped with a Mellanox ConnectX-5 EX Dual Port 100GbE QSFP28 PCIe Adapter Low Profile card. We will also see how to connect this server to a P4 switch, BF2556X-1T. So stay tuned !





In articles #003 and #004 you learned how RARE/freeRouter controls a P4 dataplane (BMv2 or TOFINO virtual model). We also demonstrated that this setup could be integrated into real networks. However, these P4 dataplanes are not suitable for day-to-day operation as they have inherent software limitations. While the freeRouter native software dataplane has the advantage of offering the entire feature set and is sufficient to handle a home network traffic load, we investigated ways to improve dataplane performance. In that context we considered studying:

Requirement

  • Basic Linux/Unix knowledge
  • Basic networking knowledge

Overview

However, the XDP model was not complete enough to compile router.p4, and we could not generate the corresponding kernel bypass code with ELTE T4P4S based on BMv2 V1Model.p4 (a GitHub issue is still pending). In that context, Csaba, the freeRouter lead developer, decided to develop P4Emu, a software dataplane with the particularity to:

  • understand freeRouter control plane messages meant for a P4 dataplane
  • thus keep the control plane decoupled from the dataplane, as was the case with BMv2 and BF_SWITCHD

One could ask: why write yet another software dataplane when freeRouter already has a working native software dataplane? This is a very good and valid question. The answer boils down to:

"decoupling control plane from the dataplane"

We will see in a subsequent article how P4Emu unlocks new valid use cases.

Article objective

In this article we'll use the freeRouter setup deployed in #004 and replace bf_switchd, which provided freeRouter with the INTEL/BAREFOOT TOFINO dataplane, with P4Emu/pcap.

It is important to note that, despite its name, P4Emu/pcap does not emulate V1Model. P4Emu emulates the router.p4 packet processing logic and uses a packet forwarding library to transmit packets from a specific ingress port to the right egress port, as defined by freeRouter control plane messages.


Diagram

[ #005 ] - Cookbook

In our example we will use the same debian stable image (buster) installed as a VirtualBox VM as in #002.

and we add a bridged network interface to our laptop RJ45 connection.

Flush enp0s3 so that it can be controlled by P4Emu/pcap
ip addr flush enp0s3

Add out of band management enp0s8 with Virtualbox

You can add a second Host-only interface (enp0s8) in VirtualBox in order to connect to the VM guest, as you might lose connectivity once enp0s3 is flushed.
mkdir -p ~/freeRouter/bin ~/freeRouter/lib ~/freeRouter/etc ~/freeRouter/log
cd ~/freeRouter/lib
wget http://freerouter.nop.hu/rtr.jar
Update & Upgrade system
tree freeRouter
freeRouter
├── bin   # binary files      
├── etc   # configuration files      
├── lib   # library files      
└── log   # log files      

get freeRouter net-tools tarball
wget freerouter.nop.hu/rtr.tar
Install build tools
tar xvf rtr.tar -C ~/freeRouter/bin/

For those who would like to rebuild these binaries, you can find the compilation shell script in the freeRouter cloned git repository in: ~/freeRouter/src/native/c.sh

FreeRouter uses 2 configuration files in order to run, let's write these configuration files for R1 in ~/freeRouter/etc

freeRouter hardware configuration file: pcap-freerouter-hw.txt
int eth0 eth 0000.1111.00fb 127.0.0.1 22710 127.0.0.1 22709
tcp2vrf 2323 v1 23
tcp2vrf 9080 v1 9080
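To make the hardware file above easier to read, here is a small sketch that splits it into fields. The field names are our interpretation, not freeRouter's own terminology: we assume the `int` line ends with local address/port followed by remote address/port (which mirrors the "binded to local port 127.0.0.1 22709 / will send to 127.0.0.1 22710" output of pcapInt later in this article), and that `tcp2vrf` maps a host TCP port to a port inside a VRF (which is why `telnet localhost 2323` reaches the telnet server in v1).

```python
HW_CONFIG = """\
int eth0 eth 0000.1111.00fb 127.0.0.1 22710 127.0.0.1 22709
tcp2vrf 2323 v1 23
tcp2vrf 9080 v1 9080
"""


def parse_hw_config(text: str):
    """Split a freeRouter hardware file into interface and tcp2vrf entries."""
    interfaces, forwards = [], []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "int" and len(fields) == 8:
            interfaces.append({
                "name": fields[1],
                "type": fields[2],
                "mac": fields[3],
                "local": (fields[4], int(fields[5])),   # assumed: freeRouter side
                "remote": (fields[6], int(fields[7])),  # assumed: pcapInt side
            })
        elif fields[0] == "tcp2vrf" and len(fields) == 4:
            forwards.append({
                "host_port": int(fields[1]),  # assumed: port opened on the host
                "vrf": fields[2],
                "vrf_port": int(fields[3]),   # assumed: port inside the VRF
            })
    return interfaces, forwards


if __name__ == "__main__":
    interfaces, forwards = parse_hw_config(HW_CONFIG)
    for entry in interfaces + forwards:
        print(entry)
```

Under these assumptions, the ports on the `int` line must be the mirror image of the ones passed to pcapInt, otherwise the two processes will never see each other's packets.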
freeRouter software configuration file: pcap-freerouter-sw.txt
hostname pcap-freerouter
buggy
!
vrf definition v1
 exit
!
interface ethernet0
 description freerouter@P4_CPU_PORT[veth251]
 no shutdown
 no log-link-change
 exit
!
interface sdn1
 description freerouter@sdn1[enp0s3]
 mtu 9000
 vrf forwarding v1
 ipv4 address 192.168.0.131 255.255.255.0
 ipv6 address 2a01:e0a:159:2850::666 ffff:ffff:ffff:ffff::
 ipv6 enable
 no shutdown
 no log-link-change
 exit
!
!
!
!
!
!
!
!
!
!
!
!
!
!
server telnet tel
 security protocol telnet
 no exec authorization
 no login authentication
 vrf v1
 exit
!
server p4lang p4
 export-vrf v1 1
 export-port sdn1 1 0
 interconnect ethernet0
 vrf v1
 exit
!
end
Setup P4Emu dataplane communication channel via veth pair and interface adjustment (disable IPv6 at VM guest level, MTU 10240, disable TCP offload etc.)
echo 1 > /proc/sys/net/ipv6/conf/all/disable_ipv6
echo 1 > /proc/sys/net/ipv6/conf/default/disable_ipv6

ip link add veth251 type veth peer name veth250
ip link set veth250  up 
ip link set veth251  up 

ifconfig enp0s3 promisc
ifconfig veth250 promisc
ifconfig veth251 promisc

ip link set dev veth250 up mtu 10240
ip link set dev veth251 up mtu 10240
ip link set dev enp0s3 up mtu 10240
export TOE_OPTIONS="rx tx sg tso ufo gso gro lro rxvlan txvlan rxhash"

for TOE_OPTION in $TOE_OPTIONS; do
    /sbin/ethtool --offload veth250 "$TOE_OPTION" off &> /dev/null
    /sbin/ethtool --offload veth251 "$TOE_OPTION" off &> /dev/null
    /sbin/ethtool --offload enp0s3 "$TOE_OPTION" off &> /dev/null
done
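If you prefer to review the interface adjustments before applying them, the same steps can be generated as a dry run. This is a sketch only: it prints commands and executes nothing, and it uses `ip link set ... promisc on` in place of the `ifconfig ... promisc` calls above (an equivalent iproute2 form, our substitution). The function name `interface_setup_commands` is ours.

```python
TOE_OPTIONS = "rx tx sg tso ufo gso gro lro rxvlan txvlan rxhash".split()


def interface_setup_commands(interfaces, mtu=10240):
    """Build, without running them, the commands that prepare each interface."""
    commands = []
    for ifname in interfaces:
        # Bring the interface up with the large MTU used in this article.
        commands.append(["ip", "link", "set", "dev", ifname, "up", "mtu", str(mtu)])
        # Promiscuous mode, iproute2 flavour of "ifconfig <if> promisc".
        commands.append(["ip", "link", "set", "dev", ifname, "promisc", "on"])
        # Disable every offload option listed in TOE_OPTIONS.
        for option in TOE_OPTIONS:
            commands.append(["/sbin/ethtool", "--offload", ifname, option, "off"])
    return commands


if __name__ == "__main__":
    for command in interface_setup_commands(["veth250", "veth251", "enp0s3"]):
        print(" ".join(command))
```

The printed lines can be pasted into a root shell once you are happy with them. Some offload options may not exist on every driver, which is why the shell loop above discards ethtool's error output.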
freeRouter launch with supplied pcap-freerouter-hw.txt and pcap-freerouter-sw.txt with a console prompt
java -jar lib/rtr.jar routersc etc/pcap-freerouter-hw.txt etc/pcap-freerouter-sw.txt
info cfg.cfgInit.doInit:cfgInit.java:556 booting
info cfg.cfgInit.doInit:cfgInit.java:680 initializing hardware
info cfg.cfgInit.doInit:cfgInit.java:687 applying defaults
info cfg.cfgInit.doInit:cfgInit.java:695 applying configuration
info cfg.cfgInit.doInit:cfgInit.java:721 done
welcome
line ready
pcap-freerouter#                   
launch freeRouter pcapInt in order to stitch control plane and P4Emu/pcap dataplane communication
cd ~/freeRouter/bin
./pcapInt.bin veth251 22709 127.0.0.1 22710 127.0.0.1
binded to local port 127.0.0.1 22709.
will send to 127.0.0.1 22710.
pcap version: libpcap version 1.8.1
opening interface veth251 with pcap1.x api
serving others
> 
Launch P4Emu/pcap software dataplane
sudo ./p4emu.bin  127.0.0.1 9080 0 veth250 enp0s3
cpu port is #0 of 2...
pcap version: libpcap version 1.8.1
connecting 127.0.0.1 9080.
opening interface veth250.
opening interface enp0s3.
rx: 'myaddr4' 'add' '224.0.0.0' '4' '0' '1' '' 
rx: 'myaddr4' 'add' '255.255.255.255' '32' '0' '1' '' 
rx: 'myaddr6' 'add' 'ff00::' '8' '0' '1' '' 
rx: 'myaddr4' 'add' '192.168.0.0' '24' '-1' '1' '' 
rx: 'myaddr4' 'add' '192.168.0.131' '32' '-1' '1' '' 
rx: 'myaddr6' 'add' '2a01:e0a:159:2850::' '64' '-1' '1' '' 
rx: 'myaddr6' 'add' '2a01:e0a:159:2850::666' '128' '-1' '1' '' 
rx: 'myaddr6' 'add' 'fe80::' '64' '-1' '1' '' 
rx: 'mylabel4' 'add' '615589' '1' '' 
rx: 'mylabel6' 'add' '1036348' '1' '' 
rx: 'state' '1' '1' '0' '' 
rx: 'mtu' '1' '9000' '' 
rx: 'portvrf' 'add' '1' '1' '' 
rx: 'keepalive' '' 
rx: 'keepalive' '' 
rx: 'neigh6' 'add' '11120' 'fe80::224:d4ff:fea0:cd3' '00:24:d4:a0:0c:d3' '1' '00:72:3e:18:1b:6f' '1' '' 
rx: 'keepalive' '' 
rx: 'keepalive' '' 
rx: 'keepalive' '' 
rx: 'neigh4' 'add' '29738' '192.168.0.254' '00:24:d4:a0:0c:d3' '1' '00:72:3e:18:1b:6f' '1' '' 
rx: 'keepalive' '' 
rx: 'neigh4' 'add' '40470' '192.168.0.62' 'e0:3f:49:6d:18:99' '1' '00:72:3e:18:1b:6f' '1' '' 
rx: 'keepalive' '' 
rx: 'keepalive' '' 
rx: 'keepalive' '' 
rx: 'keepalive' '' 
rx: 'keepalive' '' 
rx: 'neigh6' 'add' '45820' '2a01:e0a:159:2850:e23f:49ff:fe6d:1899' 'e0:3f:49:6d:18:99' '1' '00:72:3e:18:1b:6f' '1' '' 
rx: 'keepalive' '' 
rx: 'neigh6' 'add' '49055' 'fe80::e23f:49ff:fe6d:1899' 'e0:3f:49:6d:18:99' '1' '00:72:3e:18:1b:6f' '1' '' 
rx: 'neigh6' 'add' '33334' '2a01:e0a:159:2850::
...
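The `rx:` lines above are the control plane messages as P4Emu receives them, one quoted token per field. A small sketch like the following can help when you want to count or filter them from a saved log. It only tokenises the lines and does not interpret the meaning of each field; the function name `parse_rx_line` is ours.

```python
import shlex
from collections import Counter

SAMPLE = """\
rx: 'myaddr4' 'add' '224.0.0.0' '4' '0' '1' ''
rx: 'keepalive' ''
rx: 'neigh4' 'add' '29738' '192.168.0.254' '00:24:d4:a0:0c:d3' '1' '00:72:3e:18:1b:6f' '1' ''
rx: 'keepalive' ''
rx: 'neigh6' 'add' '33334' '2a01:e0a:159:2850::
"""


def parse_rx_line(line: str):
    """Return the tokens of one P4Emu 'rx:' log line, or None if unusable."""
    prefix = "rx: "
    if not line.startswith(prefix):
        return None
    try:
        tokens = shlex.split(line[len(prefix):])
    except ValueError:
        # Truncated line with an unbalanced quote, like the last sample line.
        return None
    if tokens and tokens[-1] == "":
        tokens.pop()  # drop the empty trailing token
    return tokens


if __name__ == "__main__":
    parsed = [parse_rx_line(line) for line in SAMPLE.splitlines()]
    counts = Counter(tokens[0] for tokens in parsed if tokens)
    for name, count in sorted(counts.items()):
        print(name, count)
```

Counting message names this way is a quick sanity check that keepalives keep flowing and that neighbor entries are being pushed to the dataplane.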

Verification

FreeRouter telnet access from Virtualbox VM guest via port 2323
telnet localhost 2323
Trying ::1...
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
welcome
line ready
pcap-freerouter#
freerouter running configuration
pcap-freerouter#term len 0                                                       
pcap-freerouter#sh run                                                           
hostname pcap-freerouter
buggy
!
vrf definition v1
 exit
!
interface ethernet0
 description freerouter@P4_CPU_PORT[veth251]
 no shutdown
 no log-link-change
 exit
!
interface sdn1
 description freerouter@sdn1[enp0s3]
 mtu 9000
 macaddr 0072.3e18.1b6f
 vrf forwarding v1
 ipv4 address 192.168.0.131 255.255.255.0
 ipv6 address 2a01:e0a:159:2850::666 ffff:ffff:ffff:ffff::
 ipv6 enable
 no shutdown
 no log-link-change
 exit
!
!
!
!
!
!
!
!
!
!
!
!
!
!
server telnet tel
 security protocol telnet
 no exec authorization
 no login authentication
 vrf v1
 exit
!
server p4lang p4
 export-vrf v1 1
 export-port sdn1 1 0
 interconnect ethernet0
 vrf v1
 exit
!
end

Check control plane is communicating with P4Emu/pcap dataplane
pcap-freerouter#show interfaces summary                                          
interface  state  tx    rx      drop
ethernet0  up     8739  404545  0
sdn1       up     8535  400013  0
Ping IPv4 from freerouter -> LAN router gateway
pcap-freerouter#ping 192.168.0.254 /vrf v1                                       
pinging 192.168.0.254, src=null, cnt=5, len=64, tim=1000, ttl=255, tos=0, sweep=false
.!!!!
result=80%, recv/sent/lost=4/5/1, rtt min/avg/max/total=1/1/2/1011
Ping IPv4 from freerouter -> LAN server
pcap-freerouter#ping 192.168.0.62 /vrf v1                                        
pinging 192.168.0.62, src=null, cnt=5, len=64, tim=1000, ttl=255, tos=0, sweep=false
.!!!!
result=80%, recv/sent/lost=4/5/1, rtt min/avg/max/total=1/1/2/1005 

Note the loss of the first ICMP packet in each test: it triggered ARP learning for 192.168.0.254 and 192.168.0.62 respectively.

IPv4 arp check
pcap-freerouter#sh ipv4 arp sdn1                                                 
mac             address        time      static
e03f.496d.1899  192.168.0.62   00:00:57  false    <----- Host server
0024.d4a0.0cd3  192.168.0.254  00:00:57  false    <----- LAN gateway
Ping IPv6 from freerouter -> LAN router
pcap-freerouter#ping 2a01:e0a:159:2850::1  /vrf v1                               
pinging 2a01:e0a:159:2850::1, src=null, cnt=5, len=64, tim=1000, ttl=255, tos=0, sweep=false
.!!!!
result=80%, recv/sent/lost=4/5/1, rtt min/avg/max/total=0/1/2/1004
Ping IPv6 from freerouter -> Host server and SSH connection test
pcap-freerouter#ping 2a01:e0a:159:2850:e23f:49ff:fe6d:1899  /vrf v1              
pinging 2a01:e0a:159:2850:e23f:49ff:fe6d:1899, src=null, cnt=5, len=64, tim=1000, ttl=255, tos=0, sweep=false
.!!!!
result=80%, recv/sent/lost=4/5/1, rtt min/avg/max/total=1/1/1/1006

Note the loss of the first ICMP packet in each test: it triggered IPv6 neighbor discovery for 2a01:e0a:159:2850::1 and 2a01:e0a:159:2850:e23f:49ff:fe6d:1899 respectively.

IPv6 neighbor discovery check
pcap-freerouter#show ipv6 neighbors sdn1                                         
mac             address                                time      static  router
0024.d4a0.0cd3  2a01:e0a:159:2850::1                   00:00:53  false   false
e03f.496d.1899  2a01:e0a:159:2850:e23f:49ff:fe6d:1899  00:00:53  false   false
0024.d4a0.0cd3  fe80::224:d4ff:fea0:cd3                00:00:53  false   false
e03f.496d.1899  fe80::e23f:49ff:fe6d:1899              00:00:53  false   false
Initiate IPv4 ssh from freerouter -> LAN server
pcap-freerouter#ssh 192.168.0.62 /vrf v1 /user my-nas                           
 - connecting to 192.168.0.62 22
password: *******
                
 - securing connection

Last login: Mon Jul  6 15:05:38 2020 from 192.168.0.77
FreeBSD 11.3-RELEASE-p9 (FreeNAS.amd64) #0 r325575+588899735f7(HEAD): Mon Jun  1 15:04:31 EDT 2020

        FreeNAS (c) 2009-2020, The FreeNAS Development Team
        All rights reserved.
        FreeNAS is released under the modified BSD license.

        For more information, documentation, help or support, go here:
        http://freenas.org
Welcome to FreeNAS
MY-NAS% 


Initiate IPv6 ssh from freerouter -> LAN server
pcap-freerouter#ssh 2a01:e0a:159:2850:e23f:49ff:fe6d:1899  /vrf v1 /user my-nas 
 - connecting to 2a01:e0a:159:2850:e23f:49ff:fe6d:1899 22
password: *******
                
 - securing connection

Last login: Tue Jul  7 16:01:54 2020 from 2a01:e0a:159:2850::666
FreeBSD 11.3-RELEASE-p9 (FreeNAS.amd64) #0 r325575+588899735f7(HEAD): Mon Jun  1 15:04:31 EDT 2020

        FreeNAS (c) 2009-2020, The FreeNAS Development Team
        All rights reserved.
        FreeNAS is released under the modified BSD license.

        For more information, documentation, help or support, go here:
        http://freenas.org
Welcome to FreeNAS
MY-NAS% 

Conclusion

In this article you:

  • had a demonstration of how to integrate freeRouter into a local area network (similar to article #002)
  • used a P4Emu/pcap dataplane instead of bmv2 or TOFINO
  • saw that communication between the freeRouter control plane and P4Emu/pcap is ensured by pcapInt via the veth pair [ veth250 - veth251 ]
  • saw that, in this example, freeRouter with P4Emu/pcap has only 1 dataplane interface, bound to the enp0s3 VM interface, which is exposed to the local network as a bridged interface

[ #005 ] RARE/FreeRouter-101 - key take-away

  • FreeRouter uses a UNIX socket to carry the packets dedicated to control plane + dataplane communication.

This essential paradigm ensures communication between freeRouter and the P4Emu/pcap dataplane. It is implemented by the pcapInt binary from freeRouter net-tools, which binds the freeRouter socket (veth251@localhost:22710) to a virtual network interface (veth250@localhost:22709) connected to CPU_PORT 0.
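To give a feel for this socket "stitching", here is a minimal sketch of two loopback UDP endpoints exchanging one frame, in the same spirit as the freeRouter side (port 22710 in this article) and the pcapInt side (port 22709). It is an illustration under assumptions, not pcapInt's implementation: the sketch uses ephemeral ports so that it does not collide with a running router, and it simply echoes the payload back where pcapInt would write it to the veth interface through libpcap.

```python
import socket


def make_endpoint() -> socket.socket:
    """Open a UDP socket on loopback; port 0 lets the OS pick a free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("127.0.0.1", 0))
    sock.settimeout(2.0)
    return sock


def exchange(frame: bytes):
    """Send a frame router -> relay, echo it back, and report what each side saw."""
    router_side = make_endpoint()   # stands in for freeRouter's ethernet0 socket
    relay_side = make_endpoint()    # stands in for pcapInt's socket
    try:
        router_addr = router_side.getsockname()
        relay_addr = relay_side.getsockname()
        router_side.sendto(frame, relay_addr)
        received, sender = relay_side.recvfrom(65535)
        # pcapInt would inject 'received' into the veth here; we just echo it.
        relay_side.sendto(received, router_addr)
        echoed, _ = router_side.recvfrom(65535)
        return received, echoed, sender == router_addr
    finally:
        router_side.close()
        relay_side.close()


if __name__ == "__main__":
    received, echoed, same_peer = exchange(b"\x00" * 14 + b"hello dataplane")
    print(len(received), len(echoed), same_peer)
```

The design point to retain is that each side only needs to know its own local port and the peer's port, which is exactly what the hardware configuration file and the pcapInt command line express.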

  • freeRouter is the control plane for P4Emu/pcap dataplane

freeRouter performs all control plane route computation and sends write/modify/remove messages to the dataplane; P4 entries are created/modified/removed accordingly in the P4Emu/pcap tables. Although the name is P4Emu, it does not emulate BMv2 V1Model.p4, but rather router.p4.

  • In this setup the combination of freeRouter/pcap delivers a solution for SOHO networks with 1GE links

However, a 1GE traffic rate requires 50% of one CPU thread. Nevertheless, the traffic rate achieved with P4Emu/pcap is higher than with freeRouter native software packet forwarding.

In a subsequent article we will see how we can reduce this CPU requirement implied by P4Emu/pcap.



In the previous article #003 "Are you P4 compliant ?" we presented a setup where RARE/freeRouter was controlling a BMv2 P4 dataplane called simple_switch_grpc. In this article we replace the open source BMv2 target with a commercial virtual target provided by INTEL/BAREFOOT. As a side note, we will show that this setup can be integrated with real networks (with inherent software limitations).

Requirement

  • Basic Linux/Unix knowledge
  • Basic networking knowledge

Overview

I'm repeating the core message from #003: for those who are not familiar with data plane programming and especially with P4, "P4 is a domain-specific programming language for specifying the behaviour of the dataplanes of network-forwarding elements." (from p4.org). In short, it helps you write a "program specifying how a switch processes packets".

Article objective

In this article we'll use the freeRouter setup deployed in #003 and replace bmv2/simple_switch_grpc, which provided freeRouter with the P4Lang dataplane, with INTEL BAREFOOT/bf_switchd. The effective dataplane is provided by the INTEL/BAREFOOT virtual bf_switchd model running the RARE P4 program called: bf_router.p4.

Diagram

[ #004 ] - Cookbook

In our example we will use the OpenNetworkLinux KVM image (ONL9). This is the recommended build from INTEL/BAREFOOT for SDE-9.2.0.

and we add a network interface bridged to our laptop RJ45 connection.

mkdir -p ~/freeRouter/bin ~/freeRouter/lib ~/freeRouter/etc ~/freeRouter/log
cd ~/freeRouter/lib
wget http://freerouter.nop.hu/rtr.jar
Update & Upgrade system
tree freeRouter
freeRouter
├── bin   # binary files      
├── etc   # configuration files      
├── lib   # library files      
└── log   # log files      

get freeRouter net-tools tarball
wget freerouter.nop.hu/rtr.tar
Install build tools
tar xvf rtr.tar -C ~/freeRouter/bin/

For those who would like to rebuild these binaries, you can find the compilation shell script in the freeRouter cloned git repository in: ~/freeRouter/src/native/c.sh

In this section, you'll need access to the INTEL/BAREFOOT Software Development Environment. Research & Academia institutions can apply here in order to become a FASTER member and get access to INTEL/BAREFOOT resources. You can find here a document describing the installation of INTEL/BAREFOOT SDE on ONL for a WEDGE100BF32X system. In our case, we are setting up the following environment:

  • ONL9 as VM guest with kernel 8192 Mb of RAM and 2 vCPU
  • SDE 9.2.0
  • VirtualBox is running on MACOSX host

Just for the sake of example, SDE 9.2.0 is installed in root home directory:

SDE installation environment
export SDE=/root/bf-sde-9.2.0
export SDE_INSTALL=/root/bf-sde-9.2.0/install
export PATH=$PATH:$SDE_INSTALL/bin:$SDE/tools

TOFINO RARE bitbucket is a private repository. It is currently being reworked in order to make it public as per INTEL/BAREFOOT decision to make P4 code related to TOFINO architecture public. (It is thus inaccessible for now but will be opened to the public soon.)

Clone RARE code from repository
cd ~/
git clone https://bitbucket.software.geant.org/scm/rare/rare.git
compile RARE bf_router.p4
p4_build.sh -I /root/rare/p4src/ -DHAVE_MPLS /root/rare/p4src/bf_router.p4 
Using SDE          /root/bf-sde-9.2.0
Using SDE_INSTALL /root/bf-sde-9.2.0/install
Using SDE version bf-sde-9.2.0

OS Name: Ubuntu 18.04.4 LTS
This system has 8GB of RAM and 1 CPU(s)
Parallelization:  Recommended: -j1   Actual: -j1

Compiling for p4_16/tna
P4 compiler path:    /root/bf-sde-9.2.0/install/bin/p4c
P4 compiler version: 9.2.0 (SHA: 639d9ec) (p4c-based)
Build Dir: /root/bf-sde-9.2.0/build/p4-build/bf_router
 Logs Dir: /root/bf-sde-9.2.0/logs/p4-build/bf_router

  Building bf_router        CLEAR CONFIGURE MAKE INSTALL ... DONE

FreeRouter uses 2 configuration files in order to run, let's write these configuration files for R1 in ~/freeRouter/etc

freeRouter hardware configuration file: tna-freerouter-hw.txt
int eth0 eth 0000.1111.00fb 127.0.0.1 22710 127.0.0.1 22709
tcp2vrf 2323 v1 23
tcp2vrf 9080 v1 9080
freeRouter software configuration file: tna-freerouter-sw.txt
hostname tna-freerouter
buggy
!
!
vrf definition v1
 exit
!
interface ethernet0
 description freerouter@P4_CPU_PORT[veth251]
 no shutdown
 no log-link-change
 exit
!
interface sdn1
 description freerouter@sdn1[enp0s3]
 mtu 9000
 macaddr 0072.3e18.1b6f
 vrf forwarding v1
 ipv4 address 192.168.0.131 255.255.255.0
 ipv6 address 2a01:e0a:159:2850::666 ffff:ffff:ffff:ffff::
 ipv6 enable
 no shutdown
 no log-link-change
 exit
!
!
!
!
!
!
!
!
!
!
!
!
!
!
server telnet tel
 security protocol telnet
 no exec authorization
 no login authentication
 vrf v1
 exit
!
server p4lang p4
 export-vrf v1 1
 export-port sdn1 0 10
 interconnect ethernet0
 vrf v1
 exit
!
client tcp-checksum transmit
!
end
Setup bf_switchd dataplane communication channel via veth pair and interface adjustment (disable IPv6 at VM guest level, MTU 10240, disable TCP offload etc.)
echo 1 > /proc/sys/net/ipv6/conf/all/disable_ipv6
echo 1 > /proc/sys/net/ipv6/conf/default/disable_ipv6

ip link add veth251 type veth peer name veth250
ip link set veth250  up 
ip link set veth251  up 

ifconfig enp0s3 promisc
ifconfig veth250 promisc
ifconfig veth251 promisc

ip link set dev veth250 up mtu 10240
ip link set dev veth251 up mtu 10240
ip link set dev enp0s3 up mtu 10240
export TOE_OPTIONS="rx tx sg tso ufo gso gro lro rxvlan txvlan rxhash"

for TOE_OPTION in $TOE_OPTIONS; do
    /sbin/ethtool --offload veth250 "$TOE_OPTION" off &> /dev/null
    /sbin/ethtool --offload veth251 "$TOE_OPTION" off &> /dev/null
    /sbin/ethtool --offload enp0s3 "$TOE_OPTION" off &> /dev/null
done
freeRouter launch with supplied tna-freerouter-hw.txt and tna-freerouter-sw.txt with a console prompt
java -jar lib/rtr.jar routersc etc/tna-freerouter-hw.txt etc/tna-freerouter-sw.txt
info cfg.cfgInit.doInit:cfgInit.java:556 booting
info cfg.cfgInit.doInit:cfgInit.java:680 initializing hardware
info cfg.cfgInit.doInit:cfgInit.java:687 applying defaults
info cfg.cfgInit.doInit:cfgInit.java:695 applying configuration
info cfg.cfgInit.doInit:cfgInit.java:721 done
welcome
line ready
freerouter#                   
launch freeRouter pcapInt in order to stitch control plane and P4 bf_switchd dataplane communication
cd ~/freeRouter/bin
./pcapInt.bin veth251 22709 127.0.0.1 22710 127.0.0.1
binded to local port 127.0.0.1 22709.
will send to 127.0.0.1 22710.
pcap version: libpcap version 1.8.1
opening interface veth251 with pcap1.x api
serving others
> 
Create bf_switchd RARE running environment
mkdir -p ~/rare-run/etc ~/rare-run/logs ~/rare-run/mibs ~/rare-run/snmp
create a custom ports.json file for bf_switchd model
cat ~/rare-run/etc/ports.json
{
    "PortToIf" : [
        { "device_port" :  0, "if" : "enp0s3" },
        { "device_port" : 64, "if" : "veth250" }
    ]
}
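Since a typo in ports.json is easy to make, here is a small sketch that generates the file content from a port map and checks it for duplicate device ports. It only relies on the structure shown above (`PortToIf`, `device_port`, `if`); the helper names `build_ports_json` and `check_ports_json` are ours and are not part of the SDE.

```python
import json


def build_ports_json(port_to_if: dict) -> str:
    """Render a {device_port: interface} map in the ports.json layout shown above."""
    entries = [
        {"device_port": port, "if": ifname}
        for port, ifname in sorted(port_to_if.items())
    ]
    return json.dumps({"PortToIf": entries}, indent=4)


def check_ports_json(text: str) -> dict:
    """Parse ports.json content and reject duplicated device ports."""
    document = json.loads(text)
    mapping = {}
    for entry in document["PortToIf"]:
        port, ifname = entry["device_port"], entry["if"]
        if port in mapping:
            raise ValueError(f"device_port {port} is mapped twice")
        mapping[port] = ifname
    return mapping


if __name__ == "__main__":
    # Same mapping as in this article: front port 0 and CPU port 64.
    content = build_ports_json({0: "enp0s3", 64: "veth250"})
    print(content)
    print(check_ports_json(content))
```

You would redirect the generated content to ~/rare-run/etc/ports.json before starting the model.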
run TOFINO model in quiet mode with bf_router as program and log file (if any) should be in ~/rare-run/logs
cd $SDE
./run_tofino_model.sh -p bf_router -f ~/rare-run/etc/ports.json --log-dir ~/rare-run/logs/ -q
Run bf_switchd (logs will be in ~/rare-run/logs)
cd ~/rare-run/logs
$SDE/run_switchd.sh -p bf_router
Launch RARE bf_forwarder.py (BfRuntime GRPC based interface)
cd ~/rare/bfrt_python/
./bf_forwarder.py --ifmibs-dir ~/rare-run/mibs/ --ifindex ~/rare-run/snmp/ifindex
bf_forwarder.py running on: MODEL
GRPC_ADDRESS: 127.0.0.1:50052
P4_NAME: bf_router
CLIENT_ID: 0
Subscribe attempt #1
Subscribe response received 0
Received bf_router on GetForwarding
Binding with p4_name bf_router
Binding with p4_name bf_router successful!!
BfForwarder - loop
  Clearing Table pipe.ig_ctl.ig_ctl_mpls.tbl_mpls_fib
  Clearing Table pipe.ig_ctl.ig_ctl_acl_in.tbl_ipv6_acl
BfIfSnmpClient - main
BfIfSnmpClient - No active ports
  Clearing Table pipe.ig_ctl.ig_ctl_ipv4.tbl_ipv4_fib_host
  Clearing Table pipe.ig_ctl.ig_ctl_copp.tbl_ipv6_copp
  Clearing Table pipe.ig_ctl.ig_ctl_acl_in.tbl_ipv4_acl
  Clearing Table pipe.ig_ctl.ig_ctl_ipv6.tbl_ipv6_fib_host
  Clearing Table pipe.ig_ctl.ig_ctl_mpls.tbl_mpls_fib_decap
  Clearing Table pipe.ig_ctl.ig_ctl_nexthop.tbl_nexthop
  Clearing Table pipe.ig_ctl.ig_ctl_vlan_out.tbl_vlan_out
  Clearing Table pipe.ig_ctl.ig_ctl_vlan_in.tbl_vlan_in
  Clearing Table pipe.ig_ctl.ig_ctl_acl_out.tbl_ipv6_acl
  Clearing Table pipe.ig_ctl.ig_ctl_ipv4.tbl_ipv4_fib_lpm
  Clearing Table pipe.ig_ctl.ig_ctl_acl_out.tbl_ipv4_acl
  Clearing Table pipe.ig_ctl.ig_ctl_vrf.tbl_vrf
  Clearing Table pipe.ig_ctl.ig_ctl_copp.tbl_ipv4_copp
  Clearing Table pipe.ig_ctl.ig_ctl_ipv6.tbl_ipv6_fib_lpm
  Clearing Table pipe.ig_ctl.ig_ctl_bridge.tbl_bridge_target
  Clearing Table pipe.ig_ctl.ig_ctl_bridge.tbl_bridge_learn
Bundle specific clearing: (Order matters)
  Clearing Bundle Table pipe.ig_ctl.ig_ctl_bundle.tbl_nexthop_bundle
  Clearing Bundle Table pipe.ig_ctl.ig_ctl_bundle.ase_bundle
  Clearing Bundle Table pipe.ig_ctl.ig_ctl_bundle.apr_bundle
BfForwarder - Main
BfForwarder - Entering message loop
rx: ['myaddr4_add', '224.0.0.0/4', '0', '1', '\n']
BfIfStatus - main
BfIfStatus - No active ports
rx: ['myaddr4_add', '255.255.255.255/32', '0', '1', '\n']
BfSubIfCounter - main
BfSubIfCounter - No active ports
rx: ['myaddr6_add', 'ff00::/8', '0', '1', '\n']
rx: ['myaddr4_add', '192.168.0.0/24', '-1', '1', '\n']
rx: ['myaddr4_add', '192.168.0.131/32', '-1', '1', '\n']
rx: ['myaddr6_add', '2a01:e0a:159:2850::/64', '-1', '1', '\n']
rx: ['myaddr6_add', '2a01:e0a:159:2850::666/128', '-1', '1', '\n']
rx: ['myaddr6_add', 'fe80::/64', '-1', '1', '\n']
rx: ['mylabel4_add', '186286', '1', '\n']
rx: ['mylabel6_add', '842368', '1', '\n']
rx: ['state', '0', '1', '10', '\n']
rx: ['mtu', '0', '9000', '\n']
rx: ['portvrf_add', '0', '1', '\n']
rx: ['neigh6_add', '20989', 'fe80::224:d4ff:fea0:cd3', '00:24:d4:a0:0c:d3', '1', '00:72:3e:18:1b:6f', '0', '\n']
BfIfSnmpClient - added stats for port 0
rx: ['keepalive', '\n']
rx: ['neigh4_add', '29777', '192.168.0.254', '00:24:d4:a0:0c:d3', '1', '00:72:3e:18:1b:6f', '0', '\n']
rx: ['keepalive', '\n']
rx: ['neigh6_add', '25745', 'fe80::bc6a:83ad:7897:8461', '00:13:46:3c:a9:4f', '1', '00:72:3e:18:1b:6f', '0', '\n']
rx: ['keepalive', '\n']
rx: ['neigh6_add', '41106', 'fe80::e23f:49ff:fe6d:1899', 'e0:3f:49:6d:18:99', '1', '00:72:3e:18:1b:6f', '0', '\n']
rx: ['keepalive', '\n']
rx: ['neigh6_add', '35111', '2a01:e0a:159:2850:e23f:49ff:fe6d:1899', 'e0:3f:49:6d:18:99', '1', '00:72:3e:18:1b:6f', '0', '\n']
rx: ['keepalive', '\n']
rx: ['neigh6_del', '25745', 'fe80::bc6a:83ad:7897:8461', '00:13:46:3c:a9:4f', '1', '00:72:3e:18:1b:6f', '0', '\n']
rx: ['keepalive', '\n']
rx: ['neigh6_add', '20371', 'fe80::bc6a:83ad:7897:8461', '00:13:46:3c:a9:4f', '1', '00:72:3e:18:1b:6f', '0', '\n']
rx: ['keepalive', '\n']
...
rx: ['keepalive', '\n']
rx: ['neigh4_add', '34182', '192.168.0.62', 'e0:3f:49:6d:18:99', '1', '00:72:3e:18:1b:6f', '0', '\n']
...
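
The `rx:` lines above are the plain-text messages that bf_forwarder.py receives from freeRouter: one message per line, the command first, then space-separated arguments. As a minimal sketch of how such a line could be split and labelled, the snippet below parses one of the `neigh4_add` messages from the log. The field names are assumptions inferred from the log, the sdn1 configuration and the ARP table, not names taken from the freeRouter or RARE sources, and `parse_message` / `describe_neighbor` are hypothetical helpers.

```python
from typing import Dict, List, NamedTuple


class Message(NamedTuple):
    command: str      # first field, e.g. "neigh4_add" or "keepalive"
    args: List[str]   # remaining fields, kept as strings


def parse_message(line: str) -> Message:
    """Split one text message into its command and arguments."""
    fields = line.split()   # whitespace split also drops the trailing newline
    if not fields:
        raise ValueError("empty message")
    return Message(fields[0], fields[1:])


# ASSUMPTION: these field names are guesses based on the values seen in the
# log (neighbor MAC matches the ARP table, local MAC matches sdn1 macaddr).
NEIGH_FIELDS = ("nexthop_id", "ip", "neighbor_mac", "vrf_id", "local_mac", "port_id")


def describe_neighbor(msg: Message) -> Dict[str, str]:
    """Label the arguments of a neigh4_/neigh6_ message (names are assumed)."""
    if not msg.command.startswith(("neigh4_", "neigh6_")):
        raise ValueError(f"not a neighbor message: {msg.command}")
    return dict(zip(NEIGH_FIELDS, msg.args))


msg = parse_message(
    "neigh4_add 29777 192.168.0.254 00:24:d4:a0:0c:d3 1 00:72:3e:18:1b:6f 0\n"
)
neighbor = describe_neighbor(msg)
print(msg.command, neighbor["ip"], neighbor["neighbor_mac"])
```

Keeping the arguments as strings mirrors what the log prints; a real forwarder would convert them to the types expected by each table.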

Verification

FreeRouter telnet access from Virtualbox VM guest via port 2323
telnet localhost 2323
Trying ::1...
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
welcome
line ready
tna-freerouter#
freerouter running configuration
tna-freerouter#term len 0                                                       
tna-freerouter#sh run                                                           
hostname tna-freerouter
buggy
!
!
vrf definition v1
 exit
!
interface ethernet0
 description freerouter@P4_CPU_PORT[veth251]
 no shutdown
 no log-link-change
 exit
!
interface sdn1
 description freerouter@sdn1[enp0s9]
 mtu 9000
 macaddr 0072.3e18.1b6f
 vrf forwarding v1
 ipv4 address 192.168.0.131 255.255.255.0
 ipv6 address 2a01:e0a:159:2850::666 ffff:ffff:ffff:ffff::
 ipv6 enable
 no shutdown
 no log-link-change
 exit
!
!
!
!
!
!
!
!
!
!
!
!
!
!
server telnet tel
 security protocol telnet
 no exec authorization
 no login authentication
 vrf v1
 exit
!
server p4lang p4
 export-vrf v1 1
 export-port sdn1 0 10
 interconnect ethernet0
 vrf v1
 exit
!
client tcp-checksum transmit
!
end
Check control plane is communicating with bf_switchd p4 dataplane
tna-freerouter#sh int sum                                                       
interface  state  tx     rx         drop
ethernet0  up     89955  128007451  0
sdn1       up     87291  127572417  0
Ping IPv4 from freerouter -> LAN router gateway
tna-freerouter#ping 192.168.0.254 /vrf v1 /repeat 11111                         
pinging 192.168.0.254, src=null, cnt=11111, len=64, tim=1000, ttl=255, tos=0, sweep=false
..!........!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!*
result=95%, recv/sent/lost=197/207/10, rtt min/avg/max/total=27/54/645/20764

The output above shows packet loss. As soon as bf_switchd port 0 is bridged to enp0s3 on the local area network, it receives a lot of packets. Every packet received via enp0s3 has to be queued and processed by the bf_switchd model, which increases the processing delay.
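
To make the loss figure concrete, here is a small sketch that re-derives the percentage from the recv/sent/lost counters printed in the ping summary line. The regular expression and the `check_summary` helper are hypothetical, and rounding down with integer division is an assumption that happens to reproduce the displayed value.

```python
import re

# Matches the counters in a freeRouter ping summary line as shown in the transcript.
SUMMARY = re.compile(r"result=(\d+)%, recv/sent/lost=(\d+)/(\d+)/(\d+)")


def check_summary(line: str) -> dict:
    """Recompute the success rate and loss count from a ping summary line."""
    match = SUMMARY.search(line)
    if match is None:
        raise ValueError("not a ping summary line")
    reported, recv, sent, lost = map(int, match.groups())
    return {
        "reported_percent": reported,
        "computed_percent": 100 * recv // sent,   # assumed to be rounded down
        "lost_matches": sent - recv == lost,
    }


line = "result=95%, recv/sent/lost=197/207/10, rtt min/avg/max/total=27/54/645/20764"
print(check_summary(line))
```

With 197 replies out of 207 requests, 10 packets were lost, which is roughly 5% of the traffic sent through the model.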

IPv4 arp check
tna-freerouter#sh ipv4 arp sdn1                                                 
mac             address        time      static
e03f.496d.1899  192.168.0.62   00:05:27  false    <----- Host server
9ceb.e8d5.2c51  192.168.0.77   00:05:27  false    <----- VM guest bridged IP
0024.d4a0.0cd3  192.168.0.254  00:01:27  false    <----- LAN gateway
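
The ARP table prints MAC addresses in dotted notation (0024.d4a0.0cd3), while the messages seen by bf_forwarder.py use colon notation (00:24:d4:a0:0c:d3). When correlating the two outputs, a pair of small helpers like the ones sketched below can help; `dotted_to_colon` and `colon_to_dotted` are hypothetical names, not part of freeRouter.

```python
HEX_DIGITS = set("0123456789abcdef")


def _mac_digits(mac: str, separator: str) -> str:
    """Strip the separator and validate that 12 hex digits remain."""
    digits = mac.replace(separator, "").lower()
    if len(digits) != 12 or not set(digits) <= HEX_DIGITS:
        raise ValueError(f"invalid MAC address: {mac}")
    return digits


def dotted_to_colon(mac: str) -> str:
    """0024.d4a0.0cd3 -> 00:24:d4:a0:0c:d3"""
    digits = _mac_digits(mac, ".")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def colon_to_dotted(mac: str) -> str:
    """00:72:3e:18:1b:6f -> 0072.3e18.1b6f"""
    digits = _mac_digits(mac, ":")
    return ".".join(digits[i:i + 4] for i in range(0, 12, 4))


print(dotted_to_colon("0024.d4a0.0cd3"))
print(colon_to_dotted("00:72:3e:18:1b:6f"))
```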

Ping IPv6 from freerouter -> Host server and SSH connection test
tna-freerouter#..1:e0a:159:2850:e23f:49ff:fe6d:1899 /vrf v1 /repeat 111111      
pinging 2a01:e0a:159:2850:e23f:49ff:fe6d:1899, src=null, cnt=111111, len=64, tim=1000, ttl=255, tos=0, sweep=false
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!*
result=100%, recv/sent/lost=89/89/0, rtt min/avg/max/total=30/50/600/4467
IPv6 neighbor discovery check
tna-freerouter#show ipv6 neighbors sdn1                                         
mac             address                                time      static  router
0024.d4a0.0cd3  2a01:e0a:159:2850::1                   00:00:26  false   false    <----- LAN gateway
e03f.496d.1899  2a01:e0a:159:2850:e23f:49ff:fe6d:1899  00:03:26  false   false    <----- Host server
0024.d4a0.0cd3  fe80::224:d4ff:fea0:cd3                00:01:26  false   false
e03f.496d.1899  fe80::e23f:49ff:fe6d:1899              00:02:26  false   false
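
The two link-local entries in this table look like modified EUI-64 addresses (RFC 4291) derived from the neighbor MAC addresses. The sketch below reproduces that derivation; it is only an illustration, since a host is free to pick its interface identifier differently, and `eui64_link_local` is a hypothetical helper.

```python
import ipaddress


def eui64_link_local(mac: str) -> str:
    """Build the fe80::/64 address with a modified EUI-64 interface identifier."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError(f"invalid MAC address: {mac}")
    octets[0] ^= 0x02                                   # flip the universal/local bit
    interface_id = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    packed = b"\xfe\x80" + bytes(6) + interface_id      # fe80::/64 prefix + 64-bit IID
    return str(ipaddress.IPv6Address(packed))


print(eui64_link_local("00:24:d4:a0:0c:d3"))
print(eui64_link_local("e0:3f:49:6d:18:99"))
```

Both results can be compared with the fe80:: lines of the neighbor table above.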
Initiate IPv4 ssh from freerouter -> Host server
tna-freerouter#ssh 192.168.0.62 /vrf v1 /user my-nas                           
 - connecting to 192.168.0.62 22
password: *******
                
 - securing connection

Last login: Fri Jul  3 10:57:02 2020 from 192.168.0.66
FreeBSD 11.3-RELEASE-p9 (FreeNAS.amd64) #0 r325575+588899735f7(HEAD): Mon Jun  1 15:04:31 EDT 2020

        FreeNAS (c) 2009-2020, The FreeNAS Development Team
        All rights reserved.
        FreeNAS is released under the modified BSD license.

        For more information, documentation, help or support, go here:
        http://freenas.org
Welcome to FreeNAS
MY-NAS% 


Initiate IPv6 ssh from freerouter -> Host server
tna-freerouter#..:e0a:159:2850:e23f:49ff:fe6d:1899 /vrf v1 /user my-nas        
 - connecting to 2a01:e0a:159:2850:e23f:49ff:fe6d:1899 22
password: *******
                
 - securing connection

Last login: Mon Jul  6 11:05:31 2020 from 192.168.0.131
FreeBSD 11.3-RELEASE-p9 (FreeNAS.amd64) #0 r325575+588899735f7(HEAD): Mon Jun  1 15:04:31 EDT 2020

        FreeNAS (c) 2009-2020, The FreeNAS Development Team
        All rights reserved.
        FreeNAS is released under the modified BSD license.

        For more information, documentation, help or support, go here:
        http://freenas.org
Welcome to FreeNAS
MY-NAS% 

Conclusion

In this article you:

  • had a demonstration of how to integrate freeRouter into a local area network (Similar to article #002)
  • However, instead of using BMv2, we used an INTEL/BAREFOOT P4 dataplane called TOFINO (bf_switchd)
  • TOFINO bf_switchd target is running RARE bf_router.p4
  • communication between freeRouter control plane and TOFINO is ensured by pcapInt via veth pair [ veth250 - veth251 ]
  • This communication is possible via RARE bf_forwarder.py based on GRPC P4Lang BfRuntime python binding
  • In this example the TOFINO bf_switchd P4 virtual switch model has only 1 dataplane interface that is bound to enp0s3 VM interface exposed to the local network as a bridged interface

[ #004 ] RARE/FreeRouter-101 - key take-away

  • FreeRouter uses UNIX sockets to forward the packets exchanged between the control plane and the dataplane.

This essential paradigm ensures communication between freeRouter and the TOFINO bf_switchd P4 dataplane. It relies on the pcapInt binary from freeRouter net-tools, which binds the freeRouter socket (veth251@localhost:22710) to a virtual network interface (veth250@localhost:22709) connected to CPU_PORT 64.

  • freeRouter control plane and dataplane communication is enabled by RARE bf_forwarder.py 

bf_forwarder.py is a simple python script based on GRPC client BfRuntime python library.

freeRouter performs all the control plane route computation and sends write/modify/remove messages, which bf_forwarder.py applies via BfRuntime so that entries are created/modified/removed accordingly in the P4 tables.
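
The add/remove life cycle can be sketched with an in-memory dictionary standing in for a P4 table. This is not how bf_forwarder.py is written: the `NeighborTable` class is hypothetical, the real script programs tables through BfRuntime, and the `_mod` suffix is an assumption (only `_add` and `_del` appear in the log above). The replayed lines are taken from the bf_forwarder.py output shown earlier.

```python
from typing import Dict, List


class NeighborTable:
    """Toy stand-in for a dataplane table keyed by the first message argument."""

    def __init__(self) -> None:
        self.entries: Dict[str, List[str]] = {}

    def apply(self, line: str) -> None:
        fields = line.split()
        if not fields:
            return
        command, args = fields[0], fields[1:]
        family, _, action = command.partition("_")
        if family not in ("neigh4", "neigh6") or not args:
            return                      # keepalive and other messages are ignored here
        key = args[0]
        if action in ("add", "mod"):    # "_mod" is assumed, not seen in the log
            self.entries[key] = args[1:]
        elif action == "del":
            self.entries.pop(key, None)


table = NeighborTable()
for message in (
    "neigh6_add 25745 fe80::bc6a:83ad:7897:8461 00:13:46:3c:a9:4f 1 00:72:3e:18:1b:6f 0",
    "keepalive",
    "neigh6_del 25745 fe80::bc6a:83ad:7897:8461 00:13:46:3c:a9:4f 1 00:72:3e:18:1b:6f 0",
    "neigh6_add 20371 fe80::bc6a:83ad:7897:8461 00:13:46:3c:a9:4f 1 00:72:3e:18:1b:6f 0",
):
    table.apply(message)

print(sorted(table.entries))
```

After the replay only the most recent entry for that neighbor remains, which is the behavior one would expect from the add/del/add sequence in the log.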

While the TOFINO bf_switchd virtual model target is a very good choice for validating packet processing algorithms on the TOFINO platform, it is not a target for production use. With the model we can validate algorithm accuracy, but the throughput achieved is very low (I could barely get the setup described above to work). We will see in the next articles how to reach the TREMENDOUS traffic throughput required by Internet Service Provider use cases.

In a subsequent article we will demonstrate how to build, with the RARE/freeRouter/TOFINO TNA architecture, a service provider/carrier grade router that is technically able to switch 3.3 Tbps of traffic (line rate) using an EdgeCore WEDGE100BF32X hardware switch.

The most powerful programmable switching ASIC of the TOFINO family can switch 6.5 Tbps of traffic; our WEDGE100BF32X switches are powered by its little brother, which handles 3.3 Tbps of line rate traffic.


"Are you P4 compliant ?". In France in the 1990s, this was a purely French private joke, from before military service was officially abolished. At that time, being "classé P4" meant that you were declared mentally unfit to join the French army, even if you wanted to. Therefore, at the age of 18, some daring people faked mental illness in order to avoid the "Service militaire" (1 year duration). Of course, here P4 is about the data plane programming language from the P4Lang project.

Requirement

  • Basic Linux/Unix knowledge
  • Basic networking knowledge

Overview

For those who are not familiar with data plane programming and especially with P4: "P4 is a domain-specific programming language for specifying the behaviour of the dataplanes of network-forwarding elements." (from p4.org). In short, it helps you write a "program specifying how a switch processes packets".

Article objective

In this article we'll use the freeRouter setup deployed in #002 and replace pcapInt, which provides freeRouter's native software dataplane, with P4Lang's dataplane. The effective dataplane is provided by P4lang's virtual simple_switch_grpc running a RARE P4 program called router.p4.

Diagram

[ #003 ] - Cookbook

In our example we will use the same debian stable image (buster) installed as a VirtualBox VM as in #002.

and we add a bridged network interface to our laptop RJ45 connection.

mkdir -p ~/freeRouter/bin ~/freeRouter/lib ~/freeRouter/etc ~/freeRouter/log
cd ~/freeRouter/lib
wget http://freerouter.nop.hu/rtr.jar
Check the freeRouter directory layout
╭─[11:11:54]floui@debian ~ 
╰─➤ tree freeRouter
freeRouter
├── bin   # binary files      
├── etc   # configuration files      
├── lib   # library files      
└── log   # log files      

get freeRouter net-tools tarball
wget freerouter.nop.hu/rtr.tar
Extract freeRouter net-tools binaries into ~/freeRouter/bin
tar xvf rtr.tar -C ~/freeRouter/bin/

For those who would like to rebuild these binaries, the compilation shell script can be found in the cloned freeRouter git repository in: ~/freeRouter/src/native/c.sh

add p4lang repository in /etc/apt/sources.list.d/p4.list
deb https://download.opensuse.org/repositories/home:/frederic-loui:/p4lang:/p4c:/master/Debian_10/ ./
add debian 10 repository key from download.opensuse.org
wget https://download.opensuse.org/repositories/home:/frederic-loui:/p4lang:/p4c:/master/Debian_10/Release.key
sudo apt-key add ./Release.key
install p4lang packages (just install p4c and it will install p4lang-pi and bmv2)
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install p4c
check p4lang packages installation
╭─[4:59:33]floui@debian ~/freeRouter/etc  
╰─➤  dpkg -l | grep p4lang
ii  bmv2                                   20200615~d447b6a~release~nightly-0+57.1 amd64        p4lang behavioral-model
ii  p4c                                    20200628~7c03f854~release~nightly-0     amd64        p4c p4lang project compiler
ii  p4lang-pi                              20200601~822a0d1~release~nightly-0+39.1 amd64        Implementation framework of a P4Runtime server
Clone RARE code from repository
cd ~/
git clone https://github.com/frederic-loui/RARE.git
compile RARE router.p4
cd ~/RARE/02-PE-labs/p4src
╭─[5:26:06]floui@debian ~/RARE/02-PE-labs/p4src  ‹master› 
╰─➤  make build
mkdir -p ../build ../run/log
p4c --std p4-16 --target bmv2 --arch v1model \
        -I ./ -o ../build --p4runtime-files ../build/router.txt router.p4 

FreeRouter uses 2 configuration files in order to run, let's write these configuration files for R1 in ~/freeRouter/etc

freeRouter hardware configuration file: p4-freerouter-hw.txt
int eth0 eth 0000.1111.00fb 127.0.0.1 22710 127.0.0.1 22709
tcp2vrf 2323 v1 23
tcp2vrf 9080 v1 9080
freeRouter software configuration file: p4-freerouter-sw.txt
hostname p4-freerouter
buggy
!
vrf definition v1
 exit
!
interface ethernet0
 description freerouter@P4_CPU_PORT[veth251]
 no shutdown
 no log-link-change
 exit
!
interface sdn1
 description freerouter@sdn1[enp0s9]
 mtu 9000
 macaddr 0072.3e18.1b6f
 vrf forwarding v1
 ipv4 address 192.168.1.131 255.255.255.0
 ipv6 address fd7d:a59c:650b::666 ffff:ffff:ffff:fff0::
 ipv6 enable
 no shutdown
 no log-link-change
 exit
!
!
!
!
!
!
!
!
!
!
!
!
!
!
server telnet tel
 security protocol telnet
 no exec authorization
 no login authentication
 vrf v1
 exit
!
server p4lang p4
 export-vrf v1 1
 export-port sdn1 1 0
 interconnect ethernet0
 vrf v1
 exit
!
client tcp-checksum transmit
!
end
Setup BMv2 P4 dataplane communication channel via veth pair
sudo ip link add veth251 type veth peer name veth250
sudo ip link set veth250 up  
sudo ip link set veth251 up  
freeRouter launch with supplied p4-freerouter-hw.txt and p4-freerouter-sw.txt with a console prompt
╭─[6:06:13]floui@debian ~/freeRouter  
╰─➤  java -jar lib/rtr.jar routersc etc/p4-freerouter-hw.txt etc/p4-freerouter-sw.txt
info cfg.cfgInit.doInit:cfgInit.java:556 booting
info cfg.cfgInit.doInit:cfgInit.java:680 initializing hardware
info cfg.cfgInit.doInit:cfgInit.java:687 applying defaults
info cfg.cfgInit.doInit:cfgInit.java:695 applying configuration
info cfg.cfgInit.doInit:cfgInit.java:721 done
welcome
line ready
freerouter#                   
launch freeRouter pcapInt in order to stitch control plane and P4 BMv2 dataplane communication
╭─[1:00:53]floui@debian[1]  ~/freeRouter/bin  
╰─➤  sudo ./pcapInt.bin veth251 22709 127.0.0.1 22710 127.0.0.1
binded to local port 127.0.0.1 22709.
will send to 127.0.0.1 22710.
pcap version: libpcap version 1.8.1
opening interface veth251 with pcap1.x api
serving others
> 
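
The hardware file and the pcapInt command line describe the same pair of sockets from opposite sides: what is local for freeRouter is remote for pcapInt and vice versa. A minimal sketch of that consistency check follows. The argument order is inferred from the pcapInt output above ("binded to local port ... will send to ..."), the field order of the `int` line is an assumption (local endpoint first), and the helper names are hypothetical.

```python
from typing import Dict, List, Tuple

Endpoint = Tuple[str, int]


def parse_hw_int(line: str) -> Dict[str, Endpoint]:
    """Assumed layout: int <name> eth <mac> <local ip> <local port> <remote ip> <remote port>"""
    fields = line.split()
    if len(fields) != 8 or fields[0] != "int":
        raise ValueError(f"unexpected hardware line: {line}")
    return {"local": (fields[4], int(fields[5])),
            "remote": (fields[6], int(fields[7]))}


def parse_pcapint_args(args: List[str]) -> Dict[str, Endpoint]:
    """Layout seen in the transcript: <iface> <local port> <local ip> <remote port> <remote ip>"""
    if len(args) != 5:
        raise ValueError("pcapInt expects 5 arguments in this sketch")
    _iface, local_port, local_ip, remote_port, remote_ip = args
    return {"local": (local_ip, int(local_port)),
            "remote": (remote_ip, int(remote_port))}


def is_mirrored(hw: Dict[str, Endpoint], pcap: Dict[str, Endpoint]) -> bool:
    """True when each side sends to the endpoint the other side is bound to."""
    return hw["local"] == pcap["remote"] and hw["remote"] == pcap["local"]


hw = parse_hw_int("int eth0 eth 0000.1111.00fb 127.0.0.1 22710 127.0.0.1 22709")
pcap = parse_pcapint_args("veth251 22709 127.0.0.1 22710 127.0.0.1".split())
print(is_mirrored(hw, pcap))
```

If the two port numbers were not swapped between the two commands, neither process would receive what the other sends.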
Run RARE P4 dataplane - simple_switch_grpc router.p4
export P4_RARE_ROOT=/home/floui/RARE/02-PE-labs
sudo simple_switch_grpc --log-file $P4_RARE_ROOT/run/log/p4-freerouter.log \                                               
                        -i 1@enp0s9 \                        
                        -i 64@veth250 \                                                
                        --thrift-port 9090 --nanolog ipc://$P4_RARE_ROOT/run/bm-0-log.ipc --device-id 0 $P4_RARE_ROOT/build/simple_switch_grpc.json \
                        -- --grpc-server-addr 127.0.0.1:50051 > $P4_RARE_ROOT/run/log/p4-freerouter.out 2>&1 &                  
Launch forwarder.py (p4runtime GRPC based interface)
╭─[2:07:10]floui@debian[1]  ~/RARE/02-PE-labs/p4src  ‹master*› 
╰─➤  ./forwarder.py
rx:  ['myaddr4_add', '224.0.0.0/4', '0', '1', '\n']
rx:  ['myaddr4_add', '255.255.255.255/32', '0', '1', '\n']
rx:  ['myaddr6_add', 'ff00::/8', '0', '1', '\n']
rx:  ['myaddr4_add', '192.168.1.0/24', '-1', '1', '\n']
rx:  ['myaddr4_add', '192.168.1.131/32', '-1', '1', '\n']
rx:  ['myaddr6_add', 'fd7d:a59c:650b::/60', '-1', '1', '\n']
rx:  ['myaddr6_add', 'fd7d:a59c:650b::666/128', '-1', '1', '\n']
rx:  ['myaddr6_add', 'fe80::/64', '-1', '1', '\n']
rx:  ['mylabel6_add', '270549', '1', '\n']
rx:  ['mylabel4_add', '606864', '1', '\n']
rx:  ['state', '1', '1', '0', '\n']
rx:  ['mtu', '1', '9000', '\n']
rx:  ['portvrf_add', '1', '1', '\n']
rx:  ['neigh4_add', '14252', '192.168.1.1', '34:ce:00:67:18:c2', '1', '00:72:3e:18:1b:6f', '1', '\n']
rx:  ['neigh4_add', '52194', '192.168.1.143', '9c:eb:e8:d5:2c:51', '1', '00:72:3e:18:1b:6f', '1', '\n']
rx:  ['keepalive', '\n']
rx:  ['keepalive', '\n']
...
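
The prefixes carried by the `myaddr4_add` and `myaddr6_add` messages correspond to the address and mask configured on sdn1. As a hedged illustration of that relationship, the sketch below converts an address plus netmask, as written in the freeRouter configuration, into prefix notation using Python's standard `ipaddress` module; `to_prefix` is a hypothetical helper and says nothing about how freeRouter computes these values internally.

```python
import ipaddress


def to_prefix(address: str, netmask: str) -> str:
    """Turn 'address netmask' (freeRouter config style) into 'network/length'."""
    addr = ipaddress.ip_address(address)
    mask = ipaddress.ip_address(netmask)
    if addr.version != mask.version:
        raise ValueError("address and netmask families differ")
    # Count the set bits of the mask; assumes a contiguous netmask.
    prefix_len = bin(int(mask)).count("1")
    network = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
    return str(network)


print(to_prefix("192.168.1.131", "255.255.255.0"))
print(to_prefix("fd7d:a59c:650b::666", "ffff:ffff:ffff:fff0::"))
```

The mask ffff:ffff:ffff:fff0:: has 60 bits set, which explains the /60 seen in the IPv6 message.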

Verification

FreeRouter telnet access from Virtualbox VM guest via port 2323
╭─[7:07:41]floui@debian[1]  ~/freeRouter/etc  
╰─➤  telnet localhost 2323
Trying ::1...
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
welcome
line ready
freerouter#
freerouter running configuration
p4-freerouter#sh run                                                           
hostname p4-freerouter
buggy
!
vrf definition v1
 exit
!
interface ethernet0
 description freerouter@P4_CPU_PORT[veth251]
 no shutdown
 no log-link-change
 exit
!
interface sdn1
 description freerouter@sdn1[enp0s9]
 mtu 9000
 macaddr 0072.3e18.1b6f
 vrf forwarding v1
 ipv4 address 192.168.1.131 255.255.255.0
 ipv6 address fd7d:a59c:650b::666 ffff:ffff:ffff:fff0::
 ipv6 enable
 no shutdown
 no log-link-change
 exit
!
!
!
!
!
!
!
!
!
!
!
!
!
!
server telnet tel
 security protocol telnet
 no exec authorization
 no login authentication
 vrf v1
 exit
!
server p4lang p4
 export-vrf v1 1
 export-port sdn1 1 0
 interconnect ethernet0
 vrf v1
 exit
!
client tcp-checksum transmit
!
end
Check control plane is communicating with bmv2 p4 dataplane
p4-freerouter#show interfaces summary                                          
interface  state  tx     rx    drop
ethernet0  up     10616  9243  0
sdn1       up     10340  9069  0
Ping IPv4 from freerouter -> LAN router gateway
p4-freerouter#ping 192.168.1.131 /vrf v1                                       
pinging 192.168.1.131, src=null, cnt=5, len=64, tim=1000, ttl=255, tos=0, sweep=false
!!!!!
result=100%, recv/sent/lost=5/5/0, rtt min/avg/max/total=0/0/0/0
p4-freerouter#ping 192.168.1.1 /vrf v1                                         
pinging 192.168.1.1, src=null, cnt=5, len=64, tim=1000, ttl=255, tos=0, sweep=false
!!!!!
result=100%, recv/sent/lost=5/5/0, rtt min/avg/max/total=3/5/10/28
IPv4 arp check ( 192.168.1.1 is the gateway, 192.168.1.143 is the VM host)
p4-freerouter#sh ipv4 arp sdn1                                                 
mac             address        time      static
34ce.0067.18c2  192.168.1.1    00:00:35  false
9ceb.e8d5.2c51  192.168.1.143  00:00:35  false
6420.0c65.437b  192.168.1.173  00:01:35  false                                                     

Ping IPv6 from freerouter -> LAN router gateway
p4-freerouter#ping fd7d:a59c:650b::666 /vrf v1                                 
pinging fd7d:a59c:650b::666, src=null, cnt=5, len=64, tim=1000, ttl=255, tos=0, sweep=false
!!!!!
result=100%, recv/sent/lost=5/5/0, rtt min/avg/max/total=0/0/0/1
p4-freerouter#ping fd7d:a59c:650b::1 /vrf v1                                   
pinging fd7d:a59c:650b::1, src=null, cnt=5, len=64, tim=1000, ttl=255, tos=0, sweep=false
!!!!!
result=100%, recv/sent/lost=5/5/0, rtt min/avg/max/total=2/2/3/13                                                                                                       
IPv6 neighbor discovery check ( 192.168.1.1 is the gateway, 192.168.1.143 is the VM host)
p4-freerouter#show ipv6 neighbors sdn1                                         
mac             address                    time      static  router
34ce.0067.18c2  fd7d:a59c:650b::1          00:01:22  false   false
9ceb.e8d5.2c51  fe80::10e6:87a7:6a9:f14a   00:01:22  false   false
34ce.0067.18c2  fe80::36ce:ff:fe67:18c2    00:01:22  false   false
b6be.fdcf.d0f9  fe80::b4be:fdff:fecf:d0f9  00:01:22  false   false
Initiate IPv4 ssh from freerouter -> LAN router gateway
p4-freerouter#ssh 192.168.1.1 /vrf v1 /user root                               
 - connecting to 192.168.1.1 22
password: ***************
                
 - securing connection



BusyBox v1.28.4 () built-in shell (ash)

  _______                     ________        __
 |       |.-----.-----.-----.|  |  |  |.----.|  |_
 |   -   ||  _  |  -__|     ||  |  |  ||   _||   _|
 |_______||   __|_____|__|__||________||__|  |____|
          |__| W I R E L E S S   F R E E D O M
 -----------------------------------------------------
 OpenWrt 18.06.2, r7676-cddd7b4c77
 -----------------------------------------------------
root@OpenWrt:~# 


Initiate IPv6 ssh from freerouter -> LAN router gateway
p4-freerouter#ssh fd7d:a59c:650b::1 /vrf v1 /user root                         
 - connecting to fd7d:a59c:650b::1 22
password: ***************
                
 - securing connection



BusyBox v1.28.4 () built-in shell (ash)

  _______                     ________        __
 |       |.-----.-----.-----.|  |  |  |.----.|  |_
 |   -   ||  _  |  -__|     ||  |  |  ||   _||   _|
 |_______||   __|_____|__|__||________||__|  |____|
          |__| W I R E L E S S   F R E E D O M
 -----------------------------------------------------
 OpenWrt 18.06.2, r7676-cddd7b4c77
 -----------------------------------------------------

Conclusion

In this article you:

  • had a demonstration of how to integrate freeRouter into a local area network (Similar to article #002)
  • However instead of using pcapInt you are now using a software P4 dataplane from P4lang project: bmv2 
  • The BMv2 simple_switch_grpc target is used and runs RARE router.p4
  • communication between freeRouter control plane and bmv2 is ensured by pcapInt via veth pair [ veth250 - veth251 ]
  • This communication is possible via RARE forwarder.py based on GRPC P4Lang P4Runtime python binding
  • In this example the BMv2 P4 switch has only 1 dataplane interface that is bound to enp0s9 VM interface exposed to the local network as a bridged interface

[ #003 ] RARE/FreeRouter-101 - key take-away

  • FreeRouter uses UNIX sockets to forward the packets exchanged between the control plane and the dataplane.

This essential paradigm ensures communication between freeRouter and the BMv2 P4 dataplane. It relies on the pcapInt binary from freeRouter net-tools, which binds the freeRouter socket (veth251@localhost:22710) to a virtual network interface (veth250@localhost:22709) connected to CPU_PORT 64.

  • freeRouter control plane and dataplane communication is enabled by RARE forwarder.py 

forwarder.py is a simple python script based on GRPC P4Runtime python library.

freeRouter performs all the control plane route computation and sends write/modify/remove messages, which forwarder.py applies via P4Runtime so that entries are created/modified/removed accordingly in the P4 tables.

While the BMv2 target is a very good choice for packet processing algorithm validation, it is not an ideal target for production use. We will see in the next articles how we can reach the higher throughput required by the use cases defined by network operators.

While in article #001 of the 101 series we learnt how to spawn 2 router instances on the same VM, that use case is only useful for learning purposes. freeRouter can be considered a networking Swiss Army knife in real networks. We will now demonstrate freeRouter's capability to take control of a full VM and communicate directly with the external real world via the VM network interface, i.e. outside the VM scope.

Requirement

  • Basic Linux/Unix knowledge
  • Basic networking knowledge

Overview

Working with freeRouter inside a VM is interesting, but interacting with the outside world is way more exciting !

Article objective

In this article we'll explain how to integrate freeRouter into an existing local area network (my home network) and how to obtain addresses from IPv4 DHCP and IPv6 SLAAC. Though this simple example is consumer/end user oriented, freeRouter can be incorporated into an Internet Service Provider environment. You can easily imagine how to build a highly scalable and versatile BGP route reflector, a sophisticated route server, a ROA/RPKI validator or even a BGP BMP server ... (and the list of features is huge). For example, in one of my projects I have been using freeRouter since 2015 as a BGP route reflector inside a k8s cluster running the calico network plugin.

Diagram

[ #002 ] - Cookbook

In our example we will use a genuine debian stable image (buster) installed as a VirtualBox VM.

and we add a bridged network interface to our laptop RJ45 connection.

mkdir -p ~/freeRouter/bin ~/freeRouter/lib ~/freeRouter/etc ~/freeRouter/log
cd ~/freeRouter/lib
wget http://freerouter.nop.hu/rtr.jar
Check the freeRouter directory layout
╭─[11:11:54]floui@debian ~ 
╰─➤ tree freeRouter
freeRouter
├── bin   # binary files      
├── etc   # configuration files      
├── lib   # library files      
└── log   # log files      

get freeRouter net-tools tarball
wget freerouter.nop.hu/rtr.tar
Extract freeRouter net-tools binaries into ~/freeRouter/bin
tar xvf rtr.tar -C ~/freeRouter/bin/

For those who would like to rebuild these binaries, the compilation shell script can be found in the cloned freeRouter git repository in: ~/freeRouter/src/native/c.sh

FreeRouter uses 2 configuration files in order to run, let's write these configuration files for R1 in ~/freeRouter/etc

freeRouter hardware file: freerouter-hw.txt
int eth1 eth 0000.1111.0001 127.0.0.1 26011 127.0.0.1 26021
tcp2vrf 2323 v1 23
freeRouter software configuration file: freerouter-sw.txt
freerouter#sh run                                                              
hostname freerouter
buggy
!
!
prefix-list p4
 sequence 10 permit 0.0.0.0/0 ge 0 le 0
 exit
!
prefix-list p6
 sequence 10 permit ::/0 ge 0 le 0
 exit
!
vrf definition v1
 exit
!
interface ethernet1
 description freerouter@enp0s9
 vrf forwarding v1
 ipv4 address dynamic 255.255.255.0
 ipv4 gateway-prefix p4
 ipv4 dhcp-client enable
 ipv4 dhcp-client early
 ipv6 address dynamic ffff:ffff:ffff:ffff::
 ipv6 gateway-prefix p6
 ipv6 slaac
 no shutdown
 no log-link-change
 exit
!
!
!
!
!
!
!
!
!
!
!
!
!
!
server telnet tel
 security protocol telnet
 no exec authorization
 no login authentication
 vrf v1
 exit
!
!
end

freerouter# 
freeRouter launch with supplied freerouter-hw.txt and freerouter-sw.txt with a console prompt
╭─[6:06:13]floui@debian[3]  ~/freeRouter  
╰─➤  java -jar lib/rtr.jar routersc etc/freerouter-hw.txt etc/freerouter-sw.txt
info cfg.cfgInit.doInit:cfgInit.java:556 booting
info cfg.cfgInit.doInit:cfgInit.java:680 initializing hardware
info cfg.cfgInit.doInit:cfgInit.java:687 applying defaults
info cfg.cfgInit.doInit:cfgInit.java:695 applying configuration
info cfg.cfgInit.doInit:cfgInit.java:721 done
welcome
line ready
freerouter#                   
Launch pcapInt in order to bind socket localhost:26011 to localhost:26021@enp0s9
╭─[6:06:13]floui@debian[1]  ~/freeRouter/bin  
╰─➤  sudo ./pcapInt.bin enp0s9 26021 127.0.0.1 26011 127.0.0.1
binded to local port 127.0.0.1 26021.
will send to 127.0.0.1 26011.
pcap version: libpcap version 1.8.1
opening interface enp0s9 with pcap1.x api
serving others
> 
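
To illustrate the socket pairing that pcapInt sets up, here is a minimal, self-contained sketch in which a raw frame is carried as a datagram between two loopback sockets. It assumes the sockets are UDP datagram sockets, as the IP address and port pairs suggest, and it uses ephemeral ports rather than 26011/26021 so that it cannot collide with a running freeRouter. The real pcapInt additionally reads and writes frames on the network interface through libpcap, which is not reproduced here, and the frame bytes are made up for the example.

```python
import socket


def make_endpoint(port: int = 0) -> socket.socket:
    """Create a UDP socket bound to loopback (port 0 lets the OS pick one)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("127.0.0.1", port))
    sock.settimeout(2.0)
    return sock


router_side = make_endpoint()    # plays the role of freeRouter's interface socket
pcap_side = make_endpoint()      # plays the role of pcapInt's local socket
router_addr = router_side.getsockname()
pcap_addr = pcap_side.getsockname()

# A made-up Ethernet frame: broadcast destination, source MAC, ARP EtherType.
frame = bytes.fromhex("ffffffffffff" "000011110001" "0806") + b"payload"

# Interface -> router direction: the frame is sent as one datagram.
pcap_side.sendto(frame, router_addr)
received, sender = router_side.recvfrom(2048)

print(len(received), sender == pcap_addr)

router_side.close()
pcap_side.close()
```

One datagram carries exactly one frame, so no extra framing is needed on top of the socket.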

Verification

FreeRouter telnet access from Virtualbox VM guest via port 2323
╭─[7:07:41]floui@debian[1]  ~/freeRouter/etc  
╰─➤  telnet localhost 2323
Trying ::1...
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
welcome
line ready
freerouter#
freerouter running configuration
freerouter#sh run                                                              
hostname freerouter
buggy
!
!
prefix-list p4
 sequence 10 permit 0.0.0.0/0 ge 0 le 0
 exit
!
prefix-list p6
 sequence 10 permit ::/0 ge 0 le 0
 exit
!
vrf definition v1
 exit
!
interface ethernet1
 description freerouter@enp0s9
 vrf forwarding v1
 ipv4 address dynamic 255.255.255.0
 ipv4 gateway-prefix p4
 ipv4 dhcp-client enable
 ipv4 dhcp-client early
 ipv6 address dynamic ffff:ffff:ffff:ffff::
 ipv6 gateway-prefix p6
 ipv6 slaac
 no shutdown
 no log-link-change
 exit
!
!
!
!
!
!
!
!
!
!
!
!
!
!
server telnet tel
 security protocol telnet
 no exec authorization
 no login authentication
 vrf v1
 exit
!
!
end

freerouter#         
Ping IPv4 from freerouter -> LAN router gateway
freerouter#ping 192.168.1.1 /vrf v1                                            
pinging 192.168.1.1, src=null, cnt=5, len=64, tim=1000, ttl=255, tos=0, sweep=false
!!!!!
result=100%, recv/sent/lost=5/5/0, rtt min/avg/max/total=1/1/1/5
freerouter#                                                                                                               
IPv4 arp check ( 192.168.1.1 is the gateway, 192.168.1.143 is the VM host)
freerouter#sh ipv4 arp eth1                                                    
mac             address        time      static
34ce.0067.18c2  192.168.1.1    00:00:43  false
9ceb.e8d5.2c51  192.168.1.143  00:00:43  false

freerouter#                                                                    

Ping IPv6 from freerouter -> LAN router gateway
freerouter#ping fd7d:a59c:650b::1 /vrf v1                                      
pinging fd7d:a59c:650b::1, src=null, cnt=5, len=64, tim=1000, ttl=255, tos=0, sweep=false
!!!!!
result=100%, recv/sent/lost=5/5/0, rtt min/avg/max/total=0/0/2/4
freerouter#                                                                                                           
IPv6 neighbor discovery check ( 192.168.1.1 is the gateway, 192.168.1.143 is the VM host)
freerouter#show ipv6 neighbors eth1                                            
mac             address                              time      static  router
34ce.0067.18c2  fd7d:a59c:650b::1                    00:01:44  false   false
9ceb.e8d5.2c51  fd7d:a59c:650b::8926:98c9:bbde:8ed7  00:01:44  false   false

freerouter#
Initiate IPv4 ssh from freerouter -> LAN router gateway
freerouter#ssh 192.168.1.1 /vrf v1 /user root                                  
 - connecting to 192.168.1.1 22
password: ***************
                
 - securing connection



BusyBox v1.28.4 () built-in shell (ash)

  _______                     ________        __
 |       |.-----.-----.-----.|  |  |  |.----.|  |_
 |   -   ||  _  |  -__|     ||  |  |  ||   _||   _|
 |_______||   __|_____|__|__||________||__|  |____|
          |__| W I R E L E S S   F R E E D O M
 -----------------------------------------------------
 OpenWrt 18.06.2, r7676-cddd7b4c77
 -----------------------------------------------------
root@OpenWrt:~#


Initiate IPv6 ssh from freerouter -> LAN router gateway
freerouter#ssh fd7d:a59c:650b::1 /vrf v1 /user root                            
 - connecting to fd7d:a59c:650b::1 22
password: ***************
                
 - securing connection



BusyBox v1.28.4 () built-in shell (ash)

  _______                     ________        __
 |       |.-----.-----.-----.|  |  |  |.----.|  |_
 |   -   ||  _  |  -__|     ||  |  |  ||   _||   _|
 |_______||   __|_____|__|__||________||__|  |____|
          |__| W I R E L E S S   F R E E D O M
 -----------------------------------------------------
 OpenWrt 18.06.2, r7676-cddd7b4c77
 -----------------------------------------------------
root@OpenWrt:~

Conclusion

In this article you:

  • had a demonstration of how to integrate freeRouter into a local area network
  • learned how to configure an interface in order to act as an IPv4 DHCP client
  • learned how to configure an interface using IPv6 SLAAC

[ #002 ] RARE/FreeRouter-101 - key take-away

  • FreeRouter uses UNIX sockets in order to forward packets.

You can use the pcapInt binary from freeRouter net-tools, which binds the freeRouter socket (localhost:26011) to a physical network interface (localhost:26021@enp0s9).

freeRouter supports a huge list of features with IPv4/IPv6 parity. In this example we demonstrated how an interface can inherit IPv4/IPv6 addresses from an IPv4 DHCP server or IPv6 SLAAC.

  • freeRouter can interact with the real network (in various flavors. We will develop this in further articles)

It can be used as a BGP route reflector in Internet Service Provider environment, as ROA/RPKI validator, BMP server, BGP looking glass, route server etc.

The main objective of the [RARE / FreeRouter 101] series is to help you get started with FreeRouter from scratch, without any prior knowledge.

Requirement

  • Basic Linux/Unix knowledge
  • Basic networking knowledge

Overview

freeRouter is a free, open source router control plane software. For the nostalgic and for networkers from the prehistoric era (like me): besides Ethernet, freeRouter is able to handle HDLC, X.25, Frame Relay and ATM encapsulation. Since it handles packets itself at the socket layer, it is independent of the underlying operating system's capabilities. We will see in the next articles how freeRouter subtly leverages this inherent independence to connect to different data-planes such as OpenFlow, P4 and any other data-plane that may appear in the near future.

The command line tries to mimic the industry standards with one exception:

  • no global routing table: every routed interface must be in a virtual routing table
  • positive side effect: there are no vrf-awareness questions
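
To make the "no global routing table" idea concrete, here is a minimal conceptual sketch in Python. It is not freeRouter code: the class name `VrfTables`, the second VRF `v2` and the interface name `ethernet2` are assumptions made up for illustration. It only shows that every lookup needs a VRF name, and that the same prefix can live in two VRFs without conflict.

```python
import ipaddress


class VrfTables:
    """Toy model: one routing table per VRF, and no global table at all."""

    def __init__(self):
        self._tables = {}  # vrf name -> list of (network, next hop)

    def add_route(self, vrf, prefix, next_hop):
        network = ipaddress.ip_network(prefix)
        self._tables.setdefault(vrf, []).append((network, next_hop))

    def lookup(self, vrf, address):
        # The VRF name is mandatory: there is no global table to fall back to.
        if vrf not in self._tables:
            raise KeyError(f"unknown vrf: {vrf}")
        addr = ipaddress.ip_address(address)
        matches = [
            (net, hop)
            for net, hop in self._tables[vrf]
            if net.version == addr.version and addr in net
        ]
        if not matches:
            return None
        # Longest-prefix match among the routes of this VRF only.
        return max(matches, key=lambda m: m[0].prefixlen)[1]


tables = VrfTables()
tables.add_route("v1", "1.1.1.0/30", "ethernet1")
tables.add_route("v2", "1.1.1.0/30", "ethernet2")  # hypothetical second VRF
print(tables.lookup("v1", "1.1.1.2"))  # ethernet1
print(tables.lookup("v2", "1.1.1.2"))  # ethernet2
```

This is also why the commands later in this article always carry a `/vrf v1` argument.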

Article objective

This article simply deploys 2 instances of freeRouter on the same freshly installed Linux box. We deliberately use the freeRouter (freerouter.nop.hu) "raw" official repository in order to get familiar with the manual deployment process. Even though the deployment process is straightforward, it is not self-explanatory for people unfamiliar with Java/Linux.

In order to simplify the deployment we have automated freeRouter daily builds on:

But let's get our "hands dirty" and follow the simple manual installation.

Diagram

[ #001 ] - Cookbook

In our example we will use a genuine debian stable image (buster) installed as a VirtualBox VM.

  • Start & connect your VM as root 
  • Update your VM
apt-get update
apt-get upgrade

In this example, we won't recompile freeRouter, so installing a headless Java runtime is enough. This setup is recommended for production environments in order to ensure a minimal software footprint.

apt-get install default-jre-headless --no-install-recommends

Let's create the following structure, even if some folders are empty for now:

mkdir -p ~/freeRouter/bin ~/freeRouter/lib ~/freeRouter/etc ~/freeRouter/log
cd ~/freeRouter/lib
wget http://freerouter.nop.hu/rtr.jar

so that you have the following environment:

╭─[11:11:54]floui@debian ~ 
╰─➤ tree freeRouter
freeRouter
├── bin   # binary files      
├── etc   # configuration files      
├── lib   # library files      
└── log   # log files      

FreeRouter uses 2 configuration files in order to run. Let's write these configuration files for R1 in ~/freeRouter/etc:

freeRouter hardware file: r1-hw.txt
int eth1 eth 0000.1111.0001 127.0.0.1 26011 127.0.0.1 26021
tcp2vrf 1123 v1 23
freeRouter software configuration file: r1-sw.txt
hostname r1
!
vrf definition v1
 exit
!
int eth1
 desc r1@e1 -> r2@e1
 vrf forwarding v1
 ipv4 address 1.1.1.1 255.255.255.252
 ipv6 address 1234::1 ffff:ffff:ffff:ffff::
 exit
!
server telnet tel
 security protocol telnet
 no exec authorization
 no login authentication
 vrf v1
 exit
!
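
The addressing above can be double-checked with a few lines of Python. This is a small sketch using only the standard `ipaddress` module; the helper name `v6_prefix_len` is made up for this example. It shows that the IPv4 mask 255.255.255.252 is a /30 with exactly two usable hosts (one for each router), and that the IPv6 mask ffff:ffff:ffff:ffff:: is a /64.

```python
import ipaddress

# IPv4: the address and mask exactly as written in r1-sw.txt.
v4 = ipaddress.ip_interface("1.1.1.1/255.255.255.252")
print(v4.network)                            # 1.1.1.0/30
print([str(h) for h in v4.network.hosts()])  # ['1.1.1.1', '1.1.1.2']


def v6_prefix_len(mask):
    """Convert an IPv6 netmask written as an address into a prefix length."""
    value = int(ipaddress.IPv6Address(mask))
    ones = bin(value).count("1")
    # A valid netmask is `ones` set bits followed only by zero bits.
    expected = ((1 << ones) - 1) << (128 - ones)
    if value != expected:
        raise ValueError(f"not a contiguous netmask: {mask}")
    return ones


plen = v6_prefix_len("ffff:ffff:ffff:ffff::")
print(plen)                                             # 64
print(ipaddress.ip_interface(f"1234::1/{plen}").network)  # 1234::/64
```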

Repeat the same configuration for R2 in ~/freeRouter/etc

freeRouter hardware file: r2-hw.txt
int eth1 eth 0000.2222.0001 127.0.0.1 26021 127.0.0.1 26011
tcp2vrf 2223 v1 23
freeRouter software configuration file: r2-sw.txt
hostname r2
!
vrf definition v1
 exit
!
int eth1
 desc r2@e1 -> r1@e1
 vrf forwarding v1
 ipv4 address 1.1.1.2 255.255.255.252
 ipv6 address 1234::2 ffff:ffff:ffff:ffff::
 exit
!
server telnet tel
 security protocol telnet
 no exec authorization
 no login authentication
 vrf v1
 exit
!
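
The two hardware files mirror each other: what is local for R1 is remote for R2. The Python sketch below checks this. It assumes that an `int` line is laid out as name, type, MAC address, local address and port, then remote address and port; this reading is inferred from the two files above and is not an official grammar. The names `HwInterface`, `parse_hw_interface` and `is_back_to_back` are made up for this example.

```python
from dataclasses import dataclass


@dataclass
class HwInterface:
    name: str
    mac: str
    local: tuple   # (ip, port) this router listens on
    remote: tuple  # (ip, port) this router sends to


def parse_hw_interface(line):
    # Assumed layout:
    # int <name> <type> <mac> <local ip> <local port> <remote ip> <remote port>
    fields = line.split()
    if len(fields) != 8 or fields[0] != "int":
        raise ValueError(f"unexpected hardware line: {line!r}")
    _, name, _kind, mac, lip, lport, rip, rport = fields
    return HwInterface(name, mac, (lip, int(lport)), (rip, int(rport)))


def is_back_to_back(a, b):
    """True when each side sends to the endpoint the other side listens on."""
    return a.local == b.remote and a.remote == b.local


r1 = parse_hw_interface("int eth1 eth 0000.1111.0001 127.0.0.1 26011 127.0.0.1 26021")
r2 = parse_hw_interface("int eth1 eth 0000.2222.0001 127.0.0.1 26021 127.0.0.1 26011")
print(is_back_to_back(r1, r2))  # True
```

If the ports do not mirror each other, a check like this catches the mistake before you start the routers.
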
freeRouter launch with blank parameters
╭─[12:58:45]floui@debian ~/freeRouter  
╰─➤  java -jar ./lib/rtr.jar 
java -jar ./lib/rtr.jar <parameters>
parameters:
  router <cfg>            - start router background
  routerc <cfg>           - start router with console
  routerw <cfg>           - start router with window
  routercw <cfg>          - start router with console and window
  routers <hwcfg> <swcfg> - start router from separate configs
  routera <swcfg>         - start router with sw config
  test <cmd>              - execute test command
  show <cmd>              - execute show command
  exec <cmd>              - execute exec command
R1 launch with supplied r1-hw.txt and r1-sw.txt with a console prompt
╭─[12:59:11]floui@debian ~/freeRouter  
╰─➤  java -jar lib/rtr.jar routersc etc/r1-hw.txt etc/r1-sw.txt 
info cfg.cfgInit.doInit:cfgInit.java:556 booting
info cfg.cfgInit.doInit:cfgInit.java:680 initializing hardware
info cfg.cfgInit.doInit:cfgInit.java:687 applying defaults
info cfg.cfgInit.doInit:cfgInit.java:695 applying configuration
info cfg.cfgInit.doInit:cfgInit.java:721 done
welcome
line ready
r1#                   
R2 launch with supplied r2-hw.txt and r2-sw.txt with a console prompt
╭─[12:58:52]floui@debian ~/freeRouter  
╰─➤  java -jar lib/rtr.jar routersc etc/r2-hw.txt etc/r2-sw.txt
info cfg.cfgInit.doInit:cfgInit.java:556 booting
info cfg.cfgInit.doInit:cfgInit.java:680 initializing hardware
info cfg.cfgInit.doInit:cfgInit.java:687 applying defaults
info cfg.cfgInit.doInit:cfgInit.java:695 applying configuration
info cfg.cfgInit.doInit:cfgInit.java:721 done
welcome
line ready
r2#                   

Verification

R1 telnet access from VirtualBox VM guest via port 1123
╭─[1:09:28]floui@debian ~  
╰─➤  telnet localhost 1123
Trying ::1...
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
welcome
line ready
r1#                   
R2 telnet access from VirtualBox VM guest via port 2223
╭─[1:15:37]floui@debian ~  
╰─➤  telnet localhost 2223
Trying ::1...
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
welcome
line ready
r2#                  
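
If you prefer to script this check instead of opening telnet sessions by hand, the sketch below tests whether the two TCP ports from the hardware files (1123 and 2223) accept connections. It uses only the Python standard library; the function name `tcp_port_open` is made up for this example. It only proves that something is listening, not that it is a freeRouter prompt.

```python
import socket


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeout, unreachable host, etc.
        return False


for name, port in (("r1", 1123), ("r2", 2223)):
    state = "reachable" if tcp_port_open("localhost", port) else "not reachable"
    print(f"{name} telnet port {port}: {state}")
```
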
R1 running configuration
r1#sh run                                                                      
hostname r1
buggy
!
!
vrf definition v1
 exit
!
interface ethernet1
 description r1@e1 -> r2@e1
 vrf forwarding v1
 ipv4 address 1.1.1.1 255.255.255.252
 ipv6 address 1234::1 ffff:ffff:ffff:ffff::
 no shutdown
 no log-link-change
 exit
!
!
!
!
!
!
!
!
!
!                  
!
!
!
!
server telnet tel
 security protocol telnet
 no exec authorization
 no login authentication
 vrf v1
 exit
!
!
end

r1#                 
R2 running configuration
r2#sh run                                                                      
hostname r2
buggy
!
!
vrf definition v1
 exit
!
interface ethernet1
 description r2@e1 -> r1@e1
 vrf forwarding v1
 ipv4 address 1.1.1.2 255.255.255.252
 ipv6 address 1234::2 ffff:ffff:ffff:ffff::
 no shutdown
 no log-link-change
 exit
!
!
!
!
!
!
!
!
!
!                  
!
!
!
!
server telnet tel
 security protocol telnet
 no exec authorization
 no login authentication
 vrf v1
 exit
!
!
end

r2#                  
Ping from R1 -> R2
r1#ping 1.1.1.2 /vrf v1                                                        
pinging 1.1.1.2, src=null, cnt=5, len=64, tim=1000, ttl=255, tos=0, sweep=false
!!!!!
result=100%, recv/sent/lost=5/5/0, rtt min/avg/max/total=1/2/3/13
r1#
r1#ping 1234::2 /vrf v1                                                        
pinging 1234::2, src=null, cnt=5, len=64, tim=1000, ttl=255, tos=0, sweep=false
!!!!!
result=100%, recv/sent/lost=5/5/0, rtt min/avg/max/total=1/4/11/23
r1#                                                                                                      
Ping from R2 -> R1
r2#ping 1.1.1.1 /vrf v1                                                        
pinging 1.1.1.1, src=null, cnt=5, len=64, tim=1000, ttl=255, tos=0, sweep=false
!!!!!
result=100%, recv/sent/lost=5/5/0, rtt min/avg/max/total=0/1/2/12
r2#    
r2#ping 1234::1 /vrf v1                                                        
pinging 1234::1, src=null, cnt=5, len=64, tim=1000, ttl=255, tos=0, sweep=false
!!!!!
result=100%, recv/sent/lost=5/5/0, rtt min/avg/max/total=0/1/3/7
r2#                                                                     
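
When you run such tests on many nodes, reading the result lines by eye quickly gets tedious. Here is a minimal Python sketch that parses the `result=...` line shown above into numbers. The regular expression is written against the output format seen in this article only, so treat it as an assumption if your freeRouter version prints something different. The function name `parse_ping_result` is made up for this example.

```python
import re

RESULT_RE = re.compile(
    r"result=(?P<pct>\d+)%, "
    r"recv/sent/lost=(?P<recv>\d+)/(?P<sent>\d+)/(?P<lost>\d+), "
    r"rtt min/avg/max/total=(?P<min>\d+)/(?P<avg>\d+)/(?P<max>\d+)/(?P<total>\d+)"
)


def parse_ping_result(line):
    """Turn a freeRouter ping summary line into a dict of integers."""
    match = RESULT_RE.search(line)
    if match is None:
        raise ValueError(f"not a ping result line: {line!r}")
    return {key: int(value) for key, value in match.groupdict().items()}


stats = parse_ping_result(
    "result=100%, recv/sent/lost=5/5/0, rtt min/avg/max/total=1/2/3/13"
)
print(stats["pct"], stats["lost"])  # 100 0
```
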
Initiate IPv4 telnet from R1 -> R2 (inside freeRouter scope)
r1#telnet 1.1.1.2 23 /vrf v1                                                   
 - connecting to 1.1.1.2 23

welcome
line ready
r2#                                                                         
Initiate IPv6 telnet from R2 -> R1 (inside freeRouter scope)
r2#telnet 1234::1 /vrf v1                                                      
 - connecting to 1234::1 23

welcome
line ready
r1#                                                                      

Conclusion

In this article you:

  • had a brief introduction to freeRouter, a networking Swiss army knife
  • learned how to deploy 2 instances of freeRouter and interconnect them via 2 UNIX sockets on a VM guest running on VirtualBox
  • saw a setup that is ideal for network simulations encompassing hundreds of nodes, self-contained in the same VM without interaction with the external world (protocol experimentation, convergence tests, etc.)
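
For larger simulations you will not want to write every hardware file by hand. The Python sketch below generates the `int` lines for a simple chain r1 - r2 - ... - rN, following the same line layout as r1-hw.txt and r2-hw.txt. Everything about the numbering is an assumption chosen for this example: the function name `chain_hw_configs`, the base port 30000 and the MAC address scheme are hypothetical, and the `tcp2vrf` lines are left out.

```python
def chain_hw_configs(count, base_port=30000):
    """Return {router name: [hardware config lines]} for r1 - r2 - ... - rN."""
    configs = {f"r{i}": [] for i in range(1, count + 1)}
    next_if = {name: 1 for name in configs}  # next free ethN per router
    for link in range(1, count):
        left, right = f"r{link}", f"r{link + 1}"
        # Each link gets its own pair of ports; the two ends mirror each other.
        left_port = base_port + 2 * link
        right_port = left_port + 1
        ends = (
            (left, link, left_port, right_port),
            (right, link + 1, right_port, left_port),
        )
        for name, index, local, remote in ends:
            ifnum = next_if[name]
            next_if[name] += 1
            mac = f"0000.{index:04x}.{ifnum:04x}"  # hypothetical MAC scheme
            configs[name].append(
                f"int eth{ifnum} eth {mac} 127.0.0.1 {local} 127.0.0.1 {remote}"
            )
    return configs


for router, lines in chain_hw_configs(3).items():
    print(router)
    for line in lines:
        print(" ", line)
```

Each generated line would still need a matching software configuration (VRF, addresses) before the routers can talk to each other.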

[ #001 ] RARE/FreeRouter-101 - key take-away

  • FreeRouter uses UNIX sockets to forward packets.

This is a key feature that will be leveraged to connect the freeRouter control plane to any type of data-plane.

  • In FreeRouter everything is in a VRF (so there is no global VRF)

This design choice has very positive consequences: there are no VRF awareness questions, and you can run multiple BGP processes in the same freeRouter instance (each bound to a different VRF).

The whole feature set supports both IPv4 and IPv6, so there is no compromise!



Hi Csaba, Thanks for being with us today
