

This article is part of a special blog series called "RARE software architecture". As its name implies, it deals with topics related to RARE/freeRouter software design choices.

Requirements

  • Basic Linux/Unix knowledge
  • Service provider networking knowledge

Overview

The objective of the RARE project is to provide a routing platform that offers solutions for multiple use cases in the R&E landscape. In the picture below, the different use cases are shown in purple:

As you can notice, each use case runs on different hardware that can potentially have a different dataplane. As we were starting from a clean slate, without much choice, especially regarding P4 programmability, the first dataplane (or P4 target) we considered was BMv2. BMv2 is an excellent way to learn P4, and it is still the first target we use to program and validate new features. After 6 months of practising our "P4-fu" we had:

  • a P4lang repository for Ubuntu bionic and focal
  • a Debian 10 repository
  • our first RARE/FreeRouter prototype powered by a P4 BMv2 dataplane!

Our initial plan, considering FreeRouter's Java nature, was to write a Java P4Runtime gRPC client able to program the entries in the tables exposed by BMv2 via the P4Info file. However, this would have intimately tied FreeRouter code to P4Runtime gRPC code. Even if this solution is the more natural choice, going in that direction assumed that dataplanes other than BMv2 would also be P4Runtime compliant. It turns out that this is not the case. We therefore opted for a simple message API over a bi-directional raw UNIX socket. We will see what this means later in this blog.

Motivated by the successful experience with BMv2, we then decided to move forward and started to study TOFINO as a target. We were greedy and eager to apply our P4 code to multi-terabit traffic. After a few P4 program compilations, my personal first impression was... mind blowing! INTEL/BAREFOOT TOFINO effectively opened the door to multi-terabit packet processing... Just having the possibility to process traffic at these levels at our fingertips was exciting!

As a side note, the journey was not without suffering and pain... We had to port our BMv2 code, and porting to TOFINO was not "une lettre à la poste" (it did not go as smoothly as posting a letter)... It is not that TOFINO programming is gratuitously painful. It is just that it is p4c-tofino's job to make sure that our packets are processed at silicon lightning speed. Imagine you are asked to convey parcels by driving from Paris to Amsterdam in a car that has an infinitely sized trunk, an infinite gas tank and no particular speed constraint along the road. Then you are asked to make the same trip, but with a real car that has a fixed-size trunk and a 50-litre gas tank, and of course you have to obey the speed signs along the road.

In the first case, you would load as many parcels as you like, you would not even bother looking at your gas tank level, and maybe you would set the speed to 200 km/h. The second case forces you to think carefully about how many parcels fit in your trunk, check whether one full tank is sufficient for the trip and, of course, obey the speed signs.

If you allow me this comparison, this is where BMv2 and TOFINO programming differ.

But this pain was not in vain; it was for the greater good... You can't imagine the joy of seeing the TOFINO compiler display the word DONE! For the veterans who remember, it is the same feeling as managing to compile your first program in the Ada language. The compiler is so strict that compiling an Ada program is in itself a feat. No wonder this language is used in space rockets (Ariane).

Back to our dataplane interface story: even though TOFINO and BMv2 share some roots, their northbound interfaces differ. BMv2 has P4Runtime, while INTEL/BAREFOOT pushed its own gRPC counterpart for P4_16 on the TOFINO platform: BfRuntime.

Our bet paid off: the FreeRouter message API was unchanged, and without much effort we could add a new dataplane "wingman" to the FreeRouter control plane.

To recap:

  • For BMv2: our interface issues P4Runtime RPC calls. This program is called forwarder.py
  • For TOFINO: our interface issues BfRuntime RPC calls. This program is called, without too much originality, bf_forwarder.py

At that point we were starting to have a decent LSR/LER router for CORE and Aggregation use cases.

But we still had no solution to propose at the EDGE/AGGREGATION layer: deploying P4 hardware might be far too expensive in specific contexts such as small R&E institutions like primary schools or small R&E labs. To that end, we started to study new targets such as VMWARE XDP and T4P4S ELTE. We could not use XDP without a lot of P4 code rewriting and compromise. T4P4S ELTE was, from our perspective, very promising, but a compilation issue prevented us from moving forward.

FPGA was also a solution that we considered, but we had no access to any P4-compliant FPGA hardware.

As a result, we were a little bit bitter and started to look into the DPDK library and play with the DPDK examples... These examples were tremendously useful, as they sparked some DPDK development within the RARE team. Step by step, Csaba, the FreeRouter lead developer, came up with this GENIUS idea: why don't we just emulate the RARE P4 dataplane program? We can still revert to T4P4S ELTE when it is ready.

P4emu/P4dpdk was then born!

To conclude this short story, RARE/FreeRouter now has three completely different dataplanes (in order of appearance):

  • BMv2
  • TOFINO
  • DPDK


Unique RARE/FreeRouter feature

Please note, however, that the FreeRouter message API is common to the three dataplanes listed above. You'll see below how this structure makes the solution open, modular and interchangeable.

Article objective

In this article, we present the RARE/FreeRouter platform structure and focus on the interface(s) between the FreeRouter control plane and the various dataplanes.

Diagram

[ #001 ] - Modular design

 FreeRouter control plane

In this design, FreeRouter focuses on running control plane processes, such as routing protocols (IGPs, BGP). There are other control plane processes, but let's focus on these. At some point in time, all IGPs/EGPs converge and have to create entries in a routing table. For IPv4, the entry is created in an IPv4 forwarding table; similarly, an IPv6 route entry is created in an IPv6 forwarding table. From FreeRouter's point of view, these entry creations are triggered by calling one Java function twice, which generates two API messages: one for IPv4 and one for IPv6.

 Common message API

Let's add an IPv4 route via the freeRouter CLI:

route addition via freeRouter
conf t
ipv4 route v1 1.2.3.0 255.255.255.0 4.4.4.4
...

Upon entering the ipv4 route and pressing <enter>, you'll see the following message appear:

message API: route4_add
...
rx: ['route4_add', '1.2.3.0/24', '13063', '4.4.4.4', '1', '\n']
...

Let's delete the route via FreeRouter CLI

route deletion via freeRouter
conf t
no ipv4 route v1 1.2.3.0 255.255.255.0 4.4.4.4
...

message API: route4_del
...
rx: ['route4_del', '1.2.3.0/24', '13063', '4.4.4.4', '1', '\n']
...
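
The rx lines above look like a plain text line split on spaces, with the command name first. Here is a minimal Python sketch of that parsing. The function name parse_message is hypothetical, the raw input line is reconstructed from the rx output (an assumption), and the fields after the command are kept as opaque strings because their meaning is not documented here.

```python
def parse_message(line):
    """Split one message API line into (command, arguments).

    Hypothetical helper: the argument fields are left uninterpreted.
    """
    fields = line.split(" ")
    # Drop the trailing newline token visible in the rx output above.
    args = [field for field in fields[1:] if field.strip()]
    return fields[0], args


# Raw line reconstructed from the rx output (assumption).
command, args = parse_message("route4_add 1.2.3.0/24 13063 4.4.4.4 1 \n")
print(command, args)
```

The same function handles route4_del, since both messages share the same layout in the output shown above.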

Important note

In short, the message API is simply a collection of messages that trigger an entry ADD/DELETE/MODIFY in the corresponding dataplane table.

This message API will be documented and published soon, but those who are curious and can't wait for the documentation can read the forwarder.py, bf_forwarder.py or p4dpdk.bin source code.

 Candidate dataplane platform

As said at the beginning of the article, the freeRouter control plane has to deal with dataplanes of different natures, and for now freeRouter has three of them. Each of these dataplanes has its own northbound interface: P4Runtime for BMv2, BfRuntime for TOFINO, and P4DPDK for DPDK-compatible systems with a DPDK-compliant NIC.

For BMv2, we just had to write an interface that translates freeRouter API messages into P4Runtime gRPC calls. This interface is called forwarder.py.

For TOFINO, we just had to write an interface that translates freeRouter API messages into BfRuntime gRPC calls. This interface is called bf_forwarder.py.

For DPDK, we just had to write an interface that translates freeRouter API messages into DPDK primitives. This interface is included in the DPDK dataplane bundled with the freeRouter binaries: p4dpdk.bin
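
The three translators described above share the same shape: read a message, look up the command, and call the matching dataplane operation. The following Python sketch illustrates that shape only; it is not the actual forwarder.py code. RecordingBackend, add_route4, del_route4 and dispatch are hypothetical names, and the backend merely records calls where a real one would issue P4Runtime, BfRuntime or DPDK operations.

```python
class RecordingBackend:
    """Hypothetical stand-in for a dataplane driver; it only records calls."""

    def __init__(self):
        self.calls = []

    def add_route4(self, prefix, extra):
        self.calls.append(("add", prefix, extra))

    def del_route4(self, prefix, extra):
        self.calls.append(("del", prefix, extra))


def dispatch(backend, line):
    """Translate one message API line into a backend call.

    Returns True if the message was handled, False otherwise.
    """
    fields = line.split()
    if len(fields) < 2:
        return False
    command, args = fields[0], fields[1:]
    handlers = {
        "route4_add": backend.add_route4,
        "route4_del": backend.del_route4,
    }
    handler = handlers.get(command)
    if handler is None:
        return False
    # First argument is the prefix; the rest is passed through untouched.
    handler(args[0], args[1:])
    return True


backend = RecordingBackend()
handled = dispatch(backend, "route4_add 1.2.3.0/24 13063 4.4.4.4 1 \n")
print(handled, backend.calls)
```

Swapping the dataplane then amounts to passing a different backend object to dispatch, while the message parsing stays unchanged.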

It is just as simple as that!

Discussion

 Dataplane addition made easy

This design is pretty unique: if, for any reason, you would like to "hook" the freeRouter control plane to another dataplane, such as:

  • an FPGA
  • a dataplane powered by a kernel bypass technique such as RDMA
  • another NPU-based dataplane
  • etc.

this is possible!

You would "just" have to port your P4 code logic to the target dataplane and create an interface able to translate API messages from FreeRouter into messages the target dataplane understands.

Be cautious with the word "just"

The word "just" can be misleading. Indeed, depending on the target dataplane, it can be a huge task. With DPDK, we were lucky to get enough material to move forward, and again, p4dpdk.bin was a simple trial at the very beginning. But some other dataplanes may simply have to be ignored if we don't get enough material/support from NPU vendors.

 You can use your own control plane too !

This is something we have not experienced ourselves, but it may one day become a reality.

What if you have your own control plane and absolutely want to keep it, but would like to re-use the BMv2/TOFINO or DPDK RARE dataplane?

Well, this is possible!

A long time ago I met Thomas MANGIN (yet another cool and nice French guy), who is the author of ExaBGP. I did not talk to him about this and I don't want to give him bad ideas, but what if he wanted to hook a TOFINO P4 dataplane to ExaBGP?

Well, he would actually just have to teach ExaBGP to produce entry ADD/DELETE/MODIFY messages according to the message API above.
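
To make this a bit more concrete, here is a small Python sketch of how a third-party control plane might build a route message shaped like the rx output shown earlier. It is an illustration under assumptions, not an ExaBGP or FreeRouter API: format_route4 is a hypothetical helper, the trailing space before the newline is inferred from the rx output, and the fields after the prefix are passed through as-is because their meaning is not documented here.

```python
import ipaddress


def format_route4(action, network, netmask, remaining_fields):
    """Build a route4_add / route4_del line (hypothetical helper).

    remaining_fields are appended untouched after the prefix.
    """
    if action not in ("add", "del"):
        raise ValueError(f"unsupported action: {action}")
    # Convert network + dotted netmask into CIDR notation, e.g. 1.2.3.0/24.
    prefix = ipaddress.IPv4Network(f"{network}/{netmask}")
    fields = [f"route4_{action}", str(prefix), *remaining_fields]
    # Trailing space before the newline is inferred from the rx output.
    return " ".join(fields) + " \n"


message = format_route4("add", "1.2.3.0", "255.255.255.0", ["13063", "4.4.4.4", "1"])
print(repr(message))
```

The control plane would then write such lines to the same channel the dataplane interface reads from.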

I also love the work done by the SONiC project, and I know that SONiC already has a P4 dataplane called switch.p4. I doubt it will ever be the case, but what if the SONiC project wanted to re-use the RARE dataplane, especially for its Service Provider capabilities?

OK, this sounds crazy, but the modular design we propose here is valid and can make the RARE dataplane available to other control planes.

Of course, we strongly suggest that you stick with FreeRouter, as you will realize that, IMHO, it has no match in the TELCO Service Provider space. You have the venerable IOS-XR and JUNOS, but these are not Open Source.


Conclusion

In this first article you saw:

  • a 10,000-foot view of the RARE/FreeRouter modular design
  • that this design allows rapid dataplane addition without altering the FreeRouter code base at all
  • that re-using the BMv2/TOFINO/P4DPDK dataplanes with another control plane has never been implemented, but is possible!

Message API documentation

For the time being, this message API is not yet publicly documented. However, it can be found inside the forwarder.py or bf_forwarder.py source code. This is work in progress, but if you feel an urgent need to use it, feel free to read the code.

PS: We will publish this document ASAP, but time plays against us ...



