Overlay Networks in the Data Center
In data centers, overlay networks provide traditional "Layer-2" (Ethernet) semantics on an underlying fabric that uses "Layer-3" (IP) forwarding. Layer-2 traffic is encapsulated in IP using a tunneling mechanism such as Generic Routing Encapsulation (GRE), Virtual eXtensible LAN (VXLAN), Stateless Transport Tunneling (STT), Generic Network Virtualization Encapsulation (Geneve) or something similar.
Benefits of Overlay Networks
Using Layer-3 forwarding in the fabric (substrate), overlay networks avoid certain scalability issues associated with Layer-2 networks. IP routing protocols such as OSPF have proven to scale to thousands of forwarding devices. This scale is sufficient to support networks that span even the largest feasible data centers.
Another dimension of scalability is the number of isolated (virtual) Layer-2 networks. Traditional Layer-2 approaches have used VLANs (Virtual LANs) for this. However, the 12-bit VLAN ID in the standard IEEE 802.1Q tag supports only 4096 logical networks (4094 usable, since IDs 0 and 4095 are reserved). That is insufficient for applications with many tenants, such as public IaaS.
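The gap in scale can be shown with simple arithmetic. The following is a minimal sketch that compares the 12-bit 802.1Q VLAN ID with the 24-bit VXLAN Network Identifier (VNI) defined in RFC 7348. Geneve also uses a 24-bit VNI. The constant and function names are illustrative only.

```python
# Illustrative comparison of virtual-network identifier spaces.
VLAN_ID_BITS = 12  # IEEE 802.1Q VLAN ID field
VNI_BITS = 24      # VXLAN Network Identifier (RFC 7348); Geneve's VNI is also 24 bits


def id_space(bits: int) -> int:
    """Number of distinct values an identifier field of `bits` bits can hold."""
    return 2 ** bits


def usable_vlan_ids() -> int:
    """802.1Q reserves VLAN IDs 0 and 4095, so two values are unusable."""
    return id_space(VLAN_ID_BITS) - 2


print(id_space(VLAN_ID_BITS), usable_vlan_ids(), id_space(VNI_BITS))
```

This prints 4096, 4094 and 16777216. A 24-bit identifier therefore allows roughly 4000 times as many isolated networks as a VLAN tag.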
Because the fabric/substrate only needs simple IP routing with a moderately sized routing table, it can be built from simple components, from both a hardware and a software point of view. For example, simple top-of-rack (ToR) switches can be combined to form large data center networks in a leaf-spine architecture.
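A rough sizing calculation shows how identical simple switches scale in a two-tier leaf-spine fabric. The following is a minimal sketch under stated assumptions. All switches have the same port count. Every leaf has exactly one uplink to every spine. All links run at the same speed. The `LeafSpine` class is a hypothetical model and does not come from any real tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LeafSpine:
    """Hypothetical two-tier leaf-spine fabric built from identical switches.

    Assumes each leaf has one uplink to every spine and equal link speeds.
    """
    ports_per_switch: int
    uplinks_per_leaf: int

    def __post_init__(self) -> None:
        if not 0 < self.uplinks_per_leaf < self.ports_per_switch:
            raise ValueError("need at least one uplink and one host port per leaf")

    @property
    def spines(self) -> int:
        # One uplink per spine, so the spine count equals the leaf's uplink count.
        return self.uplinks_per_leaf

    @property
    def max_leaves(self) -> int:
        # Each spine port attaches exactly one leaf.
        return self.ports_per_switch

    @property
    def host_ports(self) -> int:
        # Remaining (non-uplink) ports on every leaf face servers.
        return self.max_leaves * (self.ports_per_switch - self.uplinks_per_leaf)

    @property
    def oversubscription(self) -> float:
        # Host-facing versus fabric-facing bandwidth on each leaf.
        return (self.ports_per_switch - self.uplinks_per_leaf) / self.uplinks_per_leaf


fabric = LeafSpine(ports_per_switch=64, uplinks_per_leaf=32)
print(fabric.spines, fabric.max_leaves, fabric.host_ports, fabric.oversubscription)
```

In this example, 64-port switches with half their ports used as uplinks give 32 spines, 64 leaves and 2048 non-oversubscribed server ports. Using fewer uplinks per leaf trades bandwidth toward the fabric for more server ports.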
If encapsulation is performed on servers at the edge, the mapping of virtual L2 networks onto the physical fabric can be defined arbitrarily. This can be used to support many different use cases, such as user-provisioned virtual networks, the composition of virtualized network functions in NFV, centralized SDN controllers, etc.
Limitations and Drawbacks
Edge devices have to perform encapsulation and decapsulation of L2 payloads into and from L3 fabric-traversing packets. This incurs various types of overhead:
Processing Overhead for Encapsulation and Decapsulation/Demultiplexing
Outgoing L2 frames have to be decorated with encapsulation headers for transport. Some encapsulation headers contain a checksum field that has to be computed over the entire packet. In some of these cases, the existing "inner" checksum can be used as a basis for the outer checksum, which avoids much of the overhead.
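Reuse of the inner checksum is possible because the Internet checksum (RFC 1071) is a ones'-complement sum. The sum over a whole packet can therefore be assembled from partial sums of its pieces. The following is a minimal sketch of that general property only. It does not show how any particular kernel or network adapter computes outer checksums. The example byte strings are made up for illustration.

```python
def ones_complement_sum(data: bytes) -> int:
    """16-bit ones'-complement sum of data (RFC 1071), before the final inversion."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit words
    while total >> 16:  # end-around carry: fold overflow back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def combine(sum_a: int, sum_b: int) -> int:
    """Combine partial sums of two blocks; the first block must have even length."""
    total = sum_a + sum_b
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def internet_checksum(data: bytes) -> int:
    """Value placed in a checksum field: the inverted ones'-complement sum."""
    return ~ones_complement_sum(data) & 0xFFFF


# Made-up outer header bytes and an inner packet whose sum is assumed already known
# (in practice it can be recovered from an already-computed inner checksum).
outer = bytes.fromhex("4500007300004000401100000a000001")
inner = bytes.fromhex("0a000002deadbeef")
inner_sum = ones_complement_sum(inner)
whole = combine(ones_complement_sum(outer), inner_sum)
assert whole == ones_complement_sum(outer + inner)
```

Only the newly added outer bytes have to be summed. The contribution of the inner packet is a single 16-bit value that is added in.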
Incoming packets need to be mapped to the correct virtual L2 domain by looking at encapsulation headers. The headers need to be stripped from the packet to yield the L2 frame, which then must be passed to the receiver.
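VXLAN can serve as a concrete case. RFC 7348 defines an 8-byte header that contains a flags byte (the "I" bit, 0x08, marks the VNI as valid), three reserved bytes, a 24-bit VNI and one more reserved byte. The following is a minimal sketch that covers only that header. It does not handle the outer Ethernet, IP and UDP layers. The `Demultiplexer` class is hypothetical. It stands in for whatever hands frames to the right virtual L2 domain.

```python
import struct

VXLAN_HEADER_LEN = 8
VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag from RFC 7348


def vxlan_encap(vni: int, l2_frame: bytes) -> bytes:
    """Prepend an 8-byte VXLAN header carrying `vni` to an inner Ethernet frame."""
    if not 0 <= vni < 1 << 24:
        raise ValueError("VNI must fit in 24 bits")
    # Layout: flags (1 byte), reserved (3), VNI (3), reserved (1)
    header = struct.pack("!B3xI", VXLAN_FLAG_VNI_VALID, vni << 8)
    return header + l2_frame


def vxlan_decap(payload: bytes) -> tuple[int, bytes]:
    """Strip the VXLAN header from a UDP payload; return (vni, inner_frame)."""
    if len(payload) < VXLAN_HEADER_LEN:
        raise ValueError("truncated VXLAN header")
    flags, word = struct.unpack("!B3xI", payload[:VXLAN_HEADER_LEN])
    if not flags & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI not marked valid")
    return word >> 8, payload[VXLAN_HEADER_LEN:]


class Demultiplexer:
    """Hypothetical: queue decapsulated frames per virtual L2 network (VNI)."""

    def __init__(self) -> None:
        self.networks: dict[int, list[bytes]] = {}

    def attach(self, vni: int) -> None:
        self.networks.setdefault(vni, [])

    def receive(self, payload: bytes) -> bool:
        vni, frame = vxlan_decap(payload)
        queue = self.networks.get(vni)
        if queue is None:
            return False  # no local virtual network with this VNI: drop
        queue.append(frame)
        return True
```

A received payload is mapped to its virtual network by the VNI alone. The remaining bytes are the original L2 frame, which is passed on unchanged.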
Various chipsets, both for switches and for network adapters, include hardware support for some encapsulations. Support for GRE seems to be widespread, while support for VXLAN is only just emerging as of 2014. In any case, the necessary software support to make use of these features usually lags some time behind the availability of the respective hardware accelerations.
In practice, encapsulation and decapsulation are often performed in the kernel on (hypervisor) hosts. In the GNU/Linux ecosystem, Open vSwitch has emerged as a popular replacement for the kernel's built-in L2 forwarding (bridging) code. As of 2014, various performance enhancements related to tunnel processing are being integrated into the Linux kernel and Open vSwitch.
Weight of Encapsulation Headers
The headers used for encapsulation make the transported packets larger than they would be in a network without overlays. On the one hand, this means that more bits have to be transported, which reduces the overall available capacity. An argument can be made, however, that the simplification of the fabric more than compensates for this by allowing bigger and faster networks to be built for the same cost. On the other hand, it can lead to the MTU issues described below.
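The capacity cost depends on packet size, because the header overhead is a fixed number of bytes per packet. The following is a minimal sketch under these assumptions: an outer IPv4 header without options, a basic 4-byte GRE header without key, checksum or sequence fields, and a Geneve header without options. STT is left out. The table and function names are illustrative.

```python
# Bytes added to each inner L2 frame to form the outer IP packet (assumptions:
# outer IPv4 without options; no optional GRE fields; no Geneve options).
OVERHEAD = {
    "vxlan": 20 + 8 + 8,   # outer IPv4 + UDP + VXLAN header
    "geneve": 20 + 8 + 8,  # outer IPv4 + UDP + Geneve base header
    "gre": 20 + 4,         # outer IPv4 + basic GRE header
}


def payload_efficiency(inner_frame_len: int, encap: str) -> float:
    """Fraction of each outer IP packet that carries the inner L2 frame."""
    return inner_frame_len / (inner_frame_len + OVERHEAD[encap])


for size in (100, 1464):
    print(size, round(payload_efficiency(size, "vxlan"), 3))
```

With VXLAN, a 1464-byte inner frame uses about 98% of its 1500-byte outer packet. A 100-byte frame uses only about 74%, so workloads with many small packets pay relatively more.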
Because of encapsulation headers, the maximum size of L2 frames that can be transported by the overlay is strictly lower than the maximum size of L3 packets that the fabric can transport. If the fabric is built from standard Ethernet links with default settings, it will be capable of transporting 1500-byte IP packets. Depending on the encapsulation, this means that roughly 1450-1480 bytes are available for the L2 frames to be transported. There are three ways around this:
- Make the inner MTU smaller, so that encapsulated packets fit the standard MTU. An inner MTU of 1300, 1400 or 1450 bytes is still reasonably efficient.
- Make the outer MTU bigger, so that standard maximum-size L2 frames can be forwarded even after encapsulation.
- Fragment encapsulated packets if they become too large for the fabric to transport. This is unpopular because of the overhead of fragmentation and reassembly. (See RFC 4459 on MTU and Fragmentation Issues with In-the-Network Tunneling.)
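The arithmetic behind the three options can be sketched as follows. The assumptions are the same as before: an outer IPv4 header without options, no optional GRE fields, and a 14-byte inner Ethernet header with no 802.1Q tag (the FCS is not carried). An inner VLAN tag would add 4 more bytes. The function names are illustrative.

```python
# Per-packet encapsulation overhead (outer IPv4 + tunnel headers), as assumed above.
OVERHEAD = {"vxlan": 20 + 8 + 8, "geneve": 20 + 8 + 8, "gre": 20 + 4}
INNER_ETH_HEADER = 14  # untagged inner Ethernet header, FCS not carried


def max_inner_ip_mtu(fabric_mtu: int, encap: str) -> int:
    """Option 1: largest inner IP MTU that fits the fabric unfragmented."""
    return fabric_mtu - OVERHEAD[encap] - INNER_ETH_HEADER


def min_fabric_mtu(inner_ip_mtu: int, encap: str) -> int:
    """Option 2: smallest fabric IP MTU that carries full-size inner packets."""
    return inner_ip_mtu + INNER_ETH_HEADER + OVERHEAD[encap]


def needs_fragmentation(inner_frame_len: int, fabric_mtu: int, encap: str) -> bool:
    """Option 3: whether an encapsulated frame exceeds the fabric MTU."""
    return inner_frame_len + OVERHEAD[encap] > fabric_mtu


print(max_inner_ip_mtu(1500, "vxlan"), max_inner_ip_mtu(1500, "gre"))
print(min_fabric_mtu(1500, "vxlan"))
print(needs_fragmentation(1514, 1500, "vxlan"))
```

Under these assumptions, VXLAN on a standard 1500-byte fabric leaves an inner IP MTU of 1450 bytes. Carrying unmodified 1500-byte inner packets instead requires a fabric MTU of at least 1550 bytes.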
Emulating L2 networks in overlays is not completely trivial. For example, broadcast (and possibly multicast) has to be supported, usually through something like a "broadcast server". Address learning can be avoided in applications where the set of MAC addresses connected to virtual L2 ports is known in advance.
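When MAC locations are known in advance, for example pushed by a controller, an edge device can forward from a static table without learning. The following is a minimal sketch under that assumption. It does not use a broadcast server. Instead, it uses a different technique, head-end (ingress) replication: broadcast and multicast frames are copied to every other tunnel endpoint in the same virtual network. Unknown unicast is dropped, because every valid destination is pre-provisioned. The class, method names and addresses are hypothetical.

```python
from collections import defaultdict

BROADCAST = "ff:ff:ff:ff:ff:ff"


def is_group_address(mac: str) -> bool:
    """Broadcast and multicast MACs have the least significant bit of octet 1 set."""
    return int(mac.split(":")[0], 16) & 1 == 1


class OverlayForwarder:
    """Hypothetical edge forwarder with a pre-provisioned (vni, mac) -> endpoint table."""

    def __init__(self, local_endpoint: str) -> None:
        self.local_endpoint = local_endpoint
        self.mac_table: dict[tuple[int, str], str] = {}
        self.members: dict[int, set[str]] = defaultdict(set)

    def provision(self, vni: int, mac: str, endpoint: str) -> None:
        """Record where a MAC lives; no data-plane address learning is needed."""
        self.mac_table[(vni, mac.lower())] = endpoint
        self.members[vni].add(endpoint)

    def destinations(self, vni: int, dst_mac: str) -> list[str]:
        """Tunnel endpoints to send a frame to (the local endpoint means local delivery)."""
        if is_group_address(dst_mac):
            # Head-end replication: one copy per remote endpoint in this VNI.
            return sorted(self.members.get(vni, set()) - {self.local_endpoint})
        endpoint = self.mac_table.get((vni, dst_mac.lower()))
        return [endpoint] if endpoint is not None else []  # unknown unicast: drop
```

The cost of this design is that the sender transmits one copy per remote endpoint. A broadcast server or fabric multicast avoids that at the price of extra infrastructure.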
– Main.SimonLeinen - 2014-12-30 - 2014-12-31