In SWITCHengines we use a single type of hypervisor software (QEMU/KVM), but we have multiple generations of nova-compute hosts.  Initially we built our two production clusters with Ivy Bridge (Intel Xeon E5-2600v2) CPUs.  Then, when we ran out of room in one of the clusters (ZH), we converted older servers from an earlier trial to compute nodes; those used Sandy Bridge (E5-2600) CPUs.  Later we added new machines to the ZH cluster based on Broadwell-EP (E5-2600v4) CPUs.  (To make things more complicated, these servers were initially delivered with Haswell-EP, i.e. E5-2600v3, CPUs, but we're finally replacing them with the newer processors that had been promised to us.)

When we initially built the infrastructure, it was homogeneous: all CPUs were Ivy Bridge.  We configured Nova/libvirt so that the native features of the hypervisor host were exposed to instances.  This had the advantage that VMs were able to use all of the functionality (e.g. new vector or crypto instructions) that the CPUs provided.
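Exposing the host's native CPU to guests is controlled by the cpu_mode setting in nova.conf; a minimal sketch of that original setup, assuming host-passthrough (host-model, which exposes the closest model from libvirt's CPU map instead, is the other common choice):

    [libvirt]
    virt_type = kvm
    # expose the hypervisor's own CPU and all of its feature flags to instances
    cpu_mode = host-passthrough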

Because we store VMs' block devices (virtual disks) in a shared Ceph cluster via OpenStack's RBD (RADOS Block Device) backend, we have always been able to migrate instances from one hypervisor to another without stopping them ("live migration").  This is extremely useful for doing maintenance on the hypervisors, e.g. when security patches need to be applied to the base OS or to the virtual machine manager (QEMU/KVM).
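For reference, a Ceph/RBD-backed Nova of that era is configured in the [libvirt] section of nova.conf; a minimal sketch, with the pool and cephx user names chosen purely for illustration:

    [libvirt]
    images_type = rbd                         # store instance disks in Ceph, not on local disk
    images_rbd_pool = vms                     # example pool name
    images_rbd_ceph_conf = /etc/ceph/ceph.conf
    rbd_user = cinder                         # example cephx user
    rbd_secret_uuid = <libvirt secret UUID>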

When we added the machines with older CPUs, we found that we could migrate instances within the new-CPU sub-cluster, within the old-CPU sub-cluster, and from an old-CPU hypervisor to a new-CPU one, but not from a new-CPU hypervisor back to an old-CPU one.  This means that once an instance had been migrated to a newer CPU, it could never move back to an older one.  Also, when using nova live-migration <UUID> without specifying a destination host, the scheduler would often choose a destination host that turned out to be incompatible because it used an older CPU.  This made maintenance quite hard.
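The workaround was to name a compatible destination explicitly; the nova CLI accepts the target host as an optional second argument (the host name below is made up):

    # let the scheduler pick a target -- it may pick an incompatible (older) CPU
    nova live-migration <UUID>

    # pin the migration to a hypervisor we know is compatible
    nova live-migration <UUID> zh-compute-042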

Eventually we decided to solve this issue by dumbing down all instances to the lowest common feature set, i.e. Sandy Bridge in our ZH region.  Now new instances will only be able to use the subset of features supported by Sandy Bridge CPUs, but we can freely migrate them between all servers.

See the KVM-specific part of the OpenStack Nova configuration guide (Kilo version).  Our configuration uses cpu_mode=custom in nova.conf.  When cpu_mode is set to custom, a cpu_model must also be defined.  On our ZH cluster, we set it to SandyBridge.  The model must be known to libvirt; the available models are defined in /usr/share/libvirt/cpu_map.xml.  On the somewhat ancient libvirt version we use, this file knows about SandyBridge and Haswell, but not about IvyBridge or Broadwell.  In our other production cluster, LS, we could use IvyBridge, but we would have to define that model ourselves, and we find it too cumbersome and risky to override this system file, so we just use SandyBridge there as well.  Anyway, there aren't that many differences between SandyBridge and IvyBridge.
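Putting it together, the relevant part of nova.conf on the ZH compute nodes looks essentially like this (a minimal sketch, other [libvirt] options omitted):

    [libvirt]
    cpu_mode = custom
    cpu_model = SandyBridge

On a recent enough libvirt, the models that the local cpu_map.xml actually provides can also be listed directly:

    virsh cpu-models x86_64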

Note: This configuration may be accepted in Juno, but live migration between hypervisors with different CPUs only works on Kilo and up, because of bug #1082414, which was fixed for Kilo.

Instances that were created before we made that change to nova.conf still use all native features; therefore we must be careful when migrating those.  We had hoped that they would die out over time, but people tend to leave their VMs up for a very long time, so we'll probably just live with them.  Maybe one day we'll find a way to "cheat" Nova/libvirt into making them more easily migratable.  This can probably be done by hacking the (XML) definitions of their corresponding libvirt domains somehow.  The trick is to do this without having to reboot the VMs... does anyone have an idea how this could be done?  If so, please comment!
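For the curious: the piece that would have to change is the <cpu> element in each old instance's libvirt domain XML.  New instances already get a definition along these lines, and the old ones would need to end up with the same (shown here as a sketch; changing it with virsh edit only takes effect after the guest is shut down and started again, which is exactly the reboot we want to avoid):

    # inspect the CPU definition of a running domain
    virsh dumpxml <domain name or UUID> | grep -A 3 '<cpu'

    <!-- target definition for a freely migratable guest -->
    <cpu mode='custom' match='exact'>
      <model fallback='allow'>SandyBridge</model>
    </cpu>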
