...
When we initially built the infrastructure, it was still homogeneous: all CPUs were Sandy Bridge. We configured Nova/libvirt so that the native features of the hypervisor host were exposed to instances. This had the advantage that VMs were able to use all of the functionality (e.g. new vector or crypto instructions) that the CPUs provided.
Because our VMs' block devices (virtual disks) are all stored in a shared Ceph cluster via OpenStack's RBD (RADOS Block Device) mapping, we have always been able to migrate instances from one hypervisor to another without stopping them ("live migration"). This is extremely useful for doing maintenance on the hypervisors, e.g. when security patches need to be applied to the base OS or to the virtual machine manager (QEMU/KVM).
...
Eventually we decided to solve this issue by dumbing down all instances to the lowest common feature set, i.e. Sandy Bridge in our ZH region. Now new instances will only be able to use the subset of features supported by Sandy Bridge CPUs, but we can freely migrate them between all servers.
See the KVM-specific part of the OpenStack Nova configuration guide (Kilo version). Our configuration uses cpu_mode=custom in nova.conf. When cpu_mode is set to custom, cpu_model must also be defined. On our ZH cluster, we set it to SandyBridge. The model must be known to libvirt; these models are defined in /usr/share/libvirt/cpu_map.xml. On the somewhat ancient libvirt version we use, this file knows about SandyBridge and Haswell, but not about IvyBridge or Broadwell. In our other production cluster, LS, we could use IvyBridge, but then we'd have to define that model ourselves. We find it too cumbersome and risky to override this system file, so we just use SandyBridge there as well. In any case, there aren't that many differences between SandyBridge and IvyBridge.
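As a rough illustration of the constraint described above (custom mode needs a model, and the model must be one libvirt knows), here is a minimal Python sketch. It is not part of Nova or libvirt. The `[libvirt]` section name, the sample nova.conf excerpt, and the simplified CPU map structure are assumptions made for the example, and `known_models` and `check_cpu_config` are hypothetical helper names.

```python
import configparser
import xml.etree.ElementTree as ET

# Hypothetical nova.conf excerpt. Placing the options in a [libvirt]
# section is an assumption; check the configuration guide for your release.
NOVA_CONF = """
[libvirt]
cpu_mode = custom
cpu_model = SandyBridge
"""

# Hand-written stand-in for /usr/share/libvirt/cpu_map.xml.
# The real file is much larger; this simplified structure is assumed.
CPU_MAP = """
<cpus>
  <arch name='x86'>
    <model name='SandyBridge'/>
    <model name='Haswell'/>
  </arch>
</cpus>
"""


def known_models(cpu_map_xml):
    """Return the set of model names defined directly under <arch>."""
    root = ET.fromstring(cpu_map_xml)
    return {m.get("name") for m in root.findall("./arch/model")}


def check_cpu_config(conf_text, cpu_map_xml):
    """Return a list of problems found in the CPU settings (empty if none)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    mode = parser.get("libvirt", "cpu_mode", fallback=None)
    model = parser.get("libvirt", "cpu_model", fallback=None)

    if mode != "custom":
        # cpu_model only matters in custom mode.
        return []
    if not model:
        return ["cpu_mode=custom requires cpu_model"]
    if model not in known_models(cpu_map_xml):
        return [f"cpu_model {model!r} is not defined in the CPU map"]
    return []
```

To check a real host, one would read the text of nova.conf and of the CPU map file from disk and pass both strings to `check_cpu_config`. With the sample CPU map above, a `cpu_model` of IvyBridge would be reported as undefined, which mirrors the situation on our libvirt version.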
Note: This configuration may be accepted in Juno, but live migration between different hypervisor types only works on Kilo and up, because of bug #1082414, which was fixed for Kilo.
Instances that were created before we made that change to nova.conf still use all native features; therefore we must be careful when migrating those. We had hoped that they would die out over time, but people tend to leave their VMs up for a very long time, so we'll probably just live with them. Maybe one day we'll find a way to "cheat" Nova/libvirt into making them more easily migratable. This can probably be done by hacking the (XML) definitions of their corresponding libvirt domains somehow. The trick is to do this without having to reboot the VMs... has anyone got an idea on how this could be done? If so, please comment!
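This does not solve the problem, but finding the affected instances is a first step. The sketch below is a hedged example that reads a libvirt domain definition (for instance the output of `virsh dumpxml`) and reports which CPU mode and model it requests. The sample XML is hand-written for illustration, the helper names `cpu_definition` and `needs_care` are hypothetical, and the default baseline of SandyBridge simply reflects our ZH setting.

```python
import xml.etree.ElementTree as ET

# Hand-written sample domain definitions; real ones contain much more.
PINNED_DOMAIN = """
<domain type='kvm'>
  <name>instance-a</name>
  <cpu mode='custom' match='exact'>
    <model fallback='allow'>SandyBridge</model>
  </cpu>
</domain>
"""

NATIVE_DOMAIN = """
<domain type='kvm'>
  <name>instance-b</name>
  <cpu mode='host-model'/>
</domain>
"""


def cpu_definition(domain_xml):
    """Return (mode, model) from the <cpu> element of a domain definition."""
    cpu = ET.fromstring(domain_xml).find("cpu")
    if cpu is None:
        return (None, None)
    model = cpu.find("model")
    name = model.text.strip() if model is not None and model.text else None
    return (cpu.get("mode"), name)


def needs_care(domain_xml, baseline="SandyBridge"):
    """True if the domain is not pinned to the baseline CPU model."""
    mode, model = cpu_definition(domain_xml)
    return not (mode == "custom" and model == baseline)
```

Running `needs_care` over the definitions of all domains on a hypervisor would give a list of instances to treat carefully before a live migration. It only inspects the definitions and changes nothing.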