I recently had to work with a relatively large OpenStack deployment running the older Folsom release. I needed to move over thirty instances to enhance the performance and stability of the system. In this deployment, local disks store the instance disk files on each compute server instead of on shared network storage. While Folsom (via KVM) provides a live-migration feature, the older versions of the underlying technologies like qemu and libvirt are full of bugs, and the live-migration process regularly failed for me. I decided that using it was not reliable enough to move these thirty-odd instances – the whole cluster needed moving within a certain time window, so there was no room for glitches!
Searching the net for help, I came up dry. It seems there aren’t many people who have had to do this, or if they did they weren’t writing about it. It seemed that my own wits would have to serve me! After much experimentation, I devised a manual process to move the instances. It’s based in part on the OpenStack documentation describing recovery from hardware failures, plus extensive testing to work out the other fiddly bits required to make it work.
Here is the process to manually move an instance in OpenStack. You may need to modify the details, depending on where things are stored and how your OpenStack deployment is configured.
- Stop services on the instance.
- Power off the instance from the command line inside the instance: sudo poweroff
- Wait a little bit to allow the instance to fully power down, flush disks, etc.
- Tell OpenStack to stop the instance:
nova stop vm_name
This gets the VM state updated in OpenStack.
- Identify the instance’s internal name:
nova show vm_name
You are looking for something like instance-000004f2 where the hex number is the instance ID in the nova database.
- Verify the instance is actually stopped:
virsh list --all
- Check the size of the instance files in /var/lib/nova/instances/instance-000004f2
- Verify the destination host has sufficient disk space to receive the instance files, for example with df -h on the destination.
- Copy the instance disk files using rsync:
rsync -Pa /var/lib/nova/instances/instance-000004f2 destination_hostname:/var/lib/nova/instances/
- Update the libvirt.xml file in the instance directory to reflect the new IP address for the DHCP server. This assumes you are using FlatDHCPManager with Nova Network, and may be different in your setup. Check the IP address on the destination compute host and update the file. Compare to the old host if you aren’t sure which interface to use.
<parameter name="DHCPSERVER" value="10.1.2.100"/>
- Go into the nova database using mysql and edit the instance’s host to reflect the destination hostname. You can do this by ID or whatever other selector you have handy.
UPDATE instances SET host="destination_hostname" WHERE id=0x000004f2;
- On both the old and new hosts, restart dnsmasq and nova-network:
sudo killall dnsmasq
sudo service nova-network restart
This gets the IP addressing straightened out, reworks iptables, etc.
- Restart the instance:
nova reboot --hard vm_name
- Once you’ve verified the instance is up and running on the new host, and is accessible over the network, you can delete the instance files in /var/lib/nova/instances/instance-000004f2 on the old host – just be careful to delete them on the right machine!
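A few of the steps above are easy to get wrong by hand, so here is a minimal sketch of some helpers, not a tested migration tool. The function names (instance_db_id, has_room, print_move_plan) and the 10% free-space margin are my own assumptions for illustration. The script only prints the copy and database commands; it does not run them.

```shell
#!/bin/bash
# Hypothetical helpers; these names are illustrative, not part of OpenStack.

# Turn an internal name such as instance-000004f2 into the decimal id
# stored in the nova database (the hex suffix is that id).
instance_db_id() {
    local hex="${1#instance-}"
    printf '%d\n' "0x${hex}"
}

# Succeed only if available_kb covers needed_kb plus a 10% margin.
# The margin is an arbitrary safety choice, not an OpenStack requirement.
has_room() {
    local needed_kb="$1" available_kb="$2"
    [ "$available_kb" -gt $(( needed_kb + needed_kb / 10 )) ]
}

# Print (do not run) the copy and database commands for one instance.
print_move_plan() {
    local name="$1" dest="$2"
    echo "rsync -Pa /var/lib/nova/instances/${name} ${dest}:/var/lib/nova/instances/"
    echo "UPDATE instances SET host=\"${dest}\" WHERE id=$(instance_db_id "$name");"
}

# Example: show the plan for the instance used throughout this post.
print_move_plan instance-000004f2 destination_hostname

# To feed has_room with real numbers, something like this should work
# (assumes ssh access to the destination host):
#   needed=$(du -sk /var/lib/nova/instances/instance-000004f2 | cut -f1)
#   avail=$(ssh destination_hostname df -Pk /var/lib/nova/instances | awk 'NR==2 {print $4}')
#   has_room "$needed" "$avail" && echo "enough space"
```

For instance-000004f2 the printed UPDATE statement uses id=1266, the decimal form of the hex suffix. Review the printed commands before running anything against a real host or database.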
Disclaimer: While this process worked for me, OpenStack can be configured in so many different ways that you may need to do things a bit differently. If you aren’t comfortable with it, and don’t know what those commands do, do not do it! There may also be some kind of internal state that is not fixed up properly, which may result in strange behaviour or other problems. These steps also don’t account for volumes attached to the instance, or anything else tied to it. Test it before you do it on a production system, and come up with your own verified procedure! I take no responsibility if you break things.
There you have it. It worked for me, YMMV. This is brain surgery as far as OpenStack is concerned, bypassing all the APIs and mechanisms that keep things orderly under the hood. Be careful, and good luck!