Ganeti is a virtual machine cluster management tool developed by Google. The solution stack uses either Xen or KVM as the virtualization platform, LVM for disk management, and optionally DRBD for disk replication across physical hosts.
Ganeti manages clusters of Xen or KVM host nodes. It controls and configures the high availability features such as instance migration, DRBD disk replication and host node failover.
You can use multiple clusters for different physical locations for example. All the nodes in the cluster expect to have very high bandwidth between them, for example for DRBD repication. If you have more than ~40 nodes in your cluster, Phil Regnauld recommends that you start a new cluster rather than adding more nodes.
We will install Ganeti in some virtual machines, configure it to use the Xen hypervisor, and use it to create and manage some virtual machines.
Normally you would install this on your physical hosts. We are using it in a VirtualBox virtual machine which is pretending to be our physical host, because we don’t have enough physical boxes for everyone. This forces us to use Xen (which is slower than KVM) because we can’t use KVM inside a VirtualBox virtual machine. You could use either for a real deployment. The installation process is slightly different. KVM is not covered here.
This is not a production setup! This is intended to give you hands-on experience of Ganeti and Xen on your own hardware (laptop) in a classroom environment. I will try to point out where we are taking shortcuts or unusual configurations that we would not normally do in production. NSRC has a five-day workshop on virtualisation and we have ~3 hours, so this is just an introduction!
Install VirtualBox or make sure you are running version 4.3 or higher.
Open VirtualBox Preferences > Network > Host-Only Adaptors. Ensure that you have at least two listed: vboxnet0 and vboxnet1. If not, click on the Add button to the right of the list to create them.
Double-click on vboxnet0 and check that the IP addresses are as follows:
And check that the DHCP server is enabled and configured for:
If you have made any changes, then exit and restart VirtualBox, otherwise this change will not take effect, as we discovered after an hour of debugging!
Create a new VM called Ganeti Demo. Give it 2 GB RAM and a 40 GB VDI disk, dynamically sized.
Start the VM and attach the debian-8.x.x-amd64-DVD-1.iso image. Read the following sections before you start the installation, and use them at the appropriate times during the installation.
You must use a fully qualified hostname, for example ganeti1.pcXX.sse.ws.afnog.org
.
If you have your own delegated DNS domain, and you know how to add A records to it,
then you can use it instead!
We will need a cluster address which is separate from each host node’s address. So I will use names like this in the example:
You can use multiple clusters for different physical locations for example. All the nodes in the cluster expect to have Layer 2 (shared LAN) between them (for live migration of instances to another node) and very high bandwidth (for DRBD replication).§ If you have more than ~40 nodes in your cluster, Phil Regnauld recommends that you start a new cluster rather than adding more nodes.
The server should use LVM for disk space, so instead of the default Guided Partitioning, choose Manual, then SCSI1. Write a new partition table if prompted.
max
, Primary,
Use as > Physical volume for LVM, Done.xenvg
. Select device /dev/sda1
.Root
and Size 8GB
.Swap
and Size 4GB
.If you are following this exercise at an AfNOG event, please enter this proxy server name when prompted, to save a LONG install time:
Please enter this carefully and check it. Using the wrong value will make it impossible for you to install any packages. Of course, if you are not at the AfNOG workshop then this server will no longer exist, so use a local proxy server or leave it blank.
While the installation proceeds, familiarise yourself with the terminology of Ganeti.
Enable installation of the SSH Server and Standard system utilities, disable everything else.
After installation, shut down the machine and reconfigure its network interfaces in VirtualBox:
Then start the machine again. Log in on the console, edit /etc/apt/sources.list
and delete the
deb cdrom
line. Then install some packages:
su
apt install bridge-utils sudo
usermod -G sudo afnog
Then edit /etc/network/interfaces
to look like this:
# The loopback network interface
auto lo
iface lo inet loopback
# The primary network interface
auto eth0
iface eth0 inet dhcp
auto xen-br0
iface xen-br0 inet static
# This must match the address of your hostname in the DNS, not the cluster address!
address 192.168.56.11
netmask 255.255.255.0
bridge_ports eth1
bridge_stp off
bridge_fd 0
Now you should be able to reboot the host and login with SSH using the address that you assigned to
xen-br0
above. SSH is much more convenient than the console, because you can copy and paste!
Edit /etc/apt/sources.list
and add the following line:
deb http://ftp.debian.org/debian jessie-backports main
Then run:
sudo apt update
Edit /etc/hostname
and put the fully-qualified hostname (FQDN) in there. Then change the
hostname to match:
hostname `cat /etc/hostname`
Edit /etc/hosts
and ensure that it contains the IP address and hostname of
your host. You will also need to choose a name (hostname) and IP address for
your cluster, which must be different. For example:
127.0.0.1 localhost
192.168.56.10 cluster.pcXX.sse.ws.afnog.org
192.168.56.11 ganeti1.pcXX.sse.ws.afnog.org
192.168.56.12 ganeti2.pcXX.sse.ws.afnog.org
You must use fully qualified hostnames for your nodes and for the cluster, for
example ganeti1.pcXX.sse.ws.afnog.org
, and they must all have IP addresses.
If you have your own delegated DNS domain, and you know how to add A records to
it, then you can use it instead!
Edit /etc/default/grub
and add/change the following lines to enable Xen (you
would not need this for a KVM cluster in production):
GRUB_DEFAULT=2
GRUB_CMDLINE_XEN_DEFAULT="dom0_mem=min:600M,max:600M"
This restricts the master domain to 600 MB RAM, which will make it slow, but give us more RAM free for guests. In your own configurations you should probably allocate more RAM to the host (domain 0)!
Then run the following commands:
sudo update-grub
sudo apt dist-upgrade
sudo apt install xen-linux-system-amd64
Then reboot
the host. Be sure to select a Xen kernel from the boot list. Log in again and check
that the free
command reports 600 MB of total Mem, not 2 GB:
afnog@ganeti:~$ free
total used free shared buffers cached
Mem: 527640
The Ganeti manual has instructions for this, but they are confusing and out-of-date for Debian >= Wheezy, so we skip that step and do it here instead:
sudo apt install drbd-utils
Edit /etc/modprobe.d/drbd.conf
and make it look like this:
options drbd minor_count=128 usermode_helper=/bin/true
Edit /etc/modules
and add the following line at the end:
drbd
Load the kernel module (driver) now:
sudo modprobe drbd
Edit /etc/apt/sources.list
and add the following line:
deb http://ftp.debian.org/debian jessie-backports main
Then run:
sudo apt update
Start by running the following commands:
sudo apt -t jessie-backports install ganeti ganeti-instance-debootstrap
Then start following the Ganeti installation tutorial, skipping the following sections:
You should only do this on the first node in the cluster. Do not do this step again unless you are setting up a new cluster!
Run the following command, ensuring that you use the correct cluster hostname:
sudo mkdir /root/.ssh
sudo gnt-cluster init --vg-name xenvg --enabled-hypervisors=xen-pvm -H xen-pvm:xen_cmd=xl cluster.pcXX.sse.ws.afnog.org
The gnt-cluster
command should take a few minutes to complete.
Note: Normally you would use either xen-hvm
or kvm
as the hypervisor,
instead of xen-pvm
above. In this case we must use xen-pvm
because we are
doing this inside a virtual machine, so we can’t use the virtualisation CPU
instructions because VirtualBox is already using them to run the Ganeti host
node (VirtualBox guest).
Test that you can run the following useful commands and examine their output:
sudo gnt-cluster verify
sudo gnt-node list
sudo gnt-instance list
sudo gnt-job list
You can ignore the following errors and warnings from gnt-cluster verify
. The error
is out of date, and the warnings are not problems because not all LVM volumes are
used by Ganeti.
Thu Jun 2 13:17:09 2016 - WARNING: node ganeti1.pc40.sse.ws.afnog.org: volume xenvg/Swap is unknown
Thu Jun 2 13:17:09 2016 - WARNING: node ganeti1.pc40.sse.ws.afnog.org: volume xenvg/Root is unknown
The Ganeti manual page gives useful information about Ganeti commands, including examples.
Create the file /etc/ganeti/vnc-cluster-password
containing the password that
you want to use for VNC access to consoles.
Check that the gnt-node list
command shows your node:
$ sudo gnt-node list
Node DTotal DFree MTotal MNode MFree Pinst Sinst
ganeti1.sse.ws.afnog.org 40.0G 28.8G 2.0G 1.9G 126M 0 0
Warning: If you see question marks in all the columns after the node name, like this:
$ sudo gnt-node list Node DTotal DFree MTotal MNode MFree Pinst Sinst ganeti1.sse.ws.afnog.org ? ? ? ? ? 0 0
that means that Ganeti cannot retrieve information about your node. Check the node daemon logfile
/var/log/ganeti/node-daemon.log
for possible error messages. For example, if you find this error:ERROR Can't retrieve xen hypervisor information (exited with exit code 1): ERROR: A different toolstack (xl) have been selected!
that means that Ganeti is trying to use the old
xm
command to get information, instead of the newxl
command, and not getting any information. You probably forgot to add the option-H xen-pvm:xen_cmd=xl
when you created the cluster. You can fix it by modifying the cluster settings on the node:sudo gnt-cluster modify -H xen-pvm:xen_cmd=xl
and check that the
gnt-node list
now shows the correct information for your node.
You should also make sure that the MFree
column shows at least 1 GB free (not
126 MB as in the example output above). This ensures that there is enough RAM
free in the hypervisor to create new guests. Otherwise you won’t be able to do
much with your new hypervisor. If it doesn’t show enough free RAM, check that
you have reconfigured GRUB and run
update-grub
.
Add an entry to /etc/hosts
for a host to use for burnin testing, for example burnin.example.com
:
192.168.56.20 burnin.example.com
The burnin
test will
fail unless we
generate a DH parameters file for SSL:
openssl dhparam -out dhparams.pem 2048
cat dhparams.pem | sudo tee -a /var/lib/ganeti/server.pem
Run the burnin
test to make sure that everything is working properly:
sudo /usr/lib/ganeti/tools/burnin -o debootstrap+default -t plain --disk-size 1024 --mem-size=512 burnin.example.com -vv
The output should end with:
- Attaching and removing disks
* instance burnin.example.com
attaching a disk with name RQLX441M
removing last disk
Thu Jun 2 19:13:17 2016 - WARNING: Hotplug is not supported: Hotplug is not supported by this hypervisor
Thu Jun 2 19:13:17 2016 - INFO: Modification will take place without hotplugging.
Thu Jun 2 19:13:18 2016 - INFO: Waiting for instance burnin.example.com to sync disks
Thu Jun 2 19:13:18 2016 - INFO: Instance burnin.example.com's disks are in sync
Thu Jun 2 19:13:19 2016 - WARNING: Hotplug is not supported: Hotplug is not supported by this hypervisor
Thu Jun 2 19:13:19 2016 - INFO: Modification will take place without hotplugging.
- Removing instances
* instance burnin.example.com
If this fails, you may need to remove the instance before trying again:
sudo gnt-instance remove burnin.example.com
Add a hostname (to /etc/hosts
or to a DNS domain that you control) with a
dummy IP address. We will use test.example.com
for this example.
192.168.56.21 test.example.com
Normally you would give it a static IP address, but we are on a DHCP network with limited IP addresses here.
Use the following command to create the test VM:
sudo gnt-instance add -t plain --disk 0:size=4G -B memory=512 \
-H xen-pvm:initrd_path=/boot/initrd-3-xenU -o debootstrap+default \
-n ganeti.pc40.sse.ws.afnog.org test.example.com
This new VM will have the following settings:
-H
parameter at all.ganeti.pc40.sse.ws.afnog.org
(you must specify
the host node or install an iallocator
script to choose one automatically).test.example.com
.The output should look like this:
Wed Jun 1 11:15:37 2016 * disk 0, size 4.0G
Wed Jun 1 11:15:37 2016 * creating instance disks...
Wed Jun 1 11:15:37 2016 adding instance test.example.com to cluster config
Wed Jun 1 11:15:37 2016 adding disks to cluster config
Wed Jun 1 11:15:38 2016 - INFO: Waiting for instance test.example.com to sync disks
Wed Jun 1 11:15:38 2016 - INFO: Instance test.example.com's disks are in sync
Wed Jun 1 11:15:38 2016 - INFO: Waiting for instance test.example.com to sync disks
Wed Jun 1 11:15:39 2016 - INFO: Instance test.example.com's disks are in sync
Wed Jun 1 11:15:41 2016 * running the instance OS create scripts...
If it doesn’t, look for an error message in the output or in /var/log/ganeti/jobs.log
.
Most likely one of the gnt-instance
parameters was missing or incorrect.
Check that your new instance appears in the output of gnt-instance list
, with status running
:
afnog@ganeti:/tmp$ sudo gnt-instance list
Instance Hypervisor OS Primary_node Status Memory
test.example.com xen-pvm debootstrap+default ganeti.pc40.sse.ws.afnog.org running 512M
Connect to its console:
sudo gnt-instance console test.example.com
You should see it boot up (if you’re not too late) or a login prompt. If it’s not running, you can try to start it again and connect to the console quickly to see what happened:
sudo gnt-instance start test.example.com
sudo gnt-instance console test.example.com
You can exit the console by pressing ^]
(Ctrl+]).
You should be able to login as root
with no password at the console. Add a normal user and
change the root password:
adduser afnog
passwd root
And edit /etc/network/interfaces
to configure networking, making it look like this:
auto lo
iface lo inet loopback
auto eth0
iface eth0 inet dhcp
Then tell it to reboot and watch it come back up. Check that you can access the internet from the guest. You should also be able to access the guest from your host computer, and hopefully from the local network too, using its IP address (assigned by DHCP).
We want to create a replicating Instance (virtual machine) with high availability, so we need a second Node to host the backup copy of this Instance.
Create a new virtual machine and configure it using the same instructions as above,
up to the point of creating a cluster, but don’t create a new cluster. Use a different
hostname, for example ganeti2.pcXX.sse.ws.afnog.org
, which must also be listed in
/etc/hosts
. The /etc/hosts
file should be exactly the same on both hosts.
Enable remote root logins by SSH on the new server, by editing /etc/ssh/sshd_config
and changing the PermitRootLogin
setting to yes
. Restart the ssh
service.
Now add this node to the existing cluster, by running the following command on a node already in the cluster (ganeti1):
sudo gnt-node add ganeti2.pcXX.sse.ws.afnog.org
This will prompt you to login as root to the new node (ganeti2) and will forcibly add it to the cluster. Do not do this to a node which is already in a cluster, as it will probably result in data loss!
Verify the cluster again:
sudo gnt-cluster verify
We can now use the same burnin
command with a different storage type, drbd
, which will
create an instance that replicates its data to another node:
sudo /usr/lib/ganeti/tools/burnin -o debootstrap+default -t drbd --disk-size 1G --mem-size=600 burnin.example.com -vv
This tool allows you to install many different distributions in a Ganeti instance, not just Debian 7. Unfortunately there are some problems with the packaging at the moment, so we need to download and install it manually:
sudo apt install curl mbr python-prctl python-scapy
sudo wget http://apt.dev.grnet.gr/jessie/snf-image_0.19-1~jessie_all.deb
sudo dpkg -i snf-image_0.19-1~jessie_all.deb
When asked for the URL to download snf-image-helper image from, if you are following this at an AfNOG event then you can use the copy that we downloaded for you, which will be much faster, by entering this URL:
http://sse-mini1.mtg.afnog.org
Choose a username and password for your remote account (jack
and mypassword
in this case) and
generate a hash using echo
and openssl md5
like this:
$ echo -n 'jack:Ganeti Remote API:mypassword' | openssl md5
(stdin)= 5ede44dba4dd4e9ce3909246515b2cdc
Insert them both into /var/lib/ganeti/rapi/user
, prefixing the password hash
with {ha1}
, and giving this user write
permissions:
jack {ha1}5ede44dba4dd4e9ce3909246515b2cdc write
Run the following commands:
sudo apt install git libssl-dev virtualenv
cd /tmp
wget http://sse-mini1.mtg.afnog.org/0.11.1.tar.gz
tar xzvf 0.11.1.tar.gz
cd ganeti_webmgr-0.11.1
We are using Ganeti Web Manager 0.11.1, which has a bug that we need to fix before we install:
cd ganeti_webmgr/ganeti_web/settings
wget https://raw.githubusercontent.com/qris/ganeti_webmgr/db1712973616765f62f85c16c246b53a73e8ac4e/ganeti_webmgr/ganeti_web/settings/base.py
cd .././..
Then we can run the installation script:
sudo ./scripts/setup.sh
Edit the file scripts/vncauthproxy/init-systemd and if the last line contains only a ~
character,
delete it. Also, in the [Service]
section, add the following line:
User=www.data
Then copy it to the systemd
service directory, and start it:
sudo cp scripts/vncauthproxy/init-systemd /lib/systemd/system/vncauthproxy.service
sudo mkdir /var/log/vncauthproxy /var/run/vncauthproxy
sudo chown www-data /var/log/vncauthproxy /var/run/vncauthproxy
sudo service vncauthproxy start
sudo cp ganeti_webmgr/ganeti_web/settings/config.yml.dist /opt/ganeti_webmgr/config/config.yml
Edit /opt/ganeti_webmgr/config/config.yml
and change the EMAIL_HOST
and
DEFAULT_FROM_EMAIL
lines, so that their values refer to your outbound server
and your email address.
Also find this line:
NAME: /opt/ganeti_webmgr/ganeti.db
And change it to:
NAME: /opt/ganeti_webmgr/db/ganeti.db
Then finish the installation:
cd /opt/ganeti_webmgr
sudo mkdir .settings db whoosh_index
sudo chown www-data .settings db whoosh_index
export DJANGO_SETTINGS_MODULE=ganeti_webmgr.ganeti_web.settings
sudo -E -u www-data bin/django-admin.py syncdb --migrate
sudo -E -u www-data bin/django-admin.py refreshcache
sudo -E -u www-data bin/django-admin.py rebuild_index
sudo -E -u www-data bin/django-admin.py collectstatic
Enter a username, password and email address for a super user for the Ganeti web manager.
Now start the web server in debugging mode:
cd /opt/ganeti_webmgr
sudo -E -u www-data bin/django-admin.py runserver 0.0.0.0:8000 --insecure
This will start the debugging webserver on port 8000, so you can check that everything is working by visiting http://192.168.56.10:8000. You should get a white page with a login and password box, but no styling (colours, images, etc.) If not, check the console output for error messages.
Create the file /opt/ganeti_webmgr/wsgi.py
with the following contents:
import os
import sys
path = '/opt/ganeti_webmgr'
# activate virtualenv
activate_this = '%s/venv/bin/activate_this.py' % path
execfile(activate_this, dict(__file__=activate_this))
# add project to path
if path not in sys.path:
sys.path.append(path)
# configure django environment
os.environ['DJANGO_SETTINGS_MODULE'] = 'ganeti_webmgr.ganeti_web.settings'
import django.core.handlers.wsgi
application = django.core.handlers.wsgi.WSGIHandler()
Create the file /etc/apache2/sites-enabled/ganeti.conf
with the following contents:
WSGIPythonHome /opt/ganeti_webmgr/venv
WSGISocketPrefix /var/run/wsgi
WSGIRestrictEmbedded On
<VirtualHost *:80>
ServerAdmin your-email-address@example.com
ServerName ganeti-server.local
ServerAlias 192.168.56.10
# Static content needed by Django
Alias /static "/opt/ganeti_webmgr/collected_static/"
<Location "/static">
Order allow,deny
Allow from all
SetHandler None
</Location>
# Django settings - AFTER the static media stuff
WSGIScriptAlias / /opt/ganeti_webmgr/wsgi.py
WSGIDaemonProcess ganeti processes=1 threads=10 display-name='%{GROUP}' deadlock-timeout=30
WSGIApplicationGroup %{GLOBAL}
WSGIProcessGroup ganeti
# Possible values include: debug, info, notice, warn, error, crit,
# alert, emerg.
LogLevel warn
<Location />
Require all granted
</Location>
</VirtualHost>
Now you should be able to access http://192.168.56.10/ (without the :8000 port specification) and see the login page with graphics:
Log in using the superuser account that you created during the syncdb
command, or if you have forgotten the details, run this command to create a new
one:
sudo -u www-data venv/bin/python manage.py createsuperuser
Choose Clusters from the menu on the left, and then click Add Cluster in the top right. Enter the following details:
Leave the other details blank, and click Add. Your new cluster should then appear with its specifications: