8 weeks of #VCAP – Fault Tolerance by @tomverhaeg
You might know VMware Fault Tolerance already, since the VCAP exam builds on the VCP knowledge. But still, it is in the blueprint, so it might be wise to go over it.
Fault Tolerance, often abbreviated as FT, is a technique in which a shadow (secondary) VM is kept in lockstep with the running primary VM. This basically means that every memory and CPU operation executed on the primary VM is also executed on the secondary VM.
In case of a host failure, a VM with Fault Tolerance enabled can switch over from the primary to the secondary VM in a matter of seconds, picking up right where the primary stopped. This gives the VM better uptime and avoids the VM restart that HA would do.
There are a few host requirements for running FT:
-> You need to have a cluster where HA is enabled
-> All hosts need access to the same (shared) datastores
-> There needs to be physical processor support
-> VMkernel ports need to be configured for vMotion and FT logging
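The host requirements above can be sketched as a small pre-flight check. This is only an illustration in plain Python: the `Host` class, its field names and `host_ft_issues` are hypothetical and not part of any VMware API, so you would have to fill them from your own inventory data.

```python
from dataclasses import dataclass


@dataclass
class Host:
    # Hypothetical, simplified description of an ESXi host (not a VMware API object).
    name: str
    in_ha_cluster: bool      # host is a member of a cluster with HA enabled
    datastores: set          # names of the datastores the host can access
    cpu_supports_ft: bool    # physical processor support for FT
    vmotion_vmk: bool        # a VMkernel port is enabled for vMotion
    ft_logging_vmk: bool     # a VMkernel port is enabled for FT logging


def host_ft_issues(host, required_datastores):
    """Return a list of reasons why the host does not meet the FT requirements."""
    issues = []
    if not host.in_ha_cluster:
        issues.append("host is not in an HA-enabled cluster")
    missing = set(required_datastores) - set(host.datastores)
    if missing:
        issues.append("missing shared datastores: " + ", ".join(sorted(missing)))
    if not host.cpu_supports_ft:
        issues.append("physical CPU does not support FT")
    if not host.vmotion_vmk:
        issues.append("no VMkernel port enabled for vMotion")
    if not host.ft_logging_vmk:
        issues.append("no VMkernel port enabled for FT logging")
    return issues


# Example: everything is in place except the FT logging VMkernel port.
esx01 = Host("esx01", True, {"ds-shared-01"}, True, True, False)
print(host_ft_issues(esx01, {"ds-shared-01"}))
```

An empty list means the host passes all four checks in this simplified model; it does not replace the checks that vCenter performs itself.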
There are also some VM requirements for running FT:
-> The VM can only have one (1) vCPU, so no vSMP
-> The VM disks need to be eager zeroed thick provisioned
-> No non-replayable devices (CD-ROM, USB devices, etc.)
-> No snapshots
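The VM requirements can be checked in the same way. Again a minimal sketch under assumptions: the `VM` class, the device and disk format labels, and `vm_ft_issues` are made up for this example and do not come from a VMware library.

```python
from dataclasses import dataclass, field

# Hypothetical labels for devices that cannot be replayed on the secondary VM.
NON_REPLAYABLE = {"cdrom", "usb"}


@dataclass
class VM:
    # Hypothetical, simplified description of a VM (not a VMware API object).
    name: str
    num_vcpus: int
    disk_formats: list = field(default_factory=list)  # e.g. ["eagerzeroedthick", "thin"]
    devices: list = field(default_factory=list)       # e.g. ["cdrom", "vmxnet3"]
    snapshot_count: int = 0


def vm_ft_issues(vm):
    """Return a list of reasons why the VM does not meet the FT requirements."""
    issues = []
    if vm.num_vcpus != 1:
        issues.append(f"{vm.num_vcpus} vCPUs configured, FT needs exactly 1")
    bad_disks = [d for d in vm.disk_formats if d != "eagerzeroedthick"]
    if bad_disks:
        issues.append("disks not eager zeroed thick: " + ", ".join(bad_disks))
    bad_devices = [d for d in vm.devices if d in NON_REPLAYABLE]
    if bad_devices:
        issues.append("non-replayable devices attached: " + ", ".join(bad_devices))
    if vm.snapshot_count > 0:
        issues.append(f"{vm.snapshot_count} snapshot(s) present")
    return issues


# Example: a 2 vCPU VM with a thin disk, a CD-ROM drive and one snapshot.
web01 = VM("web01", 2, ["thin"], ["cdrom", "vmxnet3"], 1)
for issue in vm_ft_issues(web01):
    print(issue)
```

The disk check is the least strict of the four in practice, since turning on FT offers to eager zero the disk for you.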
Configuring the VMkernel port for FT logging
In line with VMware best practices for FT, it is wise to use a dedicated NIC for FT logging (preferably even 10 gigabit), but configuring FT logging is as easy as selecting a checkbox on a VMkernel port:
Enabling FT on a VM
Enabling FT is rather simple: right-click the VM -> Fault Tolerance -> Turn on Fault Tolerance. You might get a popup saying that a memory reservation will be created for the full memory allocation of this VM, and that the disk will be eager zeroed.
After it walks through the process of enabling fault tolerance, you get a nice blue icon in your inventory:
After powering on the FT VM, on the summary page, you also see some info about the FT status:
Testing VMware FT
Now that we have a running FT VM, we might as well test it. We have 2 options for testing it:
Test failover – The primary VM fails over to the secondary VM, which becomes the new primary, and a new secondary VM is then spawned.
Test restart secondary – The secondary VM is re-spawned, after which the VM is protected by FT again.
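To make the difference between the two options concrete, here is a toy model of an FT pair in Python. It is only a sketch: `FTPair` and both functions are hypothetical names, and the host that receives the new secondary is passed in by hand, whereas in reality vSphere decides the placement.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FTPair:
    # Hypothetical model of an FT-protected VM: where each copy runs.
    primary_host: str
    secondary_host: str
    secondaries_spawned: int = 1  # how many secondary VMs have been created so far


def simulate_failover(pair, new_secondary_host):
    """Test failover: the old secondary becomes primary and a new secondary is spawned."""
    return replace(
        pair,
        primary_host=pair.secondary_host,
        secondary_host=new_secondary_host,
        secondaries_spawned=pair.secondaries_spawned + 1,
    )


def simulate_restart_secondary(pair):
    """Test restart secondary: the primary is untouched, only the secondary is re-spawned."""
    return replace(pair, secondaries_spawned=pair.secondaries_spawned + 1)


pair = FTPair(primary_host="esx01", secondary_host="esx02")
print(simulate_failover(pair, new_secondary_host="esx03"))
print(simulate_restart_secondary(pair))
```

In this model a failover changes where the primary runs, while a restart of the secondary leaves the primary alone. Keeping the re-spawned secondary on the same host is a simplification of the model.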
After doing a failover of the primary VM, a new secondary VM will be spawned, so the status after doing the failover might be like this:
Troubleshooting VMware FT
So, all is happy, but since we’re doing the VCAP exam, we might expect some troubleshooting.
On the summary page of the host, you can see if the host is configured and ready for FT. If it isn’t, the reason why will also be mentioned:
In the image above, there isn’t a VMkernel port configured for FT logging. So go into your networking and check that FT logging box.
Also, when the VM mentions something like this, the secondary VM is not running, so do a restart or migrate secondary: