Identify VMware Fault Tolerance requirements

What is FT?
 
While VMware HA restarts VMs in the event of a host failure, there is still some downtime while each VM boots back up. VMware FT removes that downtime. FT provides a higher level of protection by keeping a VM continuously available through a HOST FAILURE. (I say host failure because FT will not help if the guest OS blue screens or an application on the primary VM crashes: the secondary VM will do exactly the same thing.) FT keeps the primary and secondary VMs in identical states using VMware vLockstep technology, which captures the inputs and events that occur on the primary VM and sends them to the secondary, where they are replayed. If the host running the primary VM fails, the secondary becomes the new primary and a new secondary is created. This happens even if vCenter is not available.
 
FT Requirements
 
This is a list of requirements I could find in the VMware documentation. VMware also provides the VMware SiteSurvey utility, which scans your environment and helps you discover and understand configuration issues that would affect FT.
 
Cluster Requirements
  • Host certificate checking must be enabled in vCenter Server.
  • At least 2 FT-certified hosts running the same FT version or host build number.
  • Hosts need access to the same shared storage (datastores).
  • FT logging and vMotion networking must be configured.
  • HA must be enabled on the cluster. If it isn't, you will not be able to power on an FT VM or add a host that is already running an FT VM to the cluster.

Host Requirements

  • Must contain processors from an FT-compatible processor group. It is also highly recommended that the hosts' processors are compatible with one another.
  • Must be licensed for FT (Advanced, Enterprise or Enterprise Plus edition).
  • Must be certified for FT (check the HCL).
  • Hardware Virtualization (HV) must be enabled in the BIOS.
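
As a rough illustration, the cluster and host requirements above can be expressed as a pre-flight checklist. This is a minimal Python sketch, not VMware's own validation logic and not a vSphere API: the `Host`/`Cluster` classes, their field names and the processor-group labels are all hypothetical, and in practice the values would come from vCenter or the SiteSurvey utility.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Host:
    # Hypothetical model of one ESX(i) host; not a vSphere object.
    name: str
    ft_certified: bool   # listed on the HCL as FT-certified
    ft_licensed: bool    # licensed edition includes FT
    hv_enabled: bool     # Hardware Virtualization enabled in the BIOS
    cpu_group: str       # FT-compatible processor group (illustrative label)
    ft_version: str      # FT version or host build number
    datastores: set[str] = field(default_factory=set)


@dataclass
class Cluster:
    ha_enabled: bool
    host_cert_checking: bool
    ft_logging_configured: bool
    vmotion_configured: bool
    hosts: list[Host] = field(default_factory=list)


def host_ok(h: Host) -> bool:
    """Per-host requirements from the Host Requirements list."""
    return h.ft_certified and h.ft_licensed and h.hv_enabled


def cluster_problems(c: Cluster) -> list[str]:
    """Return the unmet cluster-level requirements (an empty list means OK)."""
    problems = []
    if not c.host_cert_checking:
        problems.append("host certificate checking is disabled")
    if not c.ha_enabled:
        problems.append("HA is not enabled on the cluster")
    if not (c.ft_logging_configured and c.vmotion_configured):
        problems.append("FT logging and/or vMotion networking is not configured")
    # Need at least one pair of eligible hosts that match on FT version and
    # processor group and can both see a common datastore.
    eligible = [h for h in c.hosts if host_ok(h)]
    if not any(
        a.ft_version == b.ft_version
        and a.cpu_group == b.cpu_group
        and a.datastores & b.datastores
        for a, b in combinations(eligible, 2)
    ):
        problems.append("fewer than 2 compatible FT-capable hosts sharing storage")
    return problems
```

Requiring a matching *pair* (rather than just counting good hosts) reflects that the primary and secondary must run on two hosts that are compatible with each other and see the same storage.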

VM Requirements

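  • Virtual disks must be VMDK files or virtual-mode RDMs (physical-mode RDMs are not supported). VMDKs must also be thick provisioned (eager-zeroed thick); if a disk is thin, you are prompted to convert it when you turn on FT, and the VM must be powered off for the conversion.
  • VM files must be stored on shared storage (FC, FCoE, iSCSI, NFS, NAS).
  • The VM must have exactly one vCPU (no vSMP).
  • The guest OS must be one of: Windows 7, Windows Server 2008, Windows Vista, Windows Server 2003, Windows XP, Windows 2000, Windows NT 4, any Linux distribution supported by ESX, NetWare, Solaris 10 or FreeBSD. Some of these have processor-specific limitations, so check those as well.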
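
The disk, storage and vCPU rules above can be checked per VM with a short Python sketch. The `Disk`/`VM` classes and the provisioning strings are hypothetical stand-ins for data you would pull from the VM's configuration. The guest OS check is deliberately left out because the supported list also depends on processor type.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Disk:
    kind: str               # "vmdk", "virtual_rdm" or "physical_rdm" (illustrative labels)
    provisioning: str       # "eagerzeroedthick", "zeroedthick" or "thin"
    on_shared_storage: bool


@dataclass
class VM:
    name: str
    num_cpus: int
    disks: list[Disk] = field(default_factory=list)


def vm_problems(vm: VM) -> list[str]:
    """Return the VM-level FT requirements this VM does not meet."""
    problems = []
    if vm.num_cpus != 1:
        problems.append(f"{vm.name}: has {vm.num_cpus} vCPUs, FT allows only 1")
    for i, d in enumerate(vm.disks):
        if d.kind == "physical_rdm":
            problems.append(f"{vm.name}: disk {i} is a physical-mode RDM")
        elif d.kind == "vmdk" and d.provisioning != "eagerzeroedthick":
            # FT needs eager-zeroed thick VMDKs; provisioning does not apply to RDMs.
            problems.append(f"{vm.name}: disk {i} is {d.provisioning}, not eager-zeroed thick")
        if not d.on_shared_storage:
            problems.append(f"{vm.name}: disk {i} is not on shared storage")
    return problems
```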

The following are not supported with FT

  • Snapshots
  • Storage vMotion
  • Linked Clones
  • Backups that use the vStorage APIs for Data Protection or VMware Data Recovery (both rely on VM snapshots). Array-based snapshots, however, do not affect FT.
  • Floppy or CD-ROM devices backed by a physical or remote device (only image files, such as ISOs, stored on shared storage are allowed).
  • USB and sound devices
  • NPIV
  • NIC passthrough
  • vlance networking drivers
  • Hot-pluggable devices (this includes changing the network a virtual NIC is attached to).
  • EPT/RVI
  • Serial or parallel ports
  • IPv6
  • Video devices with 3D enabled.
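
One way to screen a VM against this list is a lookup table of blocking features. In this sketch the feature keys (`usb`, `snapshot` and so on) are made-up labels that you would populate from your own inventory data; they are not vSphere property names.

```python
from __future__ import annotations

# Hypothetical feature labels, each mapped to an item in the list above.
UNSUPPORTED_WITH_FT = {
    "snapshot": "Snapshots",
    "storage_vmotion": "Storage vMotion",
    "linked_clone": "Linked clones",
    "vadp_backup": "vStorage APIs for Data Protection / VMware Data Recovery backups",
    "physical_cdrom": "CD-ROM backed by a physical or remote device",
    "physical_floppy": "Floppy backed by a physical or remote device",
    "usb": "USB devices",
    "sound": "Sound devices",
    "npiv": "NPIV",
    "nic_passthrough": "NIC passthrough",
    "vlance": "vlance network adapter",
    "hot_plug": "Hot-pluggable devices",
    "ept_rvi": "EPT/RVI",
    "serial_port": "Serial ports",
    "parallel_port": "Parallel ports",
    "ipv6": "IPv6",
    "3d_video": "Video devices with 3D enabled",
}


def ft_blockers(features: set[str]) -> list[str]:
    """Return the readable names of the VM's features FT does not support, sorted."""
    return sorted(UNSUPPORTED_WITH_FT[f] for f in features if f in UNSUPPORTED_WITH_FT)
```

Features not in the table (for example an ISO on shared storage) are simply ignored, so an empty result means nothing on this list blocks FT.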

Configure VMware Fault Tolerance networking
 
Prerequisites
  • Multiple gigabit NICs. Each host needs at least two: one dedicated to FT logging and one to vMotion.

Configuring the networking is quite easy: create two VMkernel ports, one with vMotion enabled and one with Fault Tolerance logging enabled. *** NOTE *** FT logging traffic is not encrypted, so secure this network as well as you can; a private, isolated network is probably best.

After you have created the VMkernel port for FT logging, the host's Summary tab should show 'Configured for FT'. If there is an issue, hovering over the little blue comment icon next to it shows what the problem is.
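
A minimal sketch of what "Configured for FT" boils down to on the networking side, assuming a host is described by its gigabit NIC count and its VMkernel ports. The `VmkPort`/`HostNetwork` classes and the `vmk`/`vmnic` names are illustrative only. The shared-NIC check encodes the one-NIC-each prerequisite above rather than a rule vCenter is documented to enforce.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class VmkPort:
    name: str
    uplink: str              # physical NIC backing the port, e.g. "vmnic1"
    ft_logging: bool = False
    vmotion: bool = False


@dataclass
class HostNetwork:
    gigabit_nics: int
    vmk_ports: list[VmkPort] = field(default_factory=list)


def ft_network_issues(net: HostNetwork) -> list[str]:
    """Return networking problems for FT on one host (empty list = looks OK)."""
    issues = []
    if net.gigabit_nics < 2:
        issues.append("fewer than two gigabit NICs")
    ft_uplinks = {p.uplink for p in net.vmk_ports if p.ft_logging}
    vmo_uplinks = {p.uplink for p in net.vmk_ports if p.vmotion}
    if not ft_uplinks:
        issues.append("no VMkernel port enabled for FT logging")
    if not vmo_uplinks:
        issues.append("no VMkernel port enabled for vMotion")
    # Prerequisite above: one NIC for FT logging and a separate one for vMotion.
    if ft_uplinks & vmo_uplinks:
        issues.append("FT logging and vMotion share a physical NIC")
    return issues
```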

Enable/Disable VMware Fault Tolerance on a virtual machine
 
Enable Fault Tolerance
 
This is actually quite easy. Right-click a VM and select 'Fault Tolerance' -> 'Enable Fault Tolerance'.
 
This option may be dimmed if:
  • The VM is registered on a host that isn't licensed for FT
  • The VM is on a host that is in maintenance or standby mode
  • The VM is disconnected or orphaned
  • The user doesn't have permission to do this
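
If you are scripting a pre-check, the four dimmed-menu conditions translate directly into a list of reasons. The state strings and parameter names here are assumptions for illustration, not values read from vCenter.

```python
from __future__ import annotations


def ft_menu_dimmed_reasons(host_licensed_for_ft: bool, host_mode: str,
                           vm_state: str, user_has_permission: bool) -> list[str]:
    """Return why the FT menu option would be dimmed (empty list = available).

    host_mode and vm_state use hypothetical labels such as "maintenance",
    "standby", "disconnected" or "orphaned".
    """
    reasons = []
    if not host_licensed_for_ft:
        reasons.append("host is not licensed for FT")
    if host_mode in ("maintenance", "standby"):
        reasons.append(f"host is in {host_mode} mode")
    if vm_state in ("disconnected", "orphaned"):
        reasons.append(f"VM is {vm_state}")
    if not user_has_permission:
        reasons.append("user lacks permission")
    return reasons
```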

After you select Enable Fault Tolerance, the following validation checks are performed:

  • SSL certificate checking is enabled
  • The host is in a vSphere HA cluster or a combined HA and DRS cluster
  • The host has ESX/ESXi 4.0 or later installed
  • The VM doesn't have multiple vCPUs, snapshots, HA disabled, or a 3D video device
  • HV is enabled in the host BIOS
  • The processors of the hosts for the primary and secondary VMs are compatible
  • The processors are compatible with the guest OS

The following occurs when enabling FT

  • A secondary VM is created.  The placement and status of this VM will vary depending on the power state of the primary VM
    • If Primary is Powered ON
      • The entire state of the primary VM is copied, and the secondary is created, placed on a separate host and powered on (if it passes admission control).
      • The FT Status on the VM's Summary tab will be 'Protected'.
    • If the primary is powered off
      • The secondary is created immediately and registered to a host in the cluster (possibly the same host as the primary; it is moved to a different host when the primary is powered on).
      • The secondary VM will not be powered on until the primary is powered on.
      • The FT Status will display 'Not Protected, VM not Running'.
  • Once Fault Tolerance is enabled, vCenter removes the VM's memory limit and reservation and sets a new memory reservation equal to the VM's memory size. While FT is enabled on the VM you cannot change its memory reservation, limit, size or shares. If you disable FT, these values are not reverted.
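
To make the placement and memory rules concrete, here is a small Python model of what enabling FT does to a VM. It is a simplification under stated assumptions: admission control is not modeled, the secondary's host is simply the first eligible host in list order (vCenter's real placement logic is not shown), and `PrimaryVM`/`enable_ft` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class PrimaryVM:
    name: str
    host: str
    powered_on: bool
    memory_mb: int
    mem_reservation_mb: int = 0
    mem_limit_mb: Optional[int] = None  # None means unlimited


def enable_ft(vm: PrimaryVM, cluster_hosts: list[str]) -> tuple[str, bool]:
    """Apply the memory changes and return (secondary_host, secondary_powered_on)."""
    # Memory: the limit is removed and the reservation is set to the full memory size.
    vm.mem_limit_mb = None
    vm.mem_reservation_mb = vm.memory_mb
    if vm.powered_on:
        # A running secondary must sit on a different host from the primary.
        candidates = [h for h in cluster_hosts if h != vm.host]
        if not candidates:
            raise RuntimeError("no other host available for the secondary VM")
        return candidates[0], True
    # Powered off: the secondary is only registered and may land on the primary's
    # host; this sketch just takes the first host. It is not powered on yet.
    return cluster_hosts[0], False
```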

Once enabled, the FT section in the summary tab will show you the following

  • FT Status
    • Protected – Primary and secondary are powered on and running as expected
    • Not Protected – The secondary VM is not running. A reason is also shown:
      • Starting – FT is in the process of starting the secondary.
      • Need Secondary VM – The primary VM is running without a secondary. This is usually caused by FT being unable to create a secondary because no compatible host is available. If compatible hosts do exist, disabling and re-enabling FT sometimes fixes it.
      • Disabled – FT is currently disabled (either the user disabled it, or vCenter Server disabled it after failing to power on the secondary).
      • VM Not Running – FT is enabled, but the primary is powered off.
  • Secondary Location – shows which host is running the secondary VM
  • Total Secondary CPU – shows the CPU usage of the secondary VM (MHz)
  • Total Secondary Memory – shows the total memory usage of the secondary (MB)
  • vLockstep Interval – The time, in seconds, that the secondary VM needs to catch up with the current execution state of the primary. It is typically less than half a second. No state is lost on failover, even if this interval is high.
  • Log Bandwidth – The network capacity used to send FT logging information from the host running the primary to the host running the secondary.
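
The FT Status values above can be restated as a small decision function for quick reference. This is a hypothetical sketch: its inputs are simplified booleans rather than fields of the vSphere API, and the order in which the conditions are checked is an assumption.

```python
def ft_status(ft_enabled: bool, primary_running: bool,
              secondary_running: bool, secondary_starting: bool = False) -> str:
    """Map simplified VM states to the FT Status shown on the Summary tab."""
    if not ft_enabled:
        return "Not Protected - Disabled"
    if not primary_running:
        return "Not Protected - VM Not Running"
    if secondary_running:
        return "Protected"
    if secondary_starting:
        return "Not Protected - Starting"
    # Primary is up, but no secondary is running or starting.
    return "Not Protected - Need Secondary VM"
```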

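To turn FT off, right-click the VM and choose 'Fault Tolerance' -> 'Turn Off Fault Tolerance'. This removes the secondary VM. The 'Disable Fault Tolerance' option on the same menu suspends protection but keeps the secondary VM's configuration, so FT can be re-enabled later.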

Test an FT configuration

VMware provides a couple of FT test scenarios, both available from the VM's right-click 'Fault Tolerance' menu:
 
Testing FT Failover
  • The secondary VM becomes the new primary, and the old primary is removed.
  • A new secondary VM is created and synchronized with the new primary.

Testing Restart Secondary

  • This will destroy the current secondary VM and restart another one.
  • The primary is unaffected during this test.
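
The two test scenarios can be modeled as role swaps between hosts. This is a toy model, not the vSphere API: `FtPair`, its methods and the host names are hypothetical, and new secondaries are placed on the first host other than the primary's, an assumption made only to keep the example deterministic.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FtPair:
    primary_host: str
    secondary_host: str
    hosts: list[str] = field(default_factory=list)  # hosts able to run a secondary
    secondary_generation: int = 0                    # bumps each time a new secondary is created

    def _new_secondary_host(self) -> str:
        # The secondary must run on a host other than the primary's.
        for h in self.hosts:
            if h != self.primary_host:
                return h
        raise RuntimeError("no host available for a new secondary VM")

    def simulate_failover(self) -> None:
        """Test Failover: promote the secondary, drop the old primary, spawn a new secondary."""
        self.primary_host = self.secondary_host
        self.secondary_host = self._new_secondary_host()
        self.secondary_generation += 1

    def restart_secondary(self) -> None:
        """Test Restart Secondary: destroy and recreate the secondary; primary untouched."""
        self.secondary_host = self._new_secondary_host()
        self.secondary_generation += 1
```

For example, with hosts `esx01`, `esx02` and `esx03`, a pair running on `esx01`/`esx02` ends up with the primary on `esx02` and a new secondary on `esx01` after a simulated failover.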

Determine use case for enabling VMware Fault Tolerance on a virtual machine
 
There are a number of use cases for Fault Tolerance. Keep in mind, however, that FT does not protect against an OS or application failure; it only protects against a host failure. Some use cases for FT include:
  • Applications that need to be highly available (especially those with long-lasting client connections) and must survive a hardware failure.
  • Custom-built applications that have no other form of clustering available.
  • It's a simple way to provide high availability for an application without the difficult and complex setup of other clustering solutions.
  • Protecting a key VM during a critical period so that a host failure causes no downtime.