Although High Availability is something I’ve been configuring for many years now, I thought it might be a good idea to go over the whole process again. This became especially evident after watching the HA section of Jason Nash’s TrainSignal/PluralSight course, as I quickly realized there are a lot of HA advanced settings that I’ve never modified or tested. With that said, here’s the HA post.
First off, I’m not going to go over the basic configuration of HA – honestly, it’s a checkbox, right? I think we can all handle that. I will give a brief description of a few of the HA bullet points listed within the blueprint and point out where we can manage them.
First up, Admission Control
When an HA event occurs in our cluster, we need to ensure that enough resources are available to successfully fail over our infrastructure – admission control dictates just how many resources we set aside for this event. If our admission control policies are violated, no more VMs can be powered on inside of our cluster – yikes! There are three types…
Specify Failover Host – Ugly! Basically you assign a host as the host that will be used in the event of an HA event. The result of an HA event is the only time that this host will have VMs running on it – all other times, it sits there wasting money 🙂
Host failures cluster tolerates – This is perhaps the most complicated policy. Essentially a slot size is calculated for CPU and memory, and the cluster then does some calculations to determine how many slots are available. It then reserves a certain number of failover slots in your cluster to ensure that a certain number of hosts are able to fail over. There will be much more on slot size later on in this post, so don’t worry if that doesn’t make too much sense yet.
Percentage of Cluster resources reserved – This is probably the one I use most often. It allows you to reserve a certain percentage of both CPU and memory for VM restarts.
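To get a feel for the percentage policy, here is a rough Python sketch of the check. It is a simplification, not how HA is actually implemented: it assumes the failover capacity is simply (total resources minus reserved resources) divided by total resources, and that a power-on is refused if it would push either CPU or memory below the configured percentage. The function names and the numbers are made up for illustration.

```python
def failover_capacity_pct(total: float, reserved: float) -> float:
    """Percentage of a resource that is still unreserved (simplified)."""
    return (total - reserved) / total * 100


def can_power_on(total_cpu_mhz: float, reserved_cpu_mhz: float,
                 total_mem_mb: float, reserved_mem_mb: float,
                 vm_cpu_mhz: float, vm_mem_mb: float,
                 cpu_pct: float, mem_pct: float) -> bool:
    """True if powering on the VM keeps both CPU and memory failover
    capacity at or above the configured percentages."""
    cpu_after = failover_capacity_pct(total_cpu_mhz, reserved_cpu_mhz + vm_cpu_mhz)
    mem_after = failover_capacity_pct(total_mem_mb, reserved_mem_mb + vm_mem_mb)
    return cpu_after >= cpu_pct and mem_after >= mem_pct


# Hypothetical cluster: 10,000 MHz and 65,536 MB in total, 25% reserved for failover.
print(can_power_on(10000, 7000, 65536, 32768, 500, 4096, 25, 25))
print(can_power_on(10000, 7000, 65536, 32768, 600, 4096, 25, 25))
```

The first call leaves exactly 25% of CPU free and is allowed; the second would drop CPU to 24% and is refused.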
So, back to slot size – a slot is made up of two components: memory and CPU. HA will take the largest memory reservation of any powered-on VM in your environment and use that as its memory slot size. So even if you have 200 VMs that have only 2GB of RAM, if you place a reservation on just one VM of say, oh, 8GB of RAM, your memory slot size will be 8GB. If you do not have any reservations set, the slot size is deemed to be 0MB + memory overhead.
As for CPU, the same rules apply – the slot size is the largest reservation set on a powered-on VM. If no reservations are used, the slot size is deemed to be 32MHz. Both the CPU and memory slot sizes can be controlled by a couple of HA advanced settings – das.slotCpuInMhz and das.slotMemInMb (Note – all HA advanced settings start with das. – so if you are doing the test and you can’t remember one, simply open the Availability doc and search for das – you’ll find them). These do not change the default slot size values; rather, they specify an upper limit on how large a slot size can be.
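To make the slot size rules concrete, here is a small Python sketch. It only encodes the rules described above: a VM without a reservation counts as the default (32MHz for CPU, 0MB for memory), the largest value wins, and das.slotCpuInMhz / das.slotMemInMb act as a cap. The function name and arguments are hypothetical, and memory overhead is left out to keep it short.

```python
from typing import Optional, Sequence


def slot_size(reservations: Sequence[int],
              default_if_unreserved: int,
              upper_limit: Optional[int] = None) -> int:
    """Return the slot size for one resource (CPU in MHz or memory in MB).

    reservations: one entry per powered-on VM, 0 meaning "no reservation".
    default_if_unreserved: value used for VMs without a reservation
        (the role das.vmCpuMinMhz / das.vmMemoryMinMb play).
    upper_limit: optional cap (the role das.slotCpuInMhz / das.slotMemInMb play).
    """
    per_vm = [r if r > 0 else default_if_unreserved for r in reservations]
    size = max(per_vm, default=default_if_unreserved)
    if upper_limit is not None:
        size = min(size, upper_limit)
    return size


# 200 VMs without reservations plus one VM with an 8GB memory reservation.
print(slot_size([0] * 200 + [8192], default_if_unreserved=0))        # 8192
# Largest CPU reservation is 500MHz, but the cap is set to 64MHz.
print(slot_size([0, 500, 200], default_if_unreserved=32, upper_limit=64))  # 64
```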
So let’s have a look at these settings and slot size – first up, we can see our current slot size by selecting the ‘Advanced Runtime Info’ link on a cluster’s Summary tab. In my lab the current slot size is 500MHz for CPU and 32MB for memory, and I have 16 total slots, 4 of which have been taken.
So let’s now set the das.slotCpuInMhz advanced setting to something lower than 500 – say we only ever want our CPU slot size for a VM to be 64MHz. Within the cluster’s HA settings (Right-click cluster->Edit Settings, vSphere HA) you will see an Advanced Options button; select that and set das.slotCpuInMhz to 64.
Now we have essentially stated that HA should use the smaller of the largest VM CPU reservation and the value of das.slotCpuInMhz as our CPU slot size. A quick check on our runtime info again reflects the change we just made. Also, if you look, you will see that we have increased our total available slots to 128, since we are now using a CPU slot size of 64MHz rather than 500MHz.
So that’s admission control and slot sizes in a nutshell. Limiting or changing some slot sizes seems like a good task to be given on the exam. Also, I’m not sure how much troubleshooting needs to be performed on the exam, but if you are presented with any scenarios where VMs fail to power on, slot sizes and admission control could definitely be the answer.
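The jump in slot count is just arithmetic. Here is a hedged Python sketch that assumes each host contributes the smaller of its CPU slots and memory slots, and that the cluster total is the sum across hosts. The host sizes below (two hosts with 4096MHz and 8192MB each) are hypothetical values picked so the numbers line up with the 16 and 128 above; they are not my actual lab hardware, and real HA works from the resources available to VMs rather than raw capacity.

```python
from typing import Iterable, Tuple


def slots_per_host(cpu_mhz: int, mem_mb: int,
                   cpu_slot_mhz: int, mem_slot_mb: int) -> int:
    """Slots one host can hold: limited by whichever resource runs out first."""
    if cpu_slot_mhz <= 0 or mem_slot_mb <= 0:
        raise ValueError("slot sizes must be positive in this sketch")
    return min(cpu_mhz // cpu_slot_mhz, mem_mb // mem_slot_mb)


def total_slots(hosts: Iterable[Tuple[int, int]],
                cpu_slot_mhz: int, mem_slot_mb: int) -> int:
    """Sum the slots of every (cpu_mhz, mem_mb) host in the cluster."""
    return sum(slots_per_host(cpu, mem, cpu_slot_mhz, mem_slot_mb)
               for cpu, mem in hosts)


hosts = [(4096, 8192), (4096, 8192)]   # hypothetical two-host cluster
print(total_slots(hosts, cpu_slot_mhz=500, mem_slot_mb=32))  # 16
print(total_slots(hosts, cpu_slot_mhz=64, mem_slot_mb=32))   # 128
```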
More Advanced Settings
As you may have seen in the earlier screenshots, there were a few others of those das. advanced settings shown. Here are a few that you may need to know for the exam, maybe, maybe not – either way, good to know…
das.heartbeatDsPerHost – used to increase the number of heartbeat datastores used – default is 2, but it can be overridden up to a maximum of 5. Requires complete reconfiguration of HA on the hosts.
das.vmMemoryMinMb – value to use for the memory slot size if no reservation is present – default of 0
das.slotMemInMb – upper value of a memory slot size – meaning we can limit how large the slot size can be by using this value.
das.vmCpuMinMhz – value to use for the cpu slot size if no reservations are present – default of 32.
das.slotCpuInMhz – upper value of a CPU slot size – meaning we can limit how large the slot size can be by using this value
das.isolationAddress – can be used to change the IP address that HA pings when determining isolation – by default this is the default gateway.
das.isolationAddressX – can be used to add additional IPs to ping – X can be any number between 0 and 9.
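das.useDefaultIsolationAddress – a boolean that specifies whether HA should use the default isolation address (the default gateway) at all – set it to false if you only want your own das.isolationAddressX entries pinged.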
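As an illustration of how the three isolation settings fit together, here is a small Python sketch that works out which addresses would be pinged for a given set of advanced options. This is only my reading of the settings above, not HA’s actual code: the function name, the dictionary layout, and the IP addresses are all made up, and it assumes the default gateway comes first when it is used.

```python
from typing import Dict, List


def isolation_addresses(advanced_options: Dict[str, str],
                        default_gateway: str) -> List[str]:
    """Build the list of addresses to ping when checking for isolation."""
    use_default = advanced_options.get(
        "das.useDefaultIsolationAddress", "true").strip().lower() != "false"

    addresses = [default_gateway] if use_default else []

    # das.isolationAddress plus das.isolationAddress0 .. das.isolationAddress9
    keys = ["das.isolationAddress"] + [f"das.isolationAddress{i}" for i in range(10)]
    for key in keys:
        value = advanced_options.get(key)
        if value and value not in addresses:
            addresses.append(value)
    return addresses


options = {
    "das.useDefaultIsolationAddress": "false",
    "das.isolationAddress0": "10.0.0.10",
    "das.isolationAddress1": "10.0.0.11",
}
print(isolation_addresses(options, default_gateway="192.168.1.1"))
```

With the default gateway switched off, only the two user-defined addresses are returned.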
Anyways, those are the most commonly used settings – again, any others will be listed in the availability guide, so use that if needed to find others on the exam – but remember, having to open those PDFs will take away valuable time.
Other random things
Just a few notes on some other parts of HA that I haven’t used that often. The first is VM Monitoring. VM Monitoring is a process that monitors for heartbeats and I/O activity from the VMware Tools service inside your virtual machines. If it doesn’t detect activity from the VM, it determines that the VM has failed and can proceed with a reboot of that VM. vSphere has a few options for VM monitoring that we can use to help prevent false positives and unneeded VM reboots.
Failure Interval – The amount of time in seconds without heartbeats or I/O activity before the VM is considered to have failed.
Minimum Uptime – The amount of time in seconds that VM monitoring will wait after a power on or restart before it starts to poll.
Maximum Per VM Resets – the number of times that a VM can be reset in a given time period (Reset Time Window)
Reset Time Window – the period over which the maximum per-VM resets are counted – specified in hours
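Here is how I picture those four settings working together, as a rough Python sketch. It is a mental model only, not VMware’s implementation: the class and function names are hypothetical, and the values in the example are just illustrative numbers rather than anything you should rely on as defaults.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class VmMonitoringSettings:
    failure_interval_s: int   # seconds of silence before the VM counts as failed
    minimum_uptime_s: int     # grace period after power on / restart
    max_per_vm_resets: int    # resets allowed inside the reset time window
    reset_window_h: int       # length of the reset time window in hours


def should_reset(settings: VmMonitoringSettings,
                 uptime_s: int,
                 silent_for_s: int,
                 reset_ages_h: Sequence[float]) -> bool:
    """Decide whether a VM would be reset.

    uptime_s: seconds since the VM was powered on or restarted.
    silent_for_s: seconds since the last heartbeat or I/O activity.
    reset_ages_h: how many hours ago each previous reset happened.
    """
    if uptime_s < settings.minimum_uptime_s:
        return False  # still inside the grace period, not polled yet
    if silent_for_s < settings.failure_interval_s:
        return False  # not silent for long enough to count as a failure
    recent_resets = sum(1 for age in reset_ages_h if age < settings.reset_window_h)
    return recent_resets < settings.max_per_vm_resets


settings = VmMonitoringSettings(failure_interval_s=60, minimum_uptime_s=240,
                                max_per_vm_resets=3, reset_window_h=24)
print(should_reset(settings, uptime_s=1000, silent_for_s=90, reset_ages_h=[1, 5]))
print(should_reset(settings, uptime_s=1000, silent_for_s=90, reset_ages_h=[1, 5, 10]))
```

The second call is refused because the VM has already used up its three resets inside the 24 hour window.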
The blueprint also mentions heartbeat datastore dependencies and preferences. Quickly, vSphere will choose which datastores to use as HA heartbeat datastores automatically, depending on a number of things like storage transport, number of hosts connected, etc. We can change this as well in our options. We can instruct vSphere to choose only from our preferred list (so selecting just 2 datastores in turn lets us determine exactly which datastores are used), or we can tell it to use our preferred datastores if possible and otherwise go ahead and choose the ones it wants.
Also, almost all of the settings we set as defaults, such as isolation response and restart priority, can be set on a per-VM basis. This is pretty easy so I won’t explain it, but I just wanted to mention that it can be done.
I’d say that’s enough for HA – it’s not a hard item to administer. That said, lab it, lab all of it! Practice Practice Practice.
Hi Matt, there is also das.isolationShutdownTimeout. This is the amount of time HA will wait for the VM to shut down gracefully before performing a hard power off. The default is 300 seconds, i.e. 5 minutes; you can increase or decrease the timeout if required, and the setting must be in seconds.
das.useIsolationAddress should be das.useDefaultIsolationAddress. It’s a boolean value that determines whether or not the default isolation address (ie the default gateway) is pinged before the other user-defined isolation addresses.
Thanks Roy – I’ve updated the post
Hi Matt,
there is a typo in das.vmMemoryInMb and das.vmCpuInMhz as they should be written as das.vmMemoryMinMb and das.vmCpuMinMhz as they define the minimum of the slot size.
Otherwise excellent scenarios, thank you for your efforts! I’m taking the exam soon, wish me luck 🙂
Thanks so much – corrected in the post.