List vCenter default utilization alarms

  • Datastore cluster is out of space – Warning at 75%, Alert at 85%
  • Datastore usage on disk – Warning at 75%, Alert at 85%
  • Host CPU Usage – Warning at 75%, Alert at 90%
  • Host Memory Usage – Warning at 75%, Alert at 90%
  • VM CPU Usage – Warning at 75% (for 5 min), Alert at 90% (for 5 min)
  • VM Memory Usage – Warning at 75% (for 5 min), Alert at 90% (for 5 min)
  • VM Max Total Disk Latency – Warning at 50 ms (for 5 min), Alert at 75 ms (for 5 min)
  • Host Service Console Swapping Rates – Warning above 512 (for 1 min), Alert above 2048 (for 1 min); applies to both swap-in and swap-out rates
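To make the "Warning at X (for 5 min), Alert at Y (for 5 min)" pattern above a bit more concrete, here is a minimal Python sketch of how a reading history could map to a severity. This is a simplified model, not how vCenter evaluates alarms internally: it assumes one sample per minute and treats each threshold as a strict "Is Above" comparison. The `sustained_above` and `evaluate` helpers are hypothetical names.

```python
def sustained_above(samples: list[float], threshold: float, length: int) -> bool:
    """True if each of the last `length` samples is above `threshold`.

    Assumes one sample per minute, so `length` is the condition length in minutes.
    """
    if length < 1:
        raise ValueError("length must be at least 1 sample")
    if len(samples) < length:
        return False
    return all(value > threshold for value in samples[-length:])


def evaluate(samples: list[float], warning: float, alert: float, length: int = 5) -> str:
    """Map a reading history to 'Normal', 'Warning' or 'Alert'."""
    if sustained_above(samples, alert, length):
        return "Alert"
    if sustained_above(samples, warning, length):
        return "Warning"
    return "Normal"


# VM CPU Usage defaults from the list above: Warning 75%, Alert 90%, 5 min each.
cpu_history = [70, 80, 85, 92, 91, 93]
print(evaluate(cpu_history, warning=75, alert=90))  # → Warning
```

The last three samples are above 90%, but not for the full five minutes, so the sketch only reports a Warning.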
List vCenter default connectivity alarms
  • Cannot connect to storage – Monitors host connectivity to storage devices
  • Cannot find vSphere HA master agent – Triggered if vCenter cannot connect to the HA master agent
  • Host Connection and Power State – Triggered if the host connection state is 'not responding'
  • Host Connection failure – Triggered on ccagent, network, or time-out errors
  • Network Connectivity Lost – Triggered if you lose network connectivity
  • Network Uplink Redundancy Degraded – Triggered if uplink redundancy is degraded (for example, if you lose an uplink)
  • Network Uplink Redundancy Lost – Triggered if you lose all redundancy in your uplinks

List possible actions for utilization and connectivity alarms

All alarms allow you to send a notification email, send a notification trap, or run a custom command. However, alarms created on certain object types offer additional actions.

When creating an alarm on a host, in addition to the actions above you can:
  • Enter Maintenance Mode
  • Exit Maintenance Mode
  • Enter Standby
  • Exit Standby
  • Reboot
  • Shutdown

When creating an alarm on a virtual machine or resource pool object, you can:

  • Power On a VM
  • Power Off a VM
  • Suspend a VM
  • Reset a VM
  • Migrate a VM
  • Reboot Guest OS
  • Shutdown Guest OS
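As a quick recap of the lists above, here is a hypothetical Python lookup of which actions are available depending on the object the alarm is created on. It only summarizes what is written above; the names are illustrative, not a vSphere API, and it is not meant as an exhaustive catalog for every vSphere version.

```python
# Hypothetical lookup of the alarm actions listed above, keyed by the type of
# object the alarm is created on. Names are illustrative, not a vSphere API.
COMMON_ACTIONS = [
    "Send a notification email",
    "Send a notification trap",
    "Run a command",
]

EXTRA_ACTIONS = {
    "Host": [
        "Enter Maintenance Mode", "Exit Maintenance Mode",
        "Enter Standby", "Exit Standby", "Reboot", "Shutdown",
    ],
    "Virtual Machine": [
        "Power On a VM", "Power Off a VM", "Suspend a VM", "Reset a VM",
        "Migrate a VM", "Reboot Guest OS", "Shutdown Guest OS",
    ],
}
# Resource pools get the same extra actions as virtual machines.
EXTRA_ACTIONS["Resource Pool"] = EXTRA_ACTIONS["Virtual Machine"]


def available_actions(object_type: str) -> list[str]:
    """Actions every alarm offers, plus any extras for this object type."""
    return COMMON_ACTIONS + EXTRA_ACTIONS.get(object_type, [])


print(available_actions("Host"))
```

For an object type with no extras, such as a datastore, only the three common actions come back.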
Create a vCenter utilization alarm/Create a vCenter connectivity alarm/Configure alarm triggers/Configure alarm actions
 
Since all of these objectives really come down to defining an alarm, I thought I would just bundle them up and explain a little bit about each of them here.
 
Alarms are notifications or actions taken in response to an event, a set of conditions, or the state of an inventory object. An alarm consists of the following elements:
  • Name and Description
  • Alarm Type – Defines the type of object to be monitored
  • Triggers – Defines the actual event, condition or state change that will trigger the alarm as well as the notification severity.
  • Tolerance thresholds – Can provide additional restrictions on condition and state trigger thresholds that must be exceeded before the alarm is triggered.
  • Actions – The operations to perform in response to a triggered alarm. (explained above).
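The elements above map naturally onto a simple record. The following Python dataclass is a hypothetical illustration only; its field names are my own and do not mirror any vSphere API object.

```python
from dataclasses import dataclass, field


@dataclass
class AlarmDefinition:
    """Hypothetical record for the elements of an alarm listed above."""
    name: str
    description: str
    alarm_type: str        # type of object monitored, e.g. "Virtual Machine"
    triggers: list[str]    # events, conditions or states that trigger the alarm
    tolerance_pct: float = 0.0  # extra margin past the threshold before triggering
    actions: list[str] = field(default_factory=list)  # operations run when triggered
    enabled: bool = True


vm_memory = AlarmDefinition(
    name="VM Memory Usage",
    description="Warning above 75%, Alert above 90%, each for 5 min",
    alarm_type="Virtual Machine",
    triggers=["VM Memory Usage is above 75% (Warning) / 90% (Alert) for 5 min"],
    actions=["Send a notification email"],
)
print(vm_memory)
```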

Alarms have only three severity levels (Normal, Warning, and Alert), displayed as green, yellow, and red respectively. An alarm triggers on a change between these levels, and the levels are sequential in nature: the only transitions are green to yellow, yellow to red, red to yellow, and yellow to green. It is impossible to go directly from red to green or from green to red.

Alarms are also inherited by child objects. For example, if you set an alarm to monitor VM Memory Usage on a cluster, all VMs within that cluster are monitored. However, alarms can only be modified, disabled, or enabled on the object on which they were defined. In the example above, to disable that alarm you would need to do so at the cluster level, as that is where it was created; you would not be able to modify the alarm with a VM selected.
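One way to picture the sequential rule described above is to break any change of severity into single-step transitions. The Python sketch below is a simplified model of that description, not vCenter code; `LEVELS` and `transitions` are hypothetical names.

```python
# Simplified model: severities are ordered Normal (green) -> Warning (yellow)
# -> Alert (red), and an alarm only ever moves one step at a time.
LEVELS = ["Normal", "Warning", "Alert"]


def transitions(old: str, new: str) -> list[tuple[str, str]]:
    """Break a change of severity into single-step transitions."""
    i, j = LEVELS.index(old), LEVELS.index(new)
    step = 1 if j > i else -1
    return [(LEVELS[k], LEVELS[k + step]) for k in range(i, j, step)]


print(transitions("Normal", "Alert"))
# → [('Normal', 'Warning'), ('Warning', 'Alert')]
```

Under this model a jump from Normal straight to Alert still shows up as the two sequential transitions, and no change in severity produces no transitions at all.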

You can create an alarm either from the Alarms tab of the desired object, or by right-clicking an item in the Inventory and selecting 'Alarm->Add Alarm'. The Alarm Settings dialog box then appears with the tabs explained below.
 
General Tab
  • Alarm Name and Description, and whether the alarm is enabled or disabled.
  • Set your Alarm Type – Here you specify exactly what you want to monitor, which could be:
    • VMs
    • Hosts
    • Clusters
    • Datacenters
    • Datastores
    • Virtual Distributed Switches
    • Distributed Port Groups
    • Datastore Clusters
  • What you are going to monitor for; the options are:
    • Monitor for specific conditions or state – CPU Usage, Memory Usage, Power State, etc.
    • Monitor for specific events occurring – VM Powered On, VM Powered Off, etc.

Triggers Tab

This tab will change depending on the type of monitoring you have chosen on the general tab.  I'll do my best to explain both here.

  • Monitoring for specific conditions or state – The following need to be specified here.
    • Trigger Type – This determines your condition selection as well. Basically, this is what you would like to monitor (CPU Usage, VM State, etc.).
    • Condition – This is the condition operator that must be met in order to trigger the alarm. If you have chosen CPU Usage or another utilization type of trigger, you will be presented with 'Is Above' and 'Is Below'; however, if you have chosen a state trigger such as VM State, you will be presented with 'Is Equal To' and 'Is Not Equal To' conditions.
    • Warning/Alert thresholds and condition lengths – The threshold is the actual metric value that triggers the alarm. For example, you would enter 75 and 80 here if you wanted to trigger CPU usage Warnings at 75% and Alerts at 80%. The condition length is how long the condition must persist before the alarm triggers (the "for 5 min" in the default alarms listed earlier). For a state alarm, you would instead be presented with a dropdown containing the possible values for that trigger type.
  • Monitoring for specific events
    • Event – This is the event to watch for such as Cannot Deploy VM or Cannot Synchronize Host.
    • Status – The severity (Normal, Warning, or Alert) to set when the event occurs.
    • Conditions – Conditions on the event's arguments that must also be met for the alarm to trigger.
Also in this tab you can add multiple triggers and specify whether to trigger the alarm if any or all of the conditions are met.
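To illustrate the "any" versus "all" choice for multiple triggers, here is a small Python sketch in which each trigger is modeled as a predicate over the current readings. This is a hypothetical model; `alarm_fires` and the trigger functions are illustrative names, not part of vSphere.

```python
from typing import Callable

# A trigger is modeled as a predicate over the current readings (hypothetical).
Trigger = Callable[[dict], bool]


def alarm_fires(triggers: list[Trigger], readings: dict, mode: str = "any") -> bool:
    """Combine trigger results according to the 'any'/'all' setting."""
    results = [trigger(readings) for trigger in triggers]
    if mode == "any":
        return any(results)
    if mode == "all":
        return all(results)
    raise ValueError("mode must be 'any' or 'all'")


def cpu_above_90(readings: dict) -> bool:
    return readings["cpu"] > 90


def memory_above_90(readings: dict) -> bool:
    return readings["memory"] > 90


readings = {"cpu": 95, "memory": 60}
print(alarm_fires([cpu_above_90, memory_above_90], readings, "any"))  # → True
print(alarm_fires([cpu_above_90, memory_above_90], readings, "all"))  # → False
```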
 
Reporting Tab
 
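This tab allows you to set the following options, which are only available when monitoring for conditions or state.
  • Range – A tolerance range: the alarm triggers (and clears) only once the condition goes a certain percentage above or below the configured threshold, which keeps small fluctuations from flipping the alarm state.
  • Frequency – The period, in minutes, during which a triggered alarm is not reported again; once it elapses, the alarm is reported again if the condition is still true.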
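Here is a rough Python model of how Range and Frequency interact. It is a sketch under stated assumptions, not vCenter's actual logic: it assumes the range is applied symmetrically in percentage points (trigger above threshold + range, clear below threshold - range), that Frequency suppresses repeat reports for that many minutes, and that clearing the alarm resets the suppression window. `ReportedAlarm` is a hypothetical name.

```python
class ReportedAlarm:
    """Simplified model of the Reporting tab's Range and Frequency settings."""

    def __init__(self, threshold: float, range_pct: float = 0.0, frequency_min: int = 0):
        self.threshold = threshold
        self.range_pct = range_pct          # assumed: percentage points around threshold
        self.frequency_min = frequency_min  # assumed: minutes of suppressed re-reports
        self.triggered = False
        self.last_report = None  # minute of the last report, or None

    def update(self, minute: int, value: float) -> bool:
        """Feed one reading; return True if the alarm is reported at this minute."""
        if not self.triggered and value > self.threshold + self.range_pct:
            self.triggered = True
        elif self.triggered and value < self.threshold - self.range_pct:
            self.triggered = False
            self.last_report = None  # cleared: the next trigger reports immediately
        if not self.triggered:
            return False
        if self.last_report is None or minute - self.last_report >= self.frequency_min:
            self.last_report = minute
            return True
        return False  # still triggered, but inside the suppression window


alarm = ReportedAlarm(threshold=90, range_pct=5, frequency_min=10)
readings = [(0, 93), (1, 96), (5, 97), (11, 97), (12, 88), (13, 84)]
print([alarm.update(minute, value) for minute, value in readings])
# → [False, True, False, True, False, False]
```

In this run, 93% is above the 90% threshold but not above 95%, so nothing is reported at minute 0, and the alarm only clears once usage falls below 85%.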

Actions Tab

This tab is used to configure the specific actions to take when the alarm is triggered; the available actions are listed above. Note that a single alarm can take multiple actions. For each action you can also specify on which state changes it runs (Normal to Warning, Warning to Alert, Alert to Warning, Warning to Normal), whether it runs once or repeats, and how often (in minutes) repeated actions run.
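The per-transition settings can be modeled roughly as follows. This Python sketch is hypothetical (the `AlarmAction` class and `actions_to_run` helper are my own names, not a vSphere API) and assumes a repeating action runs at the transition and then every `repeat_every_min` minutes afterwards.

```python
from dataclasses import dataclass, field

Transition = tuple[str, str]  # (from severity, to severity)


@dataclass
class AlarmAction:
    """Hypothetical model of one action's per-transition settings."""
    name: str
    once: set[Transition] = field(default_factory=set)    # run a single time
    repeat: set[Transition] = field(default_factory=set)  # keep repeating
    repeat_every_min: int = 5

    def runs_at(self, transition: Transition, minutes_since: int) -> bool:
        """Does the action run `minutes_since` minutes after `transition`?"""
        if minutes_since == 0:
            return transition in self.once or transition in self.repeat
        return (transition in self.repeat
                and self.repeat_every_min > 0
                and minutes_since % self.repeat_every_min == 0)


def actions_to_run(actions: list[AlarmAction], transition: Transition, minutes_since: int) -> list[str]:
    """Names of all actions on one alarm that run at this point in time."""
    return [a.name for a in actions if a.runs_at(transition, minutes_since)]


email = AlarmAction(
    "Send a notification email",
    once={("Normal", "Warning"), ("Warning", "Normal")},
    repeat={("Warning", "Alert")},
    repeat_every_min=10,
)
migrate = AlarmAction("Migrate VM", once={("Warning", "Alert")})
print(actions_to_run([email, migrate], ("Warning", "Alert"), 0))
# → ['Send a notification email', 'Migrate VM']
print(actions_to_run([email, migrate], ("Warning", "Alert"), 20))
# → ['Send a notification email']
```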

For a given alarm, identify the affected resource in a vSphere implementation
 
For triggered alarms, the affected resource is easy to identify, as the object itself shows the Warning or Alert icon. You can also use the Triggered Alarms view on the Alarms tab and find the affected resource in the Object column. For alarms that have not triggered, you will need to either look at the Defined In column, or open the alarm and check the Alarm Type on the General tab. As a best practice, give alarms a descriptive name and description to make this determination easier.
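The inheritance and "Defined In" rules can be sketched with a toy inventory. This Python model is hypothetical and simplified: the flat parent map ignores that in vSphere a VM also sits under hosts, resource pools, and folders, and all object and function names are illustrative.

```python
# Hypothetical, simplified inventory: each object maps to its parent.
PARENT = {
    "Datacenter01": None,
    "Cluster01": "Datacenter01",
    "esx01": "Cluster01",
    "vm-web01": "Cluster01",
    "vm-db01": "Cluster01",
}
OBJECT_TYPE = {
    "Datacenter01": "Datacenter",
    "Cluster01": "Cluster",
    "esx01": "Host",
    "vm-web01": "VM",
    "vm-db01": "VM",
}
ALARMS = [
    {"name": "VM Memory Usage", "alarm_type": "VM", "defined_in": "Cluster01"},
]


def lineage(obj):
    """The object itself followed by all of its ancestors."""
    while obj is not None:
        yield obj
        obj = PARENT[obj]


def alarms_for(obj):
    """(alarm name, Defined In) pairs for every alarm that monitors obj."""
    chain = set(lineage(obj))
    return [(a["name"], a["defined_in"]) for a in ALARMS
            if a["defined_in"] in chain and a["alarm_type"] == OBJECT_TYPE[obj]]


def affected_objects(alarm):
    """Every object an alarm monitors: matching type, at or below Defined In."""
    return [obj for obj in PARENT
            if alarm["defined_in"] in set(lineage(obj))
            and OBJECT_TYPE[obj] == alarm["alarm_type"]]


def can_edit(alarm, selected_obj):
    """Alarms can only be modified on the object where they are defined."""
    return alarm["defined_in"] == selected_obj


print(alarms_for("vm-web01"))           # → [('VM Memory Usage', 'Cluster01')]
print(affected_objects(ALARMS[0]))      # → ['vm-web01', 'vm-db01']
print(can_edit(ALARMS[0], "vm-web01"))  # → False
```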