List vCenter default utilization alarms
- Datastore cluster is out of space – Warning at 75%, Alert at 85%
- Datastore usage on disk – Warning at 75%, Alert at 85%
- Host CPU Usage – Warning at 75%, Alert at 90%
- Host Memory Usage – Warning at 75%, Alert at 90%
- VM CPU Usage – Warning at 75% (for 5 min), Alert at 90% (for 5 min)
- VM Memory Usage – Warning at 75% (for 5 min), Alert at 90% (for 5 min)
- VM Max Total Disk Latency – Warning at 50 ms (for 5 min), Alert at 75 ms (for 5 min)
- Host Service Console Swapping Rates – Warning above 512 KBps (for 1 min), Alert above 2048 KBps (for 1 min), for swap in AND swap out
List vCenter default connectivity alarms
- Cannot connect to storage – Monitors host connectivity to a storage device.
- Cannot find vSphere HA master agent – Triggered if vCenter cannot connect to the vSphere HA master agent.
- Host Connection and Power State – Triggered if the host connection state is Not Responding.
- Host Connection failure – Triggered on ccagent, network, or time-out errors.
- Network Connectivity Lost – Triggered if you lose network connectivity.
- Network Uplink Redundancy Degraded – Triggered if redundancy is degraded (if you lose an uplink).
- Network Uplink Redundancy Lost – Triggered if you lose all redundancy in your uplinks.
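To make the "Warning at 75%, Alert at 90% (for 5 min)" style of threshold concrete, here is a minimal Python sketch. It is not vCenter code: the `UtilizationAlarm` class, the `evaluate` function, and the assumption of one sample per minute are all hypothetical, chosen only to illustrate that a value has to stay above a threshold for the whole duration before the status changes.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class UtilizationAlarm:
    """Hypothetical model of a utilization alarm; not a vCenter API object."""
    name: str
    warning: float         # warning threshold
    alert: float           # alert threshold
    hold_samples: int = 1  # consecutive samples that must exceed a threshold


def evaluate(alarm: UtilizationAlarm, samples: Sequence[float]) -> str:
    """Return 'Alert', 'Warning', or 'Normal' based on the most recent samples."""
    recent = list(samples)[-alarm.hold_samples:]
    if len(recent) < alarm.hold_samples:
        return "Normal"  # not enough history to satisfy the duration yet
    if all(value > alarm.alert for value in recent):
        return "Alert"
    if all(value > alarm.warning for value in recent):
        return "Warning"
    return "Normal"


# Assumption: one sample per minute, so 5 samples stand in for "for 5 min".
vm_cpu = UtilizationAlarm("VM CPU Usage", warning=75, alert=90, hold_samples=5)

print(evaluate(vm_cpu, [80, 82, 95, 96, 91]))  # Warning
print(evaluate(vm_cpu, [91, 92, 95, 96, 91]))  # Alert
print(evaluate(vm_cpu, [91, 92, 60, 96, 91]))  # Normal
```

In the first call the value is above 75 for all five samples but not above 90 for all of them, so only the warning condition holds.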
List possible actions for utilization and connectivity alarms
When creating an alarm on a host object you can:
- Enter Maintenance Mode
- Exit Maintenance Mode
- Enter Standby
- Exit Standby
- Reboot
- Shutdown
When creating an alarm on a virtual machine/Resource Pool object you can:
- Power On a VM
- Power Off a VM
- Suspend a VM
- Reset a VM
- Migrate VM
- Reboot Guest OS
- Shutdown Guest OS
An alarm is made up of the following components:
- Name and Description
- Alarm Type – Defines the type of object to be monitored
- Triggers – Defines the actual event, condition or state change that will trigger the alarm as well as the notification severity.
- Tolerance thresholds – Can provide additional restrictions on condition and state trigger thresholds that must be exceeded before the alarm is triggered.
- Actions – The operations to perform in response to a triggered alarm. (explained above).
Alarms have only three severity levels (Normal, Warning, Alert), displayed as green, yellow, and red. An alarm triggers on a change from one of these levels to another, and the levels are sequential in nature: the only transitions on which an alarm can trigger are green to yellow, yellow to red, red to yellow, and yellow to green. It is impossible to go directly from red to green or from green to red.
Alarms are also inherited by child objects: if you set an alarm to monitor VM Memory Usage on a cluster, all VMs within that cluster will be monitored. An alarm can only be modified, disabled, or enabled on the object on which it was defined. In the example above, you would need to disable the alarm at the cluster level, because that is where it was created. You would not be able to modify the alarm with a VM selected.
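The inheritance rule can be illustrated with a small Python sketch. This is only a toy model of an inventory tree, not the vSphere API: `InventoryObject`, `effective_alarms`, and `can_modify` are hypothetical names, and the object names `Cluster01` and `vm01` are made up for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class InventoryObject:
    """Hypothetical inventory node (cluster, host, VM, ...)."""
    name: str
    parent: Optional["InventoryObject"] = None
    defined_alarms: List[str] = field(default_factory=list)

    def effective_alarms(self) -> List[Tuple[str, str]]:
        """Alarms defined here plus those inherited from every ancestor.

        Each entry is (alarm name, name of the object it was defined on).
        """
        result = []
        node = self
        while node is not None:
            result.extend((alarm, node.name) for alarm in node.defined_alarms)
            node = node.parent
        return result

    def can_modify(self, alarm: str) -> bool:
        """An alarm can only be changed on the object where it was defined."""
        return alarm in self.defined_alarms


cluster = InventoryObject("Cluster01", defined_alarms=["VM Memory Usage"])
vm = InventoryObject("vm01", parent=cluster)

print(vm.effective_alarms())                  # [('VM Memory Usage', 'Cluster01')]
print(vm.can_modify("VM Memory Usage"))       # False
print(cluster.can_modify("VM Memory Usage"))  # True
```

The VM is monitored by the alarm, but the alarm can only be edited from the cluster, which matches the example in the text.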
General Tab
- Alarm Name and Description, and whether the alarm is enabled or disabled.
- Alarm Type – Here you specify exactly what you want to monitor. It could be:
  - VMs
  - Hosts
  - Clusters
  - Datacenters
  - Datastores
  - Virtual Distributed Switches
  - Distributed Port Groups
  - Datastore Clusters
- What you are going to monitor for. The options are:
  - Monitor for specific conditions or state, such as CPU Usage, Memory Usage, or Power State
  - Monitor for specific events occurring, such as VM Powered On or VM Powered Off
Triggers
This tab changes depending on the type of monitoring you chose on the General tab. I'll do my best to explain both here.
- Monitoring for specific conditions or state – The following need to be specified here:
  - Trigger Type – This is what you would like to monitor (CPU Usage, VM State, etc.). It also determines which conditions you can select.
  - Condition – This is the condition operator that must be met in order to trigger the alarm. If you have chosen CPU Usage or another utilization type of trigger, you will be presented with 'Is Above' and 'Is Below'. If you have chosen a state monitoring trigger such as VM State, you will be presented with 'Is Equal To' and 'Is Not Equal To'.
  - Warning/Alert thresholds and condition lengths – These are the actual metric values that trigger the alarm. For example, you would enter 75 and 80 here if you wanted to trigger CPU usage Warnings at 75% and Alerts at 80%. For a state alarm, you would instead be presented with a dropdown containing the possible values for that trigger type.
- Monitoring for specific events
  - Event – This is the event to watch for, such as Cannot Deploy VM or Cannot Synchronize Host.
  - Status – Whether to throw a Normal, Warning, or Alert.
  - Conditions – These are the event arguments the trigger actually looks for.
- Range – Repeats the triggered alarm when the condition exceeds the limit by a certain percentage, above or below.
- Frequency – Repeats the alarm every so many minutes.
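The way the condition operator combines with the Warning and Alert values on the Triggers tab can be sketched in a few lines of Python. This is an illustration only, not how vCenter evaluates triggers internally: the `OPERATORS` table and the `trigger_status` function are hypothetical, and the state values "Powered On" and "Powered Off" are used as plain strings.

```python
import operator

# The four condition operators mentioned above, mapped to Python comparisons.
OPERATORS = {
    "Is Above": operator.gt,
    "Is Below": operator.lt,
    "Is Equal To": operator.eq,
    "Is Not Equal To": operator.ne,
}


def trigger_status(condition, value, warning=None, alert=None):
    """Return 'Alert', 'Warning', or 'Normal' for a single observed value.

    A threshold left as None is treated as "not configured" and is skipped.
    """
    compare = OPERATORS[condition]
    if alert is not None and compare(value, alert):
        return "Alert"
    if warning is not None and compare(value, warning):
        return "Warning"
    return "Normal"


# Utilization trigger: CPU usage Warning at 75%, Alert at 80%.
print(trigger_status("Is Above", 78, warning=75, alert=80))  # Warning
print(trigger_status("Is Above", 85, warning=75, alert=80))  # Alert

# State trigger: Alert when the VM state equals "Powered Off".
print(trigger_status("Is Equal To", "Powered Off", alert="Powered Off"))  # Alert
print(trigger_status("Is Equal To", "Powered On", alert="Powered Off"))   # Normal
```

Checking the alert value first means a reading that satisfies both thresholds is reported at the higher severity.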
Actions Tab
This tab is used to configure the specific actions to take when the alarm is triggered. The actions that can be taken were listed above. Just a note: a single alarm can take multiple actions. For each action you can also specify whether it is repeated, on which status changes it is repeated (warning to alert, alert to warning, warning to normal, etc.), and how often it should be repeated (in minutes).
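As a rough illustration of tying actions to status changes, the Python sketch below maps each transition to a list of actions with an optional repeat interval. Everything here is hypothetical: the `ACTIONS` table, the function names, and the repeat values are invented for the example. It also assumes that a jump from Normal to Alert is handled as two adjacent steps, which is one reading of the sequential behavior described earlier and not confirmed vCenter behavior.

```python
from typing import Dict, List, Optional, Tuple

LEVELS = ["Normal", "Warning", "Alert"]  # green, yellow, red
Transition = Tuple[str, str]


def transitions(old: str, new: str) -> List[Transition]:
    """Break a status change into steps between adjacent severity levels."""
    i, j = LEVELS.index(old), LEVELS.index(new)
    step = 1 if j > i else -1
    return [(LEVELS[k], LEVELS[k + step]) for k in range(i, j, step)]


# Hypothetical configuration:
# transition -> list of (action, repeat interval in minutes, or None for no repeat)
ACTIONS: Dict[Transition, List[Tuple[str, Optional[int]]]] = {
    ("Normal", "Warning"): [("Migrate VM", None)],
    ("Warning", "Alert"): [("Reboot Guest OS", None), ("Power Off a VM", 10)],
}


def actions_for(old: str, new: str) -> List[str]:
    """Names of the actions that would run for a status change."""
    fired = []
    for step in transitions(old, new):
        fired.extend(name for name, _repeat in ACTIONS.get(step, []))
    return fired


print(transitions("Normal", "Alert"))   # [('Normal', 'Warning'), ('Warning', 'Alert')]
print(transitions("Alert", "Normal"))   # [('Alert', 'Warning'), ('Warning', 'Normal')]
print(actions_for("Warning", "Alert"))  # ['Reboot Guest OS', 'Power Off a VM']
print(actions_for("Alert", "Warning"))  # []
```

Keying the table on the transition rather than on the final status is what lets an alarm run different actions on the way up (warning to alert) than on the way down (alert to warning).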