VCP 5 - Objective 7.1 – Monitor ESXi, vCenter Server and Virtual Machines

Describe how Tasks and Events are viewed in vCenter Server

Tasks

Tasks represent system activities that do not complete immediately. You can view the tasks associated with a single object or with all objects in the vSphere Client inventory. Completed tasks, along with tasks that are currently running, are displayed in the Tasks and Events tab of the object. By default the tasks for child objects are also displayed. If you are using linked mode, you will also see a connected group column which states which vCenter the task was performed on.

When viewing tasks you can choose whether to show tasks for that object only or for its child objects as well by using the Show all entries dropdown. You can also filter tasks with a search query. Filters can be applied to one or more of the following: Name, Target, Status, Details, Initiated By, vCenter Server, Requested Start Time, Start Time or Completed Time.

Tasks can also be scheduled. The following are the tasks that are available to be scheduled in vCenter

Add a host
Change the power state of a VM
Change cluster power settings (DPM)
Change resource settings of a resource pool or VM (CPU and Memory Shares/Reservations/Limits)
Check the compliance of a host profile
Create/Clone/Deploy/Export/Import a VM
Migrate a VM (vMotion and Storage vMotion).
Snapshot a VM
Scan for updates and remediate an object.

There are a few rules as well on how vSphere manages tasks

The user performing the task must have the proper permissions to do so. If a scheduled task is created and the permissions are then removed for that user, the task will continue to run.
When operations required by manual and scheduled tasks conflict, the activity due first is started first.
When a VM or host is in an incorrect state to perform the activity, the task will not be performed.
When an object is removed from the vCenter server, all associated tasks are also removed.

Events

Events are actions that occur on an object. They include user actions and system actions. Each event records an event message. As with tasks, you can view events on a single object or on all objects in the inventory. When you are connected directly to a host, the Tasks and Events tab is labeled only Events. Again, as with tasks, the tab shows events for the selected object as well as its child objects. Events can also be filtered, using the following options: Description, Type, Date, Task, Target, and User.

Identify critical performance metrics

To me, the critical performance metrics are the same as the common metrics below as they relate to memory, CPU, network, and storage.

Explain common memory metrics

MCTLSZ – The amount of guest physical memory reclaimed by the balloon driver. A large value means that a lot of this VM's physical memory is being reclaimed to decrease the host's memory pressure.
SWCUR – The current swap usage, i.e. the amount of the VM's memory currently swapped out to the backing store. This is vmkernel swapping, not guest OS swapping. It means that part of the VM's memory is not in physical memory but on the underlying disk, which is much slower. This is not a big deal if the memory is rarely accessed; however, if this value is high and SWR/s is over 0, the VM is currently reading that memory back from disk.
SWTGT – The expected swap usage of the VM.
SWR/s – Rate of reads from swapped memory. A value above 0 means the VM is reading memory back from the swap file on disk into its physical memory. Very bad for performance.
SWW/s – Rate of writes into swapped memory. This will happen if SWTGT is greater than SWCUR. It occurs when either the host is overcommitted or the memory used by the VM reaches its own memory limit or that of its resource pool.
ZIP/s – If this is greater than 0, the host is currently compressing memory. This can occur if the host is overcommitted.
UNZIP/s – If this is greater than 0, the host is currently accessing compressed memory. Usually indicates that the host was overcommitted at some point.
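
To make the relationships between these counters concrete, here is a minimal Python sketch that turns one VM's memory counters into plain-language findings. The function name, the dictionary layout, and the wording of the findings are my own assumptions for illustration; the rules are simply the ones described in the list above, and no thresholds beyond "greater than 0" are implied.

```python
def memory_findings(sample):
    """Return findings for one VM's esxtop memory counters.

    `sample` is a hypothetical dict keyed by counter name, e.g.
    {"MCTLSZ": 512, "SWCUR": 100, "SWTGT": 100, "SWR/s": 0.0}.
    Missing counters are treated as 0.
    """
    findings = []

    # Ballooning: memory reclaimed from the guest by the balloon driver.
    if sample.get("MCTLSZ", 0) > 0:
        findings.append("ballooning: guest memory is being reclaimed")

    # Swapped memory only hurts badly when it is being read back.
    swcur = sample.get("SWCUR", 0)
    if swcur > 0:
        if sample.get("SWR/s", 0) > 0:
            findings.append("swap-in: VM is reading swapped memory from disk")
        else:
            findings.append("swapped: memory is on disk but not being read")

    # A target above current usage means more swap writes are expected.
    if sample.get("SWTGT", 0) > swcur:
        findings.append("swap-out expected: SWTGT is above SWCUR")

    # Compression activity.
    if sample.get("ZIP/s", 0) > 0:
        findings.append("compressing: host is compressing memory now")
    if sample.get("UNZIP/s", 0) > 0:
        findings.append("decompressing: host is accessing compressed memory")

    return findings
```

An empty result means none of the reclamation counters is active for that VM.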

Explain common CPU metrics

%USED – Percentage of CPU time that is used by a VM. A high value in this metric indicates that the VM is using a lot of CPU resources.
%RDY – Percentage of time that a world was ready to run but had to wait for a CPU. The usual cause is over-provisioning of vCPUs to VMs. This can also be caused by a CPU limit set on the VM (see %MLMTD).
%CSTP – Percentage of time that a VM or world spent in a ready, co-deschedule state. This is only meaningful for VMs with multiple vCPUs. A high value normally means that the VM is not using its vCPUs in a balanced fashion. Either decrease the number of vCPUs or check whether the VM is pinned to any CPUs.
%SYS – Percentage of time spent by system services on behalf of the VM. A high value in this metric usually indicates that the VM is very IO intensive.
%SWPWT – Percentage of time that the VM was waiting on swapped pages to be read from disk. Usually indicates a memory problem.
%MLMTD – Percentage of time that a VM was ready to run but was deliberately not scheduled because doing so would violate its CPU limit setting. This will also cause %RDY to increase.
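
Because time lost to a CPU limit also shows up in %RDY, it can help to split ready time into the part explained by the limit and the part that is left over. The small Python sketch below does that subtraction. The function name is hypothetical, and treating the remainder as contention is a simplifying assumption on my part, not something esxtop reports directly.

```python
def split_ready_time(rdy_pct, mlmtd_pct):
    """Split %RDY into (contention_pct, limit_pct).

    Assumes %MLMTD is counted inside %RDY, so the limit-induced part
    can never exceed the total ready time.
    """
    limit_pct = min(mlmtd_pct, rdy_pct)
    contention_pct = rdy_pct - limit_pct
    return contention_pct, limit_pct
```

For example, a VM showing 12 %RDY and 9 %MLMTD is mostly waiting because of its own CPU limit, so adding hosts or removing vCPUs from neighbours would not be the first thing to try.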

Explain common network metrics

PKTTX/MBTX – Number of packets/megabits transmitted per second.
PKTRX/MBRX – Number of packets/megabits received per second.
DRPTX – Percentage of transmitted packets dropped. Usually means that network transmit performance is poor. Check whether the physical NICs are already using all of their capacity. You may need more physical NICs or a better load balancing policy.
DRPRX – Percentage of received packets dropped. Usually means the network is heavily overutilized. If the drops are on a VM, you could try increasing the CPU resources of that VM.

Explain common storage metrics

GAVG – The round trip latency as it appears to the VM (see KAVG and DAVG; it is normally the sum of both).
KAVG – Latency inside the vmkernel. High KAVG often causes or is caused by queuing (see QUED). This value should be small in comparison to DAVG, and should really be close to 0.
DAVG – Latency as seen at the device level (HBA). Basically the round trip time from the HBA to the storage array. DAVG is a good indicator of a performance issue in the backend storage.
QUED – Number of commands in the vmkernel that are currently queued. Could be an indicator that your queue depth is set too low. Follow your array vendor's instructions for queue depth settings.
ABRTS/s – Number of commands aborted per second. Aborts are issued by the VM because commands are taking too long to complete. Resets issued by the VM may also be counted as aborts. Normally caused by failed paths.
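
Since GAVG is normally the sum of DAVG and KAVG, a quick way to see where latency is coming from is to work out how much of the total is spent in the vmkernel. The Python sketch below does only that arithmetic; the function name and the idea of reporting a "kernel share" are my own illustration, and no particular threshold is implied.

```python
def latency_breakdown(davg_ms, kavg_ms):
    """Return (gavg_ms, kernel_share) for one device or VM.

    gavg_ms is approximated as DAVG + KAVG; kernel_share is the fraction
    of that total spent inside the vmkernel (0.0 when there is no latency).
    """
    gavg_ms = davg_ms + kavg_ms
    kernel_share = kavg_ms / gavg_ms if gavg_ms else 0.0
    return gavg_ms, kernel_share
```

A kernel share close to 0 points at the array or fabric (DAVG), while a large share points at queuing inside the host, which is when QUED and the queue depth settings are worth a look.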

Compare and contrast Overview and Advanced Charts

I hate to say it again but real world experience with the charts is probably the best bet to master this section. Here's a brief description of both the types of charts.

Overview Charts

Shows multiple charts/metrics on one page: CPU, Memory, Disk, and Network.
Different metrics and charts are displayed depending on the inventory item selected.
You can change the view (the choices differ depending on the object selected) and the date range.
Different view options are as follows
- Datacenter – Clusters, Storage
- Cluster – Home, Resource Pools & VMs, Hosts
- Resource Pool – Home, Resource Pools & VMs
- Host – Home, VMs
- VM – Home, Storage
Time range can be changed to Day, Week, Month, or a custom value.
Works well for a broad overview of an inventory object.

Advanced Charts

Allow for more extensive analysis of an object's measurable metrics.
Charts can be exported/saved as jpg, bmp, gif, png, or xls.
Can switch chart types between line and stacked graphs.
There are really too many metrics that you can display to list here.
Charts can be popped out into a separate window.
Almost everything is customizable and configurable: date range, intervals, metrics being measured, etc.
Shows the latest, minimum, maximum, and average values for the measured items.

Configure SNMP for vCenter Server

SNMP can provide information to a management program in a couple of different ways: either in response to a get operation or by sending a trap. The SNMP agent in vCenter, however, only sends traps; it does not respond to get requests. vCenter will send an SNMP trap when the service starts or when an alarm is triggered. SNMP is configured in vCenter by navigating to Administration->vCenter Server Settings and filling out the following information in the SNMP section.

Receiver URL – DNS Name or IP of the receiver
Receiver Port – Port the receiver is listening on. If left blank it will use the default SNMP port of 162.
Community – the community identifier.
Optionally you can also enable up to 3 additional receivers, for a maximum of 4 in total.

After this, of course, you would need to set up your SNMP receiver and load the VMware MIBs.

Configure Active Directory and SMTP settings for vCenter Server

Active Directory Settings

The following settings are available to define how vCenter interacts with Active Directory. They are modified by navigating to Administration->vCenter Server Settings and selecting the Active Directory section.

Active Directory Timeout – timeout interval (seconds) to use when connecting to AD.
Enable Query Limit – Limits the number of users and groups displayed in the Add Permissions box.
- Users & Groups – Enter the number of groups/users to be displayed. If you enter 0, all users and groups are displayed.
Enable Validation – vCenter will periodically check its known users & groups against Active Directory.
- Validation Period – Number of minutes between synchronizations.

SMTP Settings

The following settings are available to define the mail settings in vCenter Server. These are accessed by navigating to Administration->vCenter Server Settings and selecting the Mail section.

SMTP Server – DNS name or IP address of the SMTP Server
Sender Account – email address of the sender account.

Configure vCenter Server logging options

The amount of detail that vCenter logs is also configurable. The following settings are accessed by navigating to Administration->vCenter Server Settings and selecting the Logging Options section.

None – Turns off logging
Error – Will only display error logging entries.
Warnings (Errors and Warning) – Displays only warnings and errors
Info (normal logging) – Displays information, error, and warning.
Verbose – displays information, error, warning and verbose entries.
Trivia – displays information, error, warning, verbose and trivia entries.
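
The levels above are cumulative: each one includes everything logged by the levels before it. The short Python sketch below models that ordering so you can check whether an entry of a given severity would be recorded at a given setting. This is only an illustrative model with names I made up; it is not how vCenter itself is implemented or configured.

```python
# Levels in increasing order of detail, matching the list above.
LEVELS = ["none", "error", "warning", "info", "verbose", "trivia"]


def is_logged(configured_level, entry_level):
    """Return True if an entry of `entry_level` is recorded when the
    logging option is set to `configured_level`."""
    if entry_level == "none":
        raise ValueError("'none' is a setting, not an entry severity")
    return LEVELS.index(entry_level) <= LEVELS.index(configured_level)
```

With the default Info setting, for instance, warnings are recorded but verbose entries are not.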

Create a log bundle

There are a couple of different ways to grab diagnostic information and generate log bundles from within the vSphere client. I'll explain both here.

First Way

Select the host, cluster, or datacenter in the inventory that you would like to generate the bundle for.
Select File->Export->Export System Logs
If you selected a cluster or datacenter, you can check or uncheck which hosts you would like to include here.
Select which components you would like to include in the bundle and whether or not to gather performance data.
Done.

Second Way

Click Administration->Export System Logs
Select the hosts you wish to export and/or vCenter.
Select whether to gather performance data.
DONE.

Create/Edit/Delete a Scheduled Task

Tasks can be scheduled within vCenter to run once, multiple times in the future, or at a recurring interval. I say from within vCenter because you must be connected to vCenter Server in order to create and manage scheduled tasks.

When creating scheduled tasks you are presented with the Scheduled Task wizard. The pages of this wizard change from task to task, because the available tasks (listed below) need completely different types of information from you. Tasks can be created, edited, and deleted by navigating to Home->Management->Scheduled Tasks.

For the most part the settings when creating a scheduled task are the same as those you would do when carrying out the task in a normal fashion with the exception of the following.

Frequency/Start Time
- Once – Select either Now or Later; for Later, enter a date/time.
- After Startup – Fill in the delay setting (minutes).
- Hourly – In Start Time enter the number of minutes after the hour to start the task, then enter the number of hours between runs in Interval. E.g. 30/5 will run the job at half past the hour every 5 hours.
- Daily – Enter start time and interval. E.g. 1:00 am/2 will run the job at 1am every 2 days.
- Weekly – Again, start time and interval need to be populated, as well as which day(s) of the week to run.
- Monthly – Needs a start time as well as the days of the month to run, either by entering specific days of the month (dates) or by selecting the week of the month (first, second, third, fourth, or last) and then the day of that week. You can also provide an interval to run every 1 month, every 2 months, etc.
Email notification can be set up.
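
To see what an Hourly setting such as 30/5 works out to on the clock, here is a minimal Python sketch that lists the next few run times. The function name is made up, and the choice of the first run (the next time the clock hits the start minute) is my assumption for illustration; vCenter may anchor the first run differently.

```python
from datetime import datetime, timedelta


def hourly_runs(after, minute, interval_hours, count):
    """List `count` run times for an Hourly schedule.

    `minute` is the Start Time (minutes after the hour) and
    `interval_hours` is the Interval. The first run is assumed to be
    the next occurrence of that minute strictly after `after`.
    """
    first = after.replace(minute=minute, second=0, microsecond=0)
    if first <= after:
        first += timedelta(hours=1)
    return [first + timedelta(hours=interval_hours * i) for i in range(count)]
```

Calling hourly_runs(datetime(2012, 1, 1, 9, 45), 30, 5, 3) gives 10:30, 15:30 and 20:30 on the same day, i.e. half past the hour every 5 hours.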

The following are the tasks that are available to be scheduled in vCenter

Add a host
Change the power state of a VM
Change cluster power settings (DPM)
Change resource settings of a resource pool or VM (CPU and Memory Shares/Reservations/Limits)
Check the compliance of a host profile
Create/Clone/Deploy/Export/Import a VM
Migrate a VM (vMotion and Storage vMotion).
Snapshot a VM
Scan for updates and remediate an object.

There are a few rules as well on how vSphere manages tasks

The user performing the task must have the proper permissions to do so. If a scheduled task is created and the permissions are then removed for that user, the task will continue to run.
When operations required by manual and scheduled tasks conflict, the activity due first is started first.
When a VM or host is in an incorrect state to perform the activity, the task will not be performed.
When an object is removed from the vCenter server, all associated tasks are also removed.

Configure/View/Print/Export resource maps

vCenter resource maps are a great way to provide a visual representation of your vCenter Inventory. Maps are only available when connected directly to a vCenter Server and contain the following views.

VM Resources
Host Resources
Datastore Resources
vMotion Resources

VM Resources allow you to map the VM to Networks and Datastores. When selected on the cluster you can also view fault tolerance relationships.

Host Resources allow you to see the relationships between the VMs, Networks, and Datastores and that host.

The Datastore Resources allow you to apply the same Host and VM resources but map them back to a datastore.

The vMotion maps are one of my favorite maps as they have a little bit of intelligence to them. This map view is available when a VM is selected in the inventory and displays the hosts that are compatible (green circle) and incompatible (red x) as migration targets. It also shows the current CPU load of each host.

All the maps can be printed as well as exported (accessed from the File menu). Maps can be exported as jpg, bmp, png, gif, tiff, and emf formats.

Start/Stop/Verify vCenter Server service status

The vCenter service is stopped, started, and restarted the same way as any other Windows service.

Start/Stop/Verify ESXi host agent status

There are a couple of ways to start and stop the ESXi host agents. You can use the 'Restart Management Agents' option in the DCUI or run the 'services.sh restart' command on the CLI.

Configure vCenter Server timeout settings

The vCenter Server timeout intervals control how long the vSphere Client waits before an operation times out. To get at these settings navigate to the Timeout Settings section of Administration->vCenter Server Settings. From here you can configure the Normal and Long operation timeout settings in seconds. vCenter Server must be restarted for the changes to take effect.

Monitor/Administer vCenter Server connections

You can view a list of active sessions in the vSphere client only when connected to a vCenter Server (not directly to a host). By navigating to Administration->Sessions you can see a list of all the connections to the server. You should see Username, Full Name, Online Time, and Status. To terminate any of these sessions simply right-click it and select 'Terminate Session'. You can also set up a Message of the day on this screen to send a message to all users.

Create an Advanced Chart

When in the advanced view of the Performance tab you can access the Chart Options link at the top of the screen. In here you can narrow down the metrics you want by choosing one of the following: Cluster Services, CPU, Datastore, Disk, Memory, Network, Power, Storage Adapter, Storage Path, System and vSphere Replication. You can then select your desired objects and counters as well as a chart type (Line Graph, Stacked Graph, Stacked Graph (Per VM)). You can also save your chart settings and load previously saved settings here.

Determine host performance using resxtop and guest Perfmon

Again these are going to be best learned by using both of the options. I've explained a little bit below.

Perfmon

VMware specific metrics can be displayed through perfmon in a Windows virtual machine. All of the virtual machine performance objects begin with VM. One note: on a 64 bit OS you cannot view these counters by simply running perfmon. You must run the 32 bit version of perfmon, which on 64 bit Windows is located at c:\windows\syswow64\perfmon.exe.

resxtop

I've already covered most of the counters to watch in the CPU/Memory/Disk/Network metric sections above. resxtop is run remotely using the vSphere CLI. One note is that you can run it in batch mode to capture data to a file by using resxtop -b >> myfile.csv.
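
Once you have a batch capture, it can be summarized with a few lines of Python. The sketch below averages every column whose header ends with a given counter name. It assumes the file is plain CSV with one header row, a timestamp in the first column, and counter columns named along the lines of \\host\Group Cpu(id:name)\% Ready; that layout, the function name, and the sample counter name are assumptions, so check them against your own capture.

```python
import csv
import io


def average_counter(csv_text, counter_suffix):
    """Average each column whose header ends with `counter_suffix`.

    Returns a dict mapping the full column header to its mean value
    over all sample rows, or an empty dict if there are no rows.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    wanted = [i for i, name in enumerate(header) if name.endswith(counter_suffix)]

    sums = {i: 0.0 for i in wanted}
    rows = 0
    for row in reader:
        if not row:  # skip blank lines
            continue
        rows += 1
        for i in wanted:
            sums[i] += float(row[i])

    if rows == 0:
        return {}
    return {header[i]: sums[i] / rows for i in wanted}
```

For a capture written to myfile.csv, you would read the file into a string and call something like average_counter(text, "% Ready") to get the average ready time per column.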

Given performance data, identify the affected vSphere resource

Most of this again is covered in the common metrics sections above. Use all of this information and any other info that you can gather to drill down to the problem as quickly as possible. Also, read the Troubleshooting Guide.