VCP 5 – Objective 6.3 – Perform Basic vSphere Storage Troubleshooting

Verify storage configuration

Again, I would recommend using the Maps tab in the vSphere Client to verify storage configuration.  From here you can see host-to-datastore as well as VM-to-datastore relationships.  Another tab to look at is the Storage Views tab, which shows many different storage-related configurations.  In order to view this tab you must have the vCenter Storage Monitoring plug-in, which is usually installed and enabled by default.  More about these storage reports will be explained in the Storage Reports and Storage Maps point of this section.
 
The other spot where you can view your storage configuration is in the Storage and Storage Adapters sections of the Configuration tab of a host.  From the Storage section you can see a list of your datastores in either the Datastores or Devices view.  The table is pretty simple and shows you the datastore name, status, device, drive type, capacity, free space, type, last update, alarm actions, Storage I/O Control status, and hardware acceleration.  From the Storage Adapters section you can see all of your storage adapters, as well as the datastores and paths associated with them.
 
Troubleshoot storage contention issues
 
Storage contention occurs when the I/O demand of the hosts and VMs exceeds what the storage array and/or HBAs can deliver.  There are ways to help prevent storage contention, such as Storage DRS and Storage I/O Control, both of which have been mentioned and explained throughout this guide.  The first step in troubleshooting storage contention is to find out where the bottleneck, or slowdown, is occurring.  As it relates to vSphere, the contention could be occurring at the VM, HBA, or array level.  The easiest and most efficient way to figure this out is through esxtop and the following metrics.
  • DAVG – the average response time for a command being sent to the device.
  • KAVG – the average time a command spends in the vmkernel.
  • GAVG – the response time as it appears to the VM.  Usually DAVG + KAVG.
  • CMDS/s – the number of I/O commands per second (IOPS) being sent to or received from the device or the VM.
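
To make the relationship between these counters concrete, here is a minimal Python sketch that takes one DAVG/KAVG reading, derives GAVG as their sum, and flags which layer looks suspicious.  The names (LatencySample, diagnose) are hypothetical, and the two thresholds are assumptions based on commonly quoted rules of thumb, not official VMware limits, so tune them for your own environment.

```python
from dataclasses import dataclass

# Assumed warning thresholds in milliseconds (rules of thumb, not official limits).
DAVG_WARN_MS = 25.0
KAVG_WARN_MS = 2.0


@dataclass
class LatencySample:
    """One esxtop-style latency reading, in milliseconds per command."""
    davg_ms: float  # time spent at the device (HBA, fabric, array)
    kavg_ms: float  # time spent in the vmkernel

    @property
    def gavg_ms(self) -> float:
        # Latency as seen by the guest: device time plus kernel time.
        return self.davg_ms + self.kavg_ms


def diagnose(sample, davg_warn=DAVG_WARN_MS, kavg_warn=KAVG_WARN_MS):
    """Return a list of hints about where the latency is coming from."""
    hints = []
    if sample.davg_ms > davg_warn:
        hints.append("device: check the array, switches and paths")
    if sample.kavg_ms > kavg_warn:
        hints.append("kernel: check queuing and queue depth on the host")
    return hints


if __name__ == "__main__":
    sample = LatencySample(davg_ms=30.0, kavg_ms=0.5)
    print(f"GAVG = {sample.gavg_ms} ms")
    for hint in diagnose(sample):
        print(hint)
```

The point of splitting the two checks is that a high GAVG on its own only tells you the guest is suffering; it is the DAVG/KAVG breakdown that tells you whether to look outside the host or inside it.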

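If you experience high latency times (DAVG/KAVG), then it's probably best to investigate issues with your array and/or the switches and paths to the array.  As a general rule, a high DAVG points at the array or the path to it, while a high KAVG points at queuing inside the vmkernel.  VMware makes the following recommendations to solve storage contention issues.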

  • Check the CPU usage of the VMs and increase queue depth (advanced setting) if needed.
  • Storage vMotion the VM or VMs to a new LUN with more spindles or add more disks to the LUN in question.
  • Increase the VM's memory – this will allow for more OS caching, which may reduce I/O activity.
  • Defragment the guest file systems.
  • Turn off any anti-virus on-demand scans.

Troubleshoot storage over-commitment issues

Storage over-commitment can occur when using thin provisioned disks.  Because thin provisioned disks allow you to provision more space than is actually available, it's possible to over-commit and eventually fill up the datastore.  Be sure to set alarms on the datastore so that you are alerted when it is nearing its capacity.  There are a few options available if a datastore runs out of space.  You can Storage vMotion certain VMs off of the datastore to free up space.  You can also add space to the LUN and then increase the size of the datastore, or add an extent to the datastore.  Both of these options were talked about in the storage section of this guide.
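
As a rough illustration of the arithmetic behind over-commitment, the following Python sketch compares the space provisioned to thin disks with what the datastore actually holds.  The function name, the sample sizes, and the 85% alarm threshold are all assumptions made up for this example; they are not vCenter defaults.

```python
def overcommit_report(capacity_gb, provisioned_gb, used_gb, usage_alarm_pct=85.0):
    """Summarise thin-provisioning pressure on a single datastore.

    capacity_gb     -- real size of the datastore
    provisioned_gb  -- list of provisioned (maximum) sizes, one per thin disk
    used_gb         -- list of space actually consumed, one per thin disk
    usage_alarm_pct -- assumed alarm threshold on real usage
    """
    total_provisioned = sum(provisioned_gb)
    total_used = sum(used_gb)
    used_pct = 100.0 * total_used / capacity_gb
    return {
        "provisioned_pct": 100.0 * total_provisioned / capacity_gb,
        "used_pct": used_pct,
        # Over-committed: the disks could grow beyond the datastore.
        "overcommitted": total_provisioned > capacity_gb,
        # Alarm: real usage has crossed the chosen threshold.
        "alarm": used_pct >= usage_alarm_pct,
    }


if __name__ == "__main__":
    # Three 200 GB thin disks on a 500 GB datastore (hypothetical numbers).
    report = overcommit_report(500, [200, 200, 200], [150, 150, 140])
    for key, value in report.items():
        print(f"{key}: {value}")
```

Being over-committed is not a problem in itself; it only becomes one when real usage approaches capacity, which is why the sketch reports the two conditions separately.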
 
Troubleshoot iSCSI software initiator configuration issues
 
As with any iSCSI initiator, your storage traffic is going through your network, so keep in mind that all of the usual network configuration and troubleshooting also applies.  In addition to that, there are a few other things to keep in mind when using the software iSCSI initiator.
  • If you are using it as a boot device, the adapter is enabled automatically.  This means that if you disable it after you have booted, it will be re-enabled at the next boot.
  • By default, the software adapter is disabled and needs to be activated.
  • Software (and dependent hardware) adapters utilize vmkernel networking.  Thus, you must have the proper settings configured on a vmkernel port to use the adapter properly.  You can test connectivity through a vmkernel port by using vmkping -D from the command line.
  • If you are using more than one uplink and they are on different vSS's, then the vmkernel IP addresses need to be on different IP subnets.
  • If you are using multiple uplinks on one vSS, then each vmkernel port group must map to a different uplink.
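
The last two rules in the list above can be written down as a small offline sanity check.  The Python sketch below works on a hand-entered description of the vmkernel ports; it does not query a host, and the VmkPort type, its field names, and the sample addresses are hypothetical and exist only for this illustration.

```python
import ipaddress
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class VmkPort:
    name: str           # e.g. "vmk1"
    vswitch: str        # standard switch the port group lives on
    active_uplink: str  # e.g. "vmnic1"
    ip_cidr: str        # e.g. "10.0.1.11/24"


def check_iscsi_ports(ports):
    """Return a list of problems found in a software iSCSI port layout."""
    problems = []

    # Rule 1: on a shared vSS, each vmkernel port must map to its own uplink.
    by_switch = defaultdict(list)
    for port in ports:
        by_switch[port.vswitch].append(port)
    for vswitch, members in by_switch.items():
        seen = {}
        for port in members:
            if port.active_uplink in seen:
                problems.append(
                    f"{port.name} and {seen[port.active_uplink]} share "
                    f"{port.active_uplink} on {vswitch}"
                )
            else:
                seen[port.active_uplink] = port.name

    # Rule 2: ports on different vSS's must be on different IP subnets.
    subnet_owner = {}
    for port in ports:
        subnet = ipaddress.ip_interface(port.ip_cidr).network
        owner = subnet_owner.setdefault(subnet, port)
        if owner.vswitch != port.vswitch:
            problems.append(
                f"{port.name} ({port.vswitch}) and {owner.name} "
                f"({owner.vswitch}) are both in {subnet}"
            )
    return problems


if __name__ == "__main__":
    layout = [
        VmkPort("vmk1", "vSwitch1", "vmnic1", "10.0.1.11/24"),
        VmkPort("vmk2", "vSwitch2", "vmnic2", "10.0.1.12/24"),
    ]
    for problem in check_iscsi_ports(layout):
        print(problem)
```

A check like this only catches layout mistakes on paper; actual reachability still has to be confirmed from the host itself, for example with vmkping as mentioned above.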
 
Troubleshoot Storage Reports and Storage Maps
 
Storage reports and maps can be great tools for troubleshooting.  You can display almost every piece of information as it relates to an object (except for networking) on the Storage Views tab.  I'm not going to go through all of the scenarios here.  Your best bet is to have a look at some of these reports and maps, as well as read the vSphere Monitoring and Performance Guide.
 
Identify the root cause of a storage issue based on troubleshooting information
 
Again, the vSphere Troubleshooting guide is your one-stop shop for this.  There isn't much value in me just copying the information in here.  Read Chapter 4 and make sure you understand what it is talking about.
 
 

