Identify general ESXi host troubleshooting guidelines

Honestly, this topic is too vague to cover in full.  I would certainly recommend reading the entire troubleshooting guide, which outlines the most common issues you might run into.  Real-world experience also cannot be substituted when it comes to vSphere troubleshooting.  Be sure you know how to restart the management agents (services.sh restart from the shell, or from the DCUI).  I will, however, provide some notes on troubleshooting the features I have little experience with.
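As a rough sketch of the shell-side restart options, the snippet below wraps services.sh restart together with the per-agent init scripts for hostd and vpxa. The restart_agents and run helpers and the DRY_RUN switch are my own illustrative names, not part of ESXi; with DRY_RUN left at 1 the script only prints what it would do.

```shell
#!/bin/sh
# Sketch: restart ESXi management agents from the ESXi Shell or an SSH session.
# DRY_RUN, run and restart_agents are hypothetical helper names for this example.
# DRY_RUN=1 (default) only prints the command; set DRY_RUN=0 on a host to execute it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

restart_agents() {
    case "$1" in
        all)   run services.sh restart ;;        # restart every management agent
        hostd) run /etc/init.d/hostd restart ;;  # host agent only
        vpxa)  run /etc/init.d/vpxa restart ;;   # vCenter agent only
        *)     echo "usage: restart_agents all|hostd|vpxa" >&2; return 1 ;;
    esac
}

restart_agents all
```

Restarting a single agent is usually less disruptive than restarting everything, which is why the sketch keeps the three cases separate.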
 
Auto Deploy
 
Host boots with a different ESXi image, host profile, or folder location than is specified.
  • Cause – After the host has been added to vCenter Server, its boot configuration is determined by vCenter Server, which associates an image profile, host profile, and folder location with the host.
  • Solution – Use the Test-DeployRuleSetCompliance and Repair-DeployRuleSetCompliance PowerCLI cmdlets to re-evaluate the rules and associate the correct profiles and folder location with the host.

Host is not being redirected to the Auto Deploy Server after loading gPXE

  • Cause – The tramp file that is included in the TFTP zip file has the wrong IP address for the Auto Deploy server.
  • Solution – Correct the Auto Deploy server IP address in the tramp file.
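A quick way to confirm the tramp file problem before editing anything is to search the file for the address you expect. This is only a sketch: the TRAMP_FILE path and AUTODEPLOY_IP value are placeholders I made up, and check_tramp is a hypothetical helper, so adjust them to wherever your TFTP server unpacked the zip.

```shell
#!/bin/sh
# Sketch: check that the tramp file references the expected Auto Deploy server IP.
# TRAMP_FILE and AUTODEPLOY_IP are hypothetical placeholders; adjust for your TFTP server.
TRAMP_FILE="${TRAMP_FILE:-/tftpboot/tramp}"
AUTODEPLOY_IP="${AUTODEPLOY_IP:-192.0.2.20}"

check_tramp() {
    file="$1"
    ip="$2"
    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 2
    fi
    # -F matches the IP literally; -w requires a whole-word match,
    # so 192.0.2.2 does not match inside 192.0.2.20.
    if grep -qwF "$ip" "$file"; then
        echo "OK: $file references $ip"
    else
        echo "MISMATCH: $file does not reference $ip"
        return 1
    fi
}

check_tramp "$TRAMP_FILE" "$AUTODEPLOY_IP" || echo "review the tramp file before booting hosts"
```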

You receive a non-stateless-ready package error when you try to write or modify a rule that assigns an image profile.

  • Cause – Each VIB in an image profile has a stateless-ready flag.  The error appears when one or more VIBs in the profile have that flag set to false.
  • Solution – Remove the VIBs that are not stateless-ready.

Host with a built-in USB flash drive is not sending coredumps to the local disk.

  • Solution – Install ESXi Dump Collector on a system of your choice, then use esxcli to configure the host to send coredumps to the Dump Collector and to disable the local coredump partition.
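The esxcli side of that solution might look something like the sketch below, based on the esxcli system coredump namespace in ESXi 5.x. The collector IP, the vmk0 interface, and the helper names are assumptions for illustration (6500 is the Dump Collector's default port), and the script prints the commands instead of running them unless DRY_RUN=0.

```shell
#!/bin/sh
# Sketch: point an ESXi 5.x host at a network Dump Collector and disable the local dump partition.
# COLLECTOR_IP and VMK_NIC are hypothetical placeholders; run and configure_netdump are example helpers.
DRY_RUN="${DRY_RUN:-1}"
COLLECTOR_IP="${COLLECTOR_IP:-192.0.2.10}"
VMK_NIC="${VMK_NIC:-vmk0}"
COLLECTOR_PORT="${COLLECTOR_PORT:-6500}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

configure_netdump() {
    # Tell the host which VMkernel NIC, server address and port to use.
    run esxcli system coredump network set --interface-name "$VMK_NIC" --server-ipv4 "$COLLECTOR_IP" --server-port "$COLLECTOR_PORT"
    # Enable network coredumps.
    run esxcli system coredump network set --enable true
    # Disable the local coredump partition.
    run esxcli system coredump partition set --enable false
    # Show the resulting network coredump configuration.
    run esxcli system coredump network get
}

configure_netdump
```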

vmware-fdm warning when you assign the image profile to a host

  • Cause – The image profile does not include the vmware-fdm HA package, which is required for HA.
  • Solution – You can ignore the warning if you are not using HA.  If you are, use the PowerCLI cmdlets Add-EsxSoftwareDepot and Add-EsxSoftwarePackage to add the vmware-fdm package to the image profile.

Host reboots after 5 minutes

  • Cause – No image profile is assigned to this host.
  • Solution – Assign an image profile to the host, either temporarily (Apply-EsxImageProfile) or permanently (New-DeployRule, Add-DeployRule, and Test-DeployRuleSetCompliance).

Troubleshoot common installation issues
 
I couldn't really find any information on this in any documentation, and honestly, in the 20 or so installs and upgrades that I have performed, I haven't run into any issues.  So be sure that your hardware is on the HCL and meets the minimum requirements.  I would probably focus more on the actual install/upgrade sections of the blueprint than on this one.
 
Monitor ESXi system health
 
The ESXi host health monitoring tool lets you monitor a variety of host hardware components, including CPU, memory, fans, temperature, voltage, power, network, storage, battery, cable, software components, and watchdog.  It gathers this data using Systems Management Architecture for Server Hardware (SMASH) profiles.
 
Viewing health while connected to a host
 
The Health Status section on the Configuration tab of a host shows the status of the hardware within the host.  Generally, if everything is fine you will see a green icon; if performance or functionality is degraded you will see a yellow icon; and if something has failed the icon will be red.  A blank status means that ESXi was unable to gather health data for that component.
 
Viewing data from vCenter
 
If you are connected to a vCenter Server, you can monitor this same hardware through the Hardware Status tab.  ***NOTE*** If you do not see the Hardware Status tab, ensure that the Hardware Status plug-in is enabled.  There are a few views available on this tab as well:
  • Sensors – displays hardware sensors in a tree view.
  • Alerts and Warnings – shows only alerts and warnings.
  • System Event Log – shows the system event log.  This can be cleared by clicking 'Reset Event Log'.

In both cases you can reset the sensors that accumulate data over time by clicking 'Reset Sensors'.  The VMware documentation also mentions a few tips for troubleshooting the Hardware Health service.

  • Hardware Status tab isn't visible – Enable the plug-in.
  • Hardware Status tab displays 'remote name could not be resolved' – Fix DNS between the client and the vCenter Server, or edit the extensions.xml file located at c:\Program Files\VMware\Infrastructure\Virtual Center Server\extensions\cim-ui\ and add the current vCenter Server name and IP.
  • Hardware Status tab displays a security alert – Enable the security setting 'Allow Scripting of Internet Explorer Web Browser Control' in the intranet zone (the Hardware Status tab is displayed through IE).

Export diagnostic information
 
There are a couple of different ways to grab diagnostic information and generate log bundles from within the vSphere Client.  I'll explain both here.
 
First Way
  1. Select the host, cluster, or datacenter in the inventory that you would like to generate the bundle for.
  2. Select File->Export->Export System Logs
  3. If you selected a cluster or datacenter, you can check or uncheck which hosts you would like to include here.
  4. Select which components you would like to include in the bundle and whether or not to gather performance data.
  5. Done.

Second Way

  1. Click Administration->Export System Logs
  2. Select the hosts whose logs you wish to export and, optionally, vCenter Server.
  3. Select whether to gather performance data.
  4. Done.
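If the vSphere Client is not available, a bundle can also be generated on the host itself with vm-support from the ESXi Shell.  The sketch below assumes a datastore path of /vmfs/volumes/datastore1, which is just a placeholder; collect_bundle and run are illustrative helper names, and nothing is executed unless DRY_RUN=0.

```shell
#!/bin/sh
# Sketch: generate a diagnostic bundle directly on an ESXi host.
# BUNDLE_DIR is a hypothetical datastore path; run and collect_bundle are example helpers.
DRY_RUN="${DRY_RUN:-1}"
BUNDLE_DIR="${BUNDLE_DIR:-/vmfs/volumes/datastore1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

collect_bundle() {
    # -w sets the working directory where vm-support writes the bundle.
    run vm-support -w "$1"
}

collect_bundle "$BUNDLE_DIR"
```

Writing the bundle to a datastore rather than the default location helps when the host's local scratch space is small.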