vMotion is pretty awesome, am I right? Ever since I saw my first VM migrate from one host to another without missing a beat, I’ve been blown away – you always remember your first. In my opinion, vMotion is the feature that truly brought VMware to where it is today, and it laid the groundwork for all of the amazing features you see in the current release. It’s something I’ve taken for granted as of late, which is why I was a little perplexed when, all of a sudden, it just stopped working for a few VMs…
You can see above one of my VMs that just didn’t want to budge! Thankfully, we get the very descriptive and helpful error message “A general system error occurred: vim.faultNotFound” – you know, because that really helps a lot! With my Google-Fu turning up no results and my forum scouring coming up empty-handed, I decided to take a step back to my VCP days and look at the actual requirements for vMotion. Surely this VM was not meeting one of them! So, here is a simplified version of the vMotion requirements…
- Proper vSphere licensing
- Compatible CPUs
- Shared Storage (for normal vMotion)
- vMotion-enabled VMkernel ports on the hosts (minimum 1 Gbps network)
- Sufficient Resources on target hosts
- Same names for port groups
Licensing – check! vCloud Suite
CPU Compatibility – check! Cluster of blades all identical
Shared Storage – check! LUNs available on all hosts
vMotion interface – check! Other VMs moved no problem
Sufficient Resources – check! Lots of resources free!
Same names for port groups – check! Using a distributed switch.
So, yeah, huh?
Since I’d already moved a couple dozen other VMs, and this single VM failed no matter which host I tried to move it to, I ruled out anything host-related and focused my attention on the VM itself. At first I thought the VM might be tied to the host somehow, using local resources of some sort, but it had no local storage attached, no CD-ROMs mounted, nothing. It was the perfect candidate for vMotion, yet no matter what I tried I couldn’t get it to move! I then turned my attention to networking: maybe there was an issue with the ports on the distributed switch, such as none being available.
A quick glance showed lots of ports available, but another abnormality reared its ugly head! The VM was listed as connected to the switch on the ‘VMs’ tab, yet on the ‘Ports’ tab it was nowhere to be found! So which port was this VM connected to? Let’s SSH directly to the host to figure this one out…
To figure this out, we need to run the “esxcli network vm port list” command and pass it the VM’s world ID. To get the world ID, we can simply execute the following:
esxcli network vm list
From there, we can grab the world ID of the VM in question and run the following:
esxcli network vm port list -w world_id
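If you have more than a handful of VMs to check, copying world IDs by hand gets old quickly. Below is a minimal Python sketch of that lookup. It assumes the fixed-width table layout esxcli usually prints for this command (a header row, a row of dashes, then one row per VM) with “World ID” and “Name” columns; the function names are my own, purely for illustration.

```python
import re


def parse_table(text):
    """Parse esxcli's fixed-width table output into a list of row dicts.

    Assumes a header row, then a row of dashes marking where each column
    starts, then one row per record. Slicing by column offsets (rather
    than splitting on whitespace) keeps VM names with spaces intact.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header, dashes, rows = lines[0], lines[1], lines[2:]
    # Each run of dashes starts a column; the last column runs to end of line.
    starts = [m.start() for m in re.finditer(r"-+", dashes)]
    bounds = list(zip(starts, starts[1:] + [None]))
    names = [header[a:b].strip() for a, b in bounds]
    return [
        {name: row[a:b].strip() for name, (a, b) in zip(names, bounds)}
        for row in rows
    ]


def world_id_for(vm_list_text, vm_name):
    """Look up a VM's world ID in `esxcli network vm list` output."""
    for row in parse_table(vm_list_text):
        if row["Name"] == vm_name:
            return row["World ID"]
    raise KeyError(vm_name)


def port_list_command(world_id):
    """Build the follow-up command that lists the ports for that world ID."""
    return ["esxcli", "network", "vm", "port", "list", "-w", str(world_id)]
```

On a host you would feed parse_table the captured text of “esxcli network vm list”, then run whatever port_list_command returns for the VM you care about.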
In my case, I came up with the following…
Port 317! Sounds normal, right? Not in my case. In fact, I knew for certain from my documentation that the ports on this port group only went up to 309! So I had a VM connected to the port group on a port that essentially didn’t exist!
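The check I did in my head, comparing the DVPort ID against the highest port documented for the port group (309 in my case), is easy to automate. This is a hedged sketch: it assumes the port list output is a series of indented “Key: Value” lines with a blank line between ports, and that the distributed switch port appears under a “DVPort ID” key. The helper names and sample values are hypothetical.

```python
def parse_port_blocks(text):
    """Split `esxcli network vm port list -w <id>` output into one dict per port.

    Assumes each port is printed as indented "Key: Value" lines, with a
    blank line separating ports.
    """
    ports, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current port block.
            if current:
                ports.append(current)
                current = {}
            continue
        # Split on the first colon only, so MAC addresses stay intact.
        key, _, value = line.strip().partition(":")
        current[key.strip()] = value.strip()
    if current:
        ports.append(current)
    return ports


def out_of_range_ports(text, max_port):
    """Return DVPort IDs higher than the last port documented for the group."""
    bad = []
    for port in parse_port_blocks(text):
        dvport = port.get("DVPort ID", "")
        # Standard vSwitch ports have no DVPort ID and are skipped.
        if dvport.isdigit() and int(dvport) > max_port:
            bad.append(int(dvport))
    return bad
```

Run against the output for my problem VM with max_port=309, this would flag port 317, while a VM sitting on a legitimate port comes back with an empty list.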
How about a TL;DR version?
The problem stemmed from the VM being connected to an essentially non-existent port! Since I couldn’t afford any downtime on this VM, my fix was to simply create another port group on the dvSwitch, mimicking the settings of the first. After attaching the VM to the newly built port group and then re-attaching it to the original one, the VM was finally connected to what I saw as a valid port, Port #271.
After doing this, guess what finally started working again? That’s right, the wonderful and amazing vMotion. I’m sure you could achieve the same result by simply disconnecting and reconnecting the network adapter; however, you will experience downtime with that method, so I went the duplicate port group route.
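Why does bouncing through a temporary port group help? Here’s a toy model (not how vSphere actually allocates ports internally) in which each port group hands out ports from its own pool. The VM’s NIC is always attached to one group or the other, which is what avoids the outage, and on the way back it lands on a free, legitimate port instead of the phantom one. The class, function, and port group names are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PortGroup:
    """Toy model of a distributed port group: a fixed pool of port IDs."""
    name: str
    valid_ports: range
    in_use: dict = field(default_factory=dict)  # port ID -> VM name

    def attach(self, vm):
        # Hand out the lowest free port from this group's own pool.
        for port in self.valid_ports:
            if port not in self.in_use:
                self.in_use[port] = vm
                return port
        raise RuntimeError(f"no free ports on {self.name}")

    def detach(self, port):
        # Releasing an untracked (phantom) port is a harmless no-op.
        self.in_use.pop(port, None)


def bounce_through_temp(vm, stale_port, original, temp):
    """Move a NIC to a temporary group and back; return every port it held.

    The NIC is attached to the new group before leaving the old one, so
    at no point is it disconnected.
    """
    history = [(original.name, stale_port)]
    temp_port = temp.attach(vm)
    original.detach(stale_port)
    history.append((temp.name, temp_port))
    new_port = original.attach(vm)
    temp.detach(temp_port)
    history.append((original.name, new_port))
    return history
```

With ports 0 to 270 already taken on a 310-port group, a VM stuck on phantom port 317 would come back on port 271 and leave the temporary group empty, ready to be deleted.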
Where there’s one, there are many
All of this got me thinking: this can’t be the only VM experiencing this issue, can it? I started looking around for some PowerCLI scripts I could piece together, and as it turns out, knowing the specific problem certainly helps with the Google-Fu: I found a blog post by Jason Coleman dealing with this exact issue! Wish I could’ve found that earlier. Anyway, Jason has a great PowerCLI script attached to his post that peels through your environment and detects which VMs are experiencing this exact problem! He has even automated the creation of the temporary port groups. Good work, Jason! After running it, my suspicions were confirmed: about a dozen VMs in my environment needed fixing.
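I won’t reproduce Jason’s PowerCLI script here, and what follows is not his script. It is a rough Python sketch of one way to express such a sweep, built on the mismatch I spotted earlier: a VM that the ‘VMs’ tab says is connected, on a port the ‘Ports’ tab doesn’t list. It assumes you have already collected both views into plain data structures; how you gather them is left open, and the function name and data shapes are hypothetical.

```python
def find_orphaned_nics(vm_ports, switch_ports):
    """Return (vm, portgroup, port) for every NIC on a port the switch doesn't list.

    vm_ports:     iterable of (vm_name, portgroup, port_key) as reported from
                  each VM's side (the 'VMs' tab view).
    switch_ports: dict of portgroup -> set of port keys the switch reports
                  (the 'Ports' tab view).
    """
    orphans = []
    for vm, portgroup, port in vm_ports:
        # A port group missing from switch_ports counts as "no ports listed".
        if port not in switch_ports.get(portgroup, set()):
            orphans.append((vm, portgroup, port))
    return sorted(orphans)
```

Each VM it returns would be a candidate for the temporary port group shuffle described above.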
How or why this occurred I have no idea. I’m just glad I found a way around it and, as always, thought I’d share it in the hope of helping others! Also, it gave me a chance to throw some Seinfeld action into the blog! Thanks for reading!