No vMotion for you! – A general system error occurred: vim.faultNotFound

vMotion is pretty awesome, am I right?  Ever since I saw my first VM migrate from one host to another without missing a beat I’ve been blown away – you always remember your first 🙂  In my opinion it’s the vMotion feature that truly brought VMware to where they are today – it laid the groundwork for all of the amazing features you see in the current release.  It’s something I’ve taken for granted as of late – which is why I was a little perplexed when all of a sudden, for only a few VMs, it just stopped working…

vMotionError

You can see above one of my VMs that just didn’t seem to want to budge!  Thankfully we get a very descriptive and helpful error message of “A general system error occurred: vim.faultNotFound” – you know, because that really helps a lot!  With my Google-Fu turning up no results and my forum scouring coming up empty-handed, I decided to take a step back to the VCP days and look at what the actual requirements of vMotion are – surely, this VM is not meeting one of them!  So with that, a simplified version of the vMotion requirements…

  • Proper vSphere licensing
  • Compatible CPUs
  • Shared Storage (for normal vMotion)
  • vMotion portgroups on the hosts (min 1Gb network)
  • Sufficient Resources on target hosts
  • Same names for port groups

Licensing – check!  vCloud Suite

CPU Compatibility – check! Cluster of blades all identical

Shared Storage – check!  LUNs available on all hosts

vMotion interface – check!  Other VMs moved no problem

Sufficient Resources – check!  Lots of resources free!

Same names for port groups – check!  Using a distributed switch.

So, yeah, huh?

Since I’d already moved a couple dozen other VMs, and this single VM was failing no matter what host I tried to move it to, I ruled out anything host related and focused my attention on the single VM.  First I thought maybe the VM was tied to the host somehow, using local resources of some sort – but the VM had no local storage attached to it, no CD-ROMs mounted, nothing – it was the perfect candidate for vMotion, but no matter what I tried I couldn’t get this VM to move!  I then turned my attention to networking – maybe there was an issue with the ports on the distributed switch, possibly having none available.

After a quick glance, there were lots of ports available, but there was another abnormality that reared its ugly head!  The VM was listed as being connected to the switch on the ‘VMs’ tab – however on the ‘Ports’ tab it was nowhere to be found!  So what port was this VM connected to?  Well, let’s SSH directly to the host to figure this one out…

To figure this out we need to run the “esxcli network vm port list” command and pass it the VM’s world ID – to get that, we can simply execute the following:

esxcli network vm list

From there, we can grab the world ID of our VM in question and run the following:

esxcli network vm port list -w world_id

In my case, I came up with the following…

vmportid

Port 317!  Sounds normal right?  Not in my case.  In fact, I knew for certain from my documentation that the ports on this port group only went up to 309!  So, I had a VM, connected to the port group, on a port that essentially didn’t exist!
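If you want to make that sanity check repeatable, here is a minimal Python sketch that parses the “Key: Value” lines that esxcli prints and flags any port number above the highest one you expect on the port group.  The sample output, the “DVPort ID” field name, and the helper names are my assumptions for illustration only – compare them against what your own host actually prints before trusting the result.

```python
def parse_port_list(text):
    """Parse 'Key: Value' lines into a list of dicts, one dict per port."""
    ports, current = [], {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        # Split on the first colon only, so MAC addresses stay intact.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        # Assumption: every port entry starts with a "Port ID" line.
        if key == "Port ID" and current:
            ports.append(current)
            current = {}
        current[key] = value
    if current:
        ports.append(current)
    return ports


def out_of_range_ports(ports, max_port, field="DVPort ID"):
    """Return the numeric port IDs in `field` that are greater than max_port."""
    bad = []
    for port in ports:
        value = port.get(field, "")
        if value.isdigit() and int(value) > max_port:
            bad.append(int(value))
    return bad


# Hypothetical sample output; field names and values are illustrative.
SAMPLE = """\
   Port ID: 50331660
   vSwitch: dvSwitch01
   Portgroup: dvPG-Production
   DVPort ID: 317
   MAC Address: 00:50:56:aa:bb:cc
"""

if __name__ == "__main__":
    # 309 is the highest port documented for the port group in this post.
    print(out_of_range_ports(parse_port_list(SAMPLE), max_port=309))
```

Paste the real output from your host in place of SAMPLE, and set max_port to whatever your own documentation says the port group tops out at.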

How about a TL;DR version?

The problem stemmed from the VM being connected to essentially a non-existent port!  Since I couldn’t have any downtime on this port my fix was to simply create another port group on the dvSwitch, mimicking the settings from the first.  After attaching the VM to the newly built port group, then re-attaching it back to the existing one, I was finally attached to what I saw as a valid port, Port #271.

port-fixed

After doing this guess what finally started working again – that’s right, the wonderful and amazing vMotion 🙂  I’m sure you could achieve the same result by simply disconnecting and connecting, however you will experience downtime with that method – so I went the duplicate port group route.

Where there is one there’s many

All of this got me thinking – this can’t be the only VM that’s experiencing this issue, can it?  I started looking around trying to find some PowerCLI scripts that I could piece together and, as it turns out, knowing what the specific problem was certainly helps with the Google-Fu: I found a blog by Jason Coleman dealing with this exact same issue!  Wish I could’ve found that earlier 🙂  Anyways, Jason has a great PowerCLI script attached to his post that peels through and detects which VMs in your environment are experiencing this exact problem!  He has even automated the creation of the temporary port groups as well!  Good work Jason!  After running it my suspicions were confirmed – there were about a dozen VMs that needed fixing in my environment.

How or why this occurred I have no idea – I’m just glad I found a way around it and, as always, thought I’d share with the intention of maybe helping others!  Also – it gave me a chance to throw in some Seinfeld action on the blog!  Thanks for reading!

VCSA 6.5 Migration deployment sizes limited!

Recently I finally bit the bullet and decided to bring the vCenter portion of a vSphere environment up to version 6.5.  Since the migration from a Windows-based vCenter to the VCSA is now a supported path I thought it would also be a good time to migrate to the appliance as well.  So with that I ran through a few blogs I found in regards to the migration, checked out the vSphere Upgrade Guide and peeled through a number of KBs looking for gotchas.  With my knowledge in hand I headed into the migration.

At this point I had already migrated my external Windows-based PSC to version 6.5 and got started on the migration of the Windows-based vCenter Server.  Following the wizard I was prompted for the typical SSO information along with where I would like to place the appliance.  The problem, though, came when I was prompted to select a deployment size for my new VCSA.  My only options available were Large and X-Large.  That might not be a big deal if this environment actually required that amount of resources – but looking at the table below, those deployment sizes are scoped to fit the 1000 host and above mark.

DeploymentSize

Did this environment have 1000+ hosts and 10000+ VMs?  Absolutely not!  At its largest it contained maybe 70 hosts and a few hundred VMs running on them – a Small configuration at best, Medium if you want to be conservative!  At first I thought maybe I was over-provisioned in terms of resources on my current vCenter Server – but again, it only had 8 vCPUs and 16GB of RAM.  With nothing out of the ordinary with vCenter itself I turned my attention to the database – and that’s where my attention stayed, as it was currently sitting at a size of 200GB.  Honestly, this seemed super big to me, and knowing that it had been through a number of upgrades over the years I figured I would make it my goal to shrink this down as small as possible before trying again!  TL;DR version – the database was the culprit and I did end up with the “Small” option – but I did a number of things after a frenzy of Googling and searching – all listed below…

WAIT!!!!  Don’t be that guy!  Make sure you have solid backups and can restore if things here go sideways – engage VMware GSS if needed – don’t just “do what I do” 🙂

 

Reset the vpx provider

The vpx data provider basically supplies the object cache for vCenter – caching all inventory objects such as hosts, clusters, VMs, etc. in order to provide that super-snappy response time in the vSphere Web Client 6.0 (Is this sarcasm?).  Anyways, resetting this will essentially reduce the size of our Inventory Database.  Now, the problem in versions prior to 5.5 Update 3 is that there was no way to reset individual data providers – in order to do one you had to do them all – and that meant losing all of your tags, storage profiles/policies, etc.  Thankfully, 5.5 U3 and 6.0 allow us to reset just vpx, leaving the rest of our environment intact.  In order to do so we must first get into the vSphere Inventory Managed Object Browser (MOB) and get the UUID of the vpx provider.  **NOTE: this is different from the MOB you may be used to logging into, see below**

First, log into the Inventory Service MOB by pointing your browser to https://vCenterIP/invsvc/mob1/    From there, simply click the ‘RetrieveAllProviderConfigs’ link within the Methods section as shown below

invsvcprovider

In the pop up dialog, click ‘Invoke Method’, then run a search for vpx

vpxprovider

It’s the providerUuid string that we are looking for – go ahead and copy that string to your clipboard and return to https://vCenterIP/InvSvc/mob1/ – this time, clicking the ‘ResetProviderContent’ link under Methods.  In the pop up dialog, paste in your copied UUID and click ‘Invoke Method’ as shown below…

resetcontent

After a little while the window should refresh and hopefully you see no errors!   The process of resetting for myself took roughly 5 minutes to complete….

Getting rid of logs

Although vCenter does its own log rotation you may want to check just how much space your logs are taking up on your current vCenter server before migrating, as some of this data is processed during the migration/upgrade.  I freed up around 30GB of disk by purging some old logs – not a lot, but 30GB that didn’t need to be copied across the wire during the migration.  There is a great KB article here outlining the location and purpose of all of the vCenter Server log files – have a look at it and then peruse through your install and see what you may be able to get rid of.  For the Windows version of vCenter you can find all of the logs in the %ALLUSERSPROFILE%\VMware\vCenterServer\logs\ folder.  I mostly purged anything that was gzipped and archived from most of the subfolders within this directory.  Again, not a difference maker in terms of unlocking my “Small” deployment option – but certainly a time-saver during the migration!  So what was the culprit that was not allowing me to select “Small” – yeah, let’s get to that right now…
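Before moving on, if you’d like a number before you start purging, here is a small Python sketch that walks a log folder and totals up the archived files without deleting anything.  The function name and the list of file suffixes are my own assumptions for illustration – adjust them to whatever your install actually contains.

```python
import os


def archived_log_report(log_root, suffixes=(".gz", ".zip")):
    """Return (files, total_bytes) for archived logs under log_root.

    This is a dry run: it only reads file sizes and never deletes anything.
    """
    files, total = [], 0
    for dirpath, _dirnames, filenames in os.walk(log_root):
        for name in filenames:
            if name.lower().endswith(suffixes):
                path = os.path.join(dirpath, name)
                files.append(path)
                total += os.path.getsize(path)
    return sorted(files), total


if __name__ == "__main__":
    # Log folder for the Windows version of vCenter, as mentioned in this post.
    root = os.path.expandvars(r"%ALLUSERSPROFILE%\VMware\vCenterServer\logs")
    files, total = archived_log_report(root)
    print(f"{len(files)} archived files, {total / 1024**3:.1f} GB")
```

Run it on the vCenter server itself and review the list of files it finds before deciding what, if anything, to remove.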

My Bloated vCenter Database

bloateddb

Yeah, 200GB is a little much right – even after resetting the vpx provider and shrinking the database files I was still sitting pretty high!  So, since I had no intention of migrating historical events, tasks and performance data I thought I’d look at purging it beforehand!  Now if you have ever looked at the tables within your vCenter Server database you will find that VMware seems to create a lot of tables by appending a number to the VPX_HIST_STAT table.  I had a lot of these – and going through them one by one wasn’t an option I felt like pursuing.  Thankfully, there’s a KB that provides a script to clean all of this up – you can find that here!  Go and get the MSSQL script in that KB and copy it over to your SQL Server.  Once you stop the vCenter Service you can simply run the following command via the command prompt on your SQL Server to peel through and purge the data.

sqlcmd -S IP-address-or-FQDN-of-the-database-machine\instance_name -U vCenter-Server-database-user -P password -d database-name -v TaskMaxAgeInDays=task-days -v EventMaxAgeInDays=event-days -v StatMaxAgeInDays=stat-days -i download-path\2110031_MS_SQL_task_event_stat.sql

Obviously you will need to assign some values to the parameters passed (TaskMaxAgeInDays, EventMaxAgeInDays, & StatMaxAgeInDays).  For these you have a few options.

  • -1 – skips the respective parameter and deletes no data
  • 1 or more – specifies that data older than that number of days will be purged
  • 0 – deletes it all!

For instance, I went with the 0, making my command look like the following….

sqlcmd -S IP-address-or-FQDN-of-the-database-machine\instance_name -U vCenter-Server-database-user -P password -d database-name -v TaskMaxAgeInDays=0 -v EventMaxAgeInDays=0 -v StatMaxAgeInDays=0 -i download-path\2110031_MS_SQL_task_event_stat.sql
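If you’d rather not hand-edit that long command line, a minimal Python sketch like the one below can assemble it and reject values that don’t fit the -1 / 0 / 1-or-more rules above.  The function name and the placeholder values are hypothetical; only the sqlcmd switches and the three variable names come from the command shown in this post.  Note that I’ve defaulted everything to -1 (delete nothing), which is my own cautious choice rather than anything the KB prescribes.

```python
def build_purge_command(server, user, password, database, script_path,
                        task_days=-1, event_days=-1, stat_days=-1):
    """Build the sqlcmd argument list for the purge script.

    -1 skips that data type, 0 deletes everything, N >= 1 deletes data
    older than N days.
    """
    for name, days in (("task_days", task_days),
                       ("event_days", event_days),
                       ("stat_days", stat_days)):
        if not isinstance(days, int) or days < -1:
            raise ValueError(f"{name} must be an integer >= -1, got {days!r}")
    return [
        "sqlcmd", "-S", server, "-U", user, "-P", password, "-d", database,
        "-v", f"TaskMaxAgeInDays={task_days}",
        "-v", f"EventMaxAgeInDays={event_days}",
        "-v", f"StatMaxAgeInDays={stat_days}",
        "-i", script_path,
    ]


if __name__ == "__main__":
    # Placeholder values only; substitute your own server, account and path.
    cmd = build_purge_command(
        server=r"db-server\instance_name",
        user="vcenter-db-user",
        password="password",
        database="database-name",
        script_path=r"download-path\2110031_MS_SQL_task_event_stat.sql",
        task_days=0, event_days=0, stat_days=0,
    )
    print(" ".join(cmd))
```

Printing the command first gives you a chance to eyeball it before anything touches the database.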

After purging this data, and running a shrink on both my data and log files I finally had my vCenter database reduced in size – but only to 30GB.  Which, in all honesty still seemed a bit large to me – and after running the migration process again I still didn’t see my “Small” deployment option.   So I went looking for other large tables within the database and…..

Hello VPX_TEXT_ARRAY

It’s not very nice to meet you at all!!!  After finally getting down to this table – and running sp_spaceused 'VPX_TEXT_ARRAY' – I found that it was sitting at a whopping 27GB.  Again, a flurry of Google!  What is VPX_TEXT_ARRAY and what data does it hold?  Can I purge it?  Well, yes….and no.  VPX_TEXT_ARRAY, from what I can gather, keeps track of VM/Host/Datastore information – including information in regards to snapshots being performed on your VMs.  Also from what I can gather, from my environment anyways, this data has existed within this table from, well, the beginning of time!  So, think about backup/replication products which constantly perform snapshots on VMs in order to protect them – yeah, this could cause that table to grow.  Also, if you are like me, and have a database that has been through a number of upgrades over the years, you may end up having quite a bit of data and records within this table as it doesn’t seem to be processed in any sort of maintenance job.  In my case, 7 million records resided within VPX_TEXT_ARRAY.  Now, don’t just go and truncate that table as it most likely has current data residing in it – data vCenter needs in order to work – there’s a reason it tracks it all in the first place right?  Instead, we have to parse through the table, comparing the records with those that are in the VPX_ENTITY table, ensuring we only delete items whose entity no longer exists.  The SQL you can use to do so is below…

DELETE FROM VPX_TEXT_ARRAY
WHERE NOT EXISTS(SELECT 1 FROM VPX_ENTITY WHERE ID=VPX_TEXT_ARRAY.MO_ID)

A long and boring process – 18 hours later I was left with a mere 9000 records in my VPX_TEXT_ARRAY table.  Almost 7 million removed.  Just a note, there is a KB outlining this information as well – in which it says to drop to SINGLE_USER mode – you can if you wish, but I simply stopped my vCenter Server service and stayed in MULTI_USER so I could check in from time to time to ensure I was still actually removing records.  Running sp_spaceused 'VPX_TEXT_ARRAY' in another query window will let you track just that.  Also, it might be easier, if you have the space, to set the initial size of your transaction logs to something bigger than the amount of data in this table.  This way SQL doesn’t have to worry about growing them as it deletes records – you can always go back in the end and reset the initial size of the tlogs to 0 to shrink them.
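To get a feel for what that NOT EXISTS delete is doing before pointing it at a real vCenter database, here is a toy Python sketch using an in-memory SQLite database.  The two-table schema is an assumption reduced to just the ID and MO_ID columns used in the query above (the real tables have more columns), and deleting in batches is my own variation rather than what I actually ran – the idea being that smaller transactions are gentler on the transaction log and let you watch progress.  On SQL Server the batching would typically be written with DELETE TOP (n) instead of the rowid trick used here.

```python
import sqlite3

# Same orphan test as the DELETE statement in this post.
ORPHAN_FILTER = "NOT EXISTS (SELECT 1 FROM VPX_ENTITY e WHERE e.ID = t.MO_ID)"


def count_orphans(conn):
    """Count VPX_TEXT_ARRAY rows whose MO_ID has no matching VPX_ENTITY row."""
    sql = f"SELECT COUNT(*) FROM VPX_TEXT_ARRAY t WHERE {ORPHAN_FILTER}"
    return conn.execute(sql).fetchone()[0]


def delete_orphans(conn, batch_size=1000):
    """Delete orphaned rows in batches; return the total number deleted."""
    deleted = 0
    while True:
        cur = conn.execute(
            "DELETE FROM VPX_TEXT_ARRAY WHERE rowid IN ("
            f"SELECT t.rowid FROM VPX_TEXT_ARRAY t WHERE {ORPHAN_FILTER} LIMIT ?)",
            (batch_size,),
        )
        conn.commit()  # one small transaction per batch
        if cur.rowcount == 0:
            break
        deleted += cur.rowcount
    return deleted


def make_toy_db():
    """Build a tiny stand-in database (hypothetical, simplified schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE VPX_ENTITY (ID INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE VPX_TEXT_ARRAY (MO_ID INTEGER, DATA TEXT)")
    conn.executemany("INSERT INTO VPX_ENTITY (ID) VALUES (?)", [(1,), (2,)])
    conn.executemany(
        "INSERT INTO VPX_TEXT_ARRAY (MO_ID, DATA) VALUES (?, ?)",
        [(1, "current"), (2, "current"), (98, "stale"), (99, "stale"), (99, "stale")],
    )
    return conn


if __name__ == "__main__":
    conn = make_toy_db()
    print("orphans before:", count_orphans(conn))
    print("deleted:", delete_orphans(conn, batch_size=2))
    print("orphans after:", count_orphans(conn))
```

Counting the orphans first is a cheap way to see how much you are about to remove, and to confirm that the rows tied to existing entities will be left alone.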

So – a dozen coffees and a few days later I finally ran another shrink on both the data and log files, setting their initial sizes to 0 and voila – a 3GB database.  Another run at the migration and upgrade and there it was – the option to be “Small”!  Again, this worked in my environment – it may not work in yours – but it might help get you pointed in the right direction!  Do reach out if you have any questions and do ensure you have solid backups before you attempt any of this or anything you read on the net really 🙂  Also, there’s always that Global Support Services thing that VMware provides if you want some help!  Thanks for reading!

Spring forward to the Toronto VMUG UserCon

Ahh Spring –  Most people describe this as a time where the rain falls and cleans everything up around us – flowers blooming, grass growing – a sign of warmth to come!  In Canada though, it’s a sign of giant muddy snow piles full of gravel, salt and sand from all of the plowing and shoveling performed all Winter long – for me, it’s a muddy white dog and two little munchkins tracking muck all over the house – All that said, there is some hope for Spring this year!  March 23rd marks the date for our next Toronto VMUG UserCon – so, if you want to escape the mud and the muck come on down to the Metro Toronto Convention Centre this Thursday and join 600+ of your peers for some great learning, technical sessions and some awesome keynotes!  We’ve got a great one planned this year and I just wanted to highlight some of the keynotes and sponsors we have lined up for Thursday!

First up – Mr. Frank Denneman

Over the years we have been lucky enough to have some awesome keynote speakers for our UserCon – this year is no exception!  I’m super excited to hear from Frank Denneman!  If you don’t know who Frank is let me try and enlighten you a little – this man literally wrote the book on DRS – three times!  The “HA and DRS/Clustering Deepdive” books, written by Frank and his co-author Duncan Epping, are honestly some of the greatest tech books ever.  They’re written in a style that is easy to read, and have literally taught me so much about HA and DRS I can’t even begin to explain it all!  Certainly a must read for any VMware admin.  Frank moved on from VMware for a little while to work with PernixData as the CTO and has just recently returned to VMware, taking on the role of Senior Staff Architect within their SDDCaaS Cloud Platform Business Unit.  Frank will be giving a talk titled “A Closer Look at VMware Cloud on AWS”.  With VMware and Amazon recently announcing a partnership allowing us to consume bare-metal ESXi from within the wide range of Amazon’s data centers, this will most certainly be an interesting keynote explaining just how it works – and what we can expect from it in terms of unified management between our on-premises and AWS infrastructure.

The Breakouts and Panels!

After Frank the morning breakout sessions will kick off – here we will have sessions from a variety of partners and vendors who provide everything from hardware to storage to backup to monitoring.  You will see all of the familiar names here with 30 minute breakout sessions covering off their technologies.  Take a look at our sponsors below – without these companies these events wouldn’t be possible!  A couple of rounds of sessions from third-party vendors are followed by a round of sessions from VMware, then lunch, and an aspiring/VCDX panel talk where you can be sure to get some in-depth answers to any questions you may have about design, architecture, or everyday management of your VMware infrastructure.

Drinks, Food, and DiscoPosse’s

After lunch we have another couple of rounds of breakout sessions by VMware and our sponsors – with a reception following immediately thereafter.  vSphere with Operations Management will sponsor our networking reception, complete with drinks and appetizers – a perfect way to end what I’m sure will be a jam-packed day!  That said, what’s a beer without entertainment right?  We are super happy to have our own VMUG co-leader Eric Wright (@discoposse) giving our closing keynote for the day!  Think of this a little like the technology version of CBC’s Hometown Heroes segment that they offer on Hockey Night in Canada!  Eric, our own hometown hero will deliver a jam packed hour of all things VMware and Terraform, showing us just how easy it is to start automating our infrastructure with the open source software!  I got a sneak peek of this at our last local VMUG meeting and this is something you won’t want to miss!

Free Stuff!

Then, yes, of course, Giveaways!  We have some pretty cool prizes this year including cold hard cash (VISA gift cards), GoPros, and the ever popular grand prize of a complete vSphere Homelab!  This is on top of all the great giveaways we see from our sponsors!

So if you aren’t busy this Thursday, register now & drop in – we’d love to see you there!  Even if you are busy, cancel everything and come on down!  Can’t make it?  Follow along via Twitter with the hashtag #tovmug and hey, we have more meetings coming up as well to help you all get the Toronto VMUG experience.  Our Q2 meeting is May 31st sponsored by Veeam and Mid-Range and our Q3 meeting is tentative for September 19th with sponsors Zerto and Tanium (still in development) – come and check us out.  As always, stay connected.  You can follow us on Twitter, connect on LinkedIn, watch our website, or become a member of the Toronto VMUG Community in order to stay up to date on all things tovmug!  See you Thursday!

 

Don’t delay!  Register now for the March 23rd Toronto VMUG UserCon!

 

What to expect from VeeamON 2017

I’ve had the opportunity to attend both the previous VeeamON conferences in Vegas as well as the mini VeeamON forum last year in the UK and since it’s still a relatively new conference on the scene I thought I’d give everyone a bit of an overview and heads up as to what to expect from the event!  Before going too far into how the event is laid out let’s first take a look at the logistics.  While I do like Vegas it tends to get a bit monotonous when it comes to conferences – making them all kind of feel like the same event.  That’s why I was ecstatic to hear that VeeamON 2017 will be held in New Orleans this year from May 16th through the 18th!  So, as Veeam embarks on its third VeeamON event I thought I might go over a bit on what to expect for those that may be unfamiliar with the backup vendor’s availability event.

Expect A LOT of technical information

With over 80 breakout sessions you can most certainly expect to learn something!  The thing about the breakouts at VeeamON though is their level of technicality.  I’ve been to many breakout sessions at other conferences that tend to be pretty marketing heavy – while VeeamON most certainly has a marketing agenda, the sessions themselves are very technical – with a 100 level being the least technical and a 400 level introducing you to things you never even knew existed!  I can honestly say that I was skeptical when attending my first VeeamON – wondering how they could have so many breakout sessions dealing solely with backup – man was I wrong!  Veeam B&R is a big application that touches a lot of different aspects of your infrastructure – think repository best practices, proxy sizing, automation, etc.  This year with the addition of new products such as O365 backup, Agents for Linux/Windows and the many storage integrations with partners you can bet that there will be plenty of content to be shared.

Expect a smaller, more intimate conference

VeeamON, compared to the bigger conferences, is relatively small.  With roughly 2500 people in attendance last year and over 3000 expected this year the conference is not as spread out as what you may be used to – which is a good thing!  Honestly, it’s nice being able to keep everything relatively confined to the same space and even nicer to have no crazy lineups to cross the street at the Moscone.  I found that VeeamON made it very easy to find people – whether you are looking for that person or not.  Meaning, don’t be surprised to accidentally run into some Veeam executives in the hallways – or even the CEO in the elevator 🙂  The atmosphere during the conference days at VeeamON is nice – not so loud that you can’t have a conversation – the solution exchange isn’t overrun with vendors competing to see who has the loudest mic.  It’s a nice, low key conference which makes it easy to have those valuable hallway conversations that are usually the best benefit from any conference.

Expect to learn a little more about the “other hypervisor”

VMworld – the place you go to learn all there is to know about vSphere.  MS Ignite – the place you go to get all your Hyper-V knowledge!  VeeamON – since Veeam B&R supports both vSphere and Hyper-V you are going to hear a lot about both hypervisors.  You’ll see your typical VMware crowd intermingling with…you know, the other guys, all in support of the product that is protecting their infrastructure.  I’ve written about how the Vanguard program bridges this gap before – and the VeeamON conference is fairly similar in how it brings together the best of both the vSphere and Hyper-V worlds.  As my good friend Angelo Luciani always says, “We are all in this together!”

Expect announcements!

This is a given right – every vendor-organized conference is always organized around some sort of announcement or product release!  VeeamON 2014 saw the introduction of Endpoint Backup Free Edition, while VeeamON 2015 saw its OS counterpart announced with Veeam Backup for Linux!  All the while lifting the lid on some major enhancements and features in their core product, Veeam Backup & Replication.  So what will we see this year in New Orleans – your guess is as good as mine.  Veeam just recently had a major event where they announced the evolution of the physical Windows/Linux backup products (Veeam Agent for Windows/Linux) into paid versions coupled with the Veeam Backup Console for centralized management of our endpoints – as well, we saw the release of Veeam Backup for O365 – what else is left to announce?  I’m sure we will hear more about v10 and some top secret features from it, but with all of the other new product announcements one might think there is nothing left to release – but a wise man who worked for Veeam once told me that they have this shelf containing a lot of products and ideas – you never know when they will take something down off of it 🙂

Expect to have ALL your questions answered

Veeam sends a lot of employees, engineers, tech marketing folks to this conference – and I mean A LOT.  Last VeeamON you couldn’t even walk through the Aria casino without running into at least a half dozen Veeam engineers.  What this means is, if you have questions, VeeamON is the perfect venue to ask them.  I can pretty much guarantee you that they will all be answered – there will be a SME on site dealing in the areas you are having trouble with.  So don’t just make VeeamON all about learning – try and get some of those pain points that have been bugging you for a while firmed up while at the conference.  Everyone is approachable and more than willing to give you a few minutes.

Expect an EPIC party

Sometimes you just have to let go right – if you have ever been to a Veeam party at any of the VMworlds you know that Veeam knows how to do just that!  In fact, I’ve heard more than once Veeam being described as a “Drinking company with a backup problem” 🙂  I don’t quite see it as being like that but certainly you have to agree that Veeam knows how to throw a party and make you feel welcome.  Whether you are just arriving and hitting up the welcome reception or you are attending their main VeeamON party I know you will have a good time, with good food and good drinks!  Veeam understands that it can’t be all about business all the time – so take the opportunity at the parties to let a little loose and meet someone new!  I’ve made many lifelong friends doing just that!

So there you have it!  Hopefully I’ve helped paint the picture of what VeeamON is like for me and maybe helped you understand it a little more!  I’m super excited for VeeamON in New Orleans this May and I hope to see you there!

Runecast – Proactive performance for your VMware environment – Part 2 – Knowledge Base Articles, Best Practices, and Hardening Guidelines

In part 1 of our Runecast review we took a look at just how quickly we can get Runecast installed and configured within our environment.  We had a brief look at the Runecast dashboard which highlights any misconfigurations, un-applied Knowledge Base articles, or non-compliant security settings.  We saw that within just a few minutes we were reporting on all this information from within our environment, and comparing that to up-to-date lists of best practices and hardening guidelines.  With KBs, Best Practices, and Hardening Guidelines being at the heart of Runecast it’s best we take a more in-depth look at how we report on, manage, and resolve them within our environment.  That is exactly what this final part of the review will focus on.

So with all that said let’s start diving deeper into our test environment to see if we can solve any problems!  As we can see above, I currently have 38 issues that were already detected within my small little lab setup here, broken down into 5 critical, 19 major, and 14 medium.  Clicking on any severity item within the dashboard display will take us directly to a filtered view of our issues list, or we can view all issues by selecting Issues List along the left hand navigational menu.

runecastissues

By default, our issues appear rolled up – to get more information in regards to the Knowledge Base Article, Best Practice or Security setting we can click the ‘+’ icon next to our issue as shown above.  As we can see here Runecast is reporting that we don’t have NTP configured on our ESXi host, falling under the Best Practice category.  Certainly time is an important thing in the world of computing so I can see why they would flag this as a critical issue.  We can also see after expanding the issue that we have a lot of other information available to us – a more detailed description of the problem, as well as ratings, impact, and a link to any reference material, knowledge base article, or security hardening guide to further explain or describe the issue and how to fix it.  This is very handy to have.  Right from within Runecast we can discover our issues and immediately jump into a document, user guide, or KB article outlining the problems and resolutions.

The ‘Findings’ tab within the expanded issue allows us to view the inventory objects within our environment that the issue applies to – in this case, both of our ESXi hosts.  I should note here that we do not need to first click on an issue to view its associated objects – we can do this in the reverse direction as well by using the Inventory item on the left hand navigation – Inventory essentially gets us to the same place, but allows us to browse through our vCenter inventory, selecting a host, cluster, datastore, VM, etc. and displaying just its associated issues.  Either way we get to the same information though, just a couple of routes to get there.

Another useful tab on this screen is the ‘Note’ tab.  As shown below we are able to input any notes or information that applies to this issue (or KB/Security setting for that matter) that we want.  This can be extremely useful if we have multiple people working within the Runecast environment, or even just for documentation for yourself as to why you are making or not making a certain configuration change.

runecast-issue-notes

In order to clear issues within Runecast we have a couple of options – firstly, and probably the most preferred method, is to simply fix your issue – I’ve since set up NTP on my hosts and no longer see this issue being reported.  That said, as mentioned above there may be times when we have an issue present for a certain reason, especially dealing with the best practices category like the forged transmits setting above.  For this, we can simply click the ‘Ignore’ link next to an issue and create an object filter as shown below, by giving it a name and selecting the objects it applies to.

runecast-issues-ignore

After applying the filter the issue in question will no longer be reported in Runecast.  We can edit or remove this filter at any time by selecting the ‘Filter’ tab from within Runecast’s settings in order to reset anything we may want to.

From within the ‘Configuration Analysis’ section we are able to view our issues in a different fashion.

First up, ‘KBs Discovered’ will show us all of the KBs that have been discovered that apply to our environment.  It does this by parsing the VMware Knowledge Base and pulling down only those KBs which apply to the hardware and software versions we have running within our virtual infrastructure.  As we can see below we still have the same options as we did within the Issue List screen – we have our link out to the actual VMware KB article, the article is also embedded into Runecast, and we can add notes and choose to ‘Ignore’ certain KBs that may not apply.

runecast-kb

The ‘Best Practices’ and ‘Security Hardening’ sections take a somewhat different approach to how they are displayed.  Since best practices and security settings are actual configurations that we can choose to make in our environment, they are displayed in a simple Pass/Fail fashion – passing if we meet the criteria of the practice or security setting, and failing if we do not.  This gives us the ability to quickly answer questions such as “How many major items from the security guideline have we implemented?” or “Have we applied all of the ‘critical’ best practices to our environment?”

runecast-bp

As we can see above we are getting a pass on our NTP settings, as we have already tackled them from the Issues screen.  We are however receiving a fail in terms of Remote TSM, which is essentially having SSH enabled on our hosts.  In my environments this is a known configuration setting, so I would most likely choose to create a filter to ignore this security setting.
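The Pass/Fail idea is simple enough to sketch in a few lines of Python.  The checks, severities and host attributes below are assumptions for illustration only and do not reflect how Runecast defines or evaluates its rules; they just mirror the NTP pass and Remote TSM fail described above.

```python
from collections import Counter

# Hypothetical checks: each one has a name, a severity and a predicate
# that returns True when a host meets the practice.
CHECKS = [
    {"name": "NTP servers configured", "severity": "major",
     "passes": lambda host: len(host["ntp_servers"]) > 0},
    {"name": "Remote TSM (SSH) disabled", "severity": "medium",
     "passes": lambda host: not host["ssh_enabled"]},
]


def evaluate(checks, hosts):
    """A check passes only if every host passes; otherwise list the failures."""
    results = []
    for check in checks:
        failing = [h["name"] for h in hosts if not check["passes"](h)]
        results.append({
            "name": check["name"],
            "severity": check["severity"],
            "status": "FAIL" if failing else "PASS",
            "failing": failing,
        })
    return results


hosts = [
    {"name": "esxi01.lab.local", "ntp_servers": ["pool.ntp.org"], "ssh_enabled": True},
    {"name": "esxi02.lab.local", "ntp_servers": ["pool.ntp.org"], "ssh_enabled": True},
]

results = evaluate(CHECKS, hosts)
summary = Counter(r["status"] for r in results)

for r in results:
    print(r["status"], r["severity"], r["name"], r["failing"])
print(dict(summary))
```

Counting results by status (or by severity) is what makes the "how many major items have we implemented" type of question quick to answer.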

The last section of Runecast I want to go over is the Log Analysis section.  Within here we can see that we have another couple of screens we can access – the ‘KBs Discovered’ and ‘Verbose Dashboards’ screens.  The KBs Discovered section here deals solely with those KBs that specify certain patterns which are visible in the logs, such as KB 2144934, where you can see below the “you see entries similar to…” wording.

runecast-vmware-kb

Nobody likes searching through log files – it’s a long and tedious task.  In this situation, since we are already shipping our logs to Runecast, why not let the analyzer go ahead and comb them for us?  If it finds a pattern that applies to any specific KB article, it will be flagged here.  This allows us to be quite proactive – alerting us to a KB issue that we may not even know we have.
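Conceptually, flagging a KB from log content boils down to matching known patterns against incoming lines.  Here is a small Python sketch of that idea; the signatures, KB identifiers and log lines are invented for the example and are not taken from any real VMware KB or from Runecast itself.

```python
import re

# Hypothetical signatures: KB placeholder ID -> pattern the article says to look for.
SIGNATURES = {
    "KB-EXAMPLE-1": re.compile(r"example_nic\d+: link state flapping"),
    "KB-EXAMPLE-2": re.compile(r"example_hba: command timeout on path \S+"),
}


def scan(lines, signatures):
    """Return {kb_id: [line numbers]} for every signature seen in the log."""
    hits = {}
    for number, line in enumerate(lines, start=1):
        for kb_id, pattern in signatures.items():
            if pattern.search(line):
                hits.setdefault(kb_id, []).append(number)
    return hits


log_lines = [
    "2017-01-30T09:15:02Z esxi01 vmkernel: example_nic0: link state flapping",
    "2017-01-30T09:15:05Z esxi01 vmkernel: routine heartbeat",
    "2017-01-30T09:16:40Z esxi02 vmkernel: example_hba: command timeout on path vmhba1:C0:T0:L1",
    "2017-01-30T09:17:10Z esxi01 vmkernel: example_nic1: link state flapping",
]

hits = scan(log_lines, SIGNATURES)
for kb_id, line_numbers in hits.items():
    print(kb_id, "matched on lines", line_numbers)
```

The value of having a tool do this is less the matching itself and more the upkeep of the signature list, which is exactly the part nobody wants to do by hand.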

As far as ‘Verbose Dashboards’ goes, this allows us to quickly get a grasp on all of the events occurring within our log files.  Again, the task of combing through log files and grepping out certain items such as SCSI aborts on the command line can be daunting, not to mention very time consuming.  Here, as shown below, we can do this directly from within the Runecast UI.

runecast-verbose

As you can see we have a lot of options to filter the events within our logs to get just the data we are looking for.  For instance, we can specify that we only want to see those log entries flagged as an error and applying only to a certain ESXi host.  We can also define a time period of logs to parse – from predefined settings of the last 1/3/7/30 days to a custom period set up by us if we need to audit a certain event at a certain time.  This is a very useful feature to have within the UI.  Since Runecast already has the log data in order to determine issues, why not give us a screen to analyze the raw data?  I can see this being super useful for things such as searching for certain logins during a specific time period – something that isn’t easy to do sitting at the CLI of an ESXi host.
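For comparison, this is roughly what that kind of filtering looks like if you have to script it yourself.  The Python sketch below filters already-parsed events by severity, host and time window; the event structure, host names and messages are assumptions for the example, and the fixed `now` value is only there to keep the output repeatable.

```python
from datetime import datetime, timedelta


def last_days(now, days):
    """Return a (start, end) window covering the last `days` days."""
    return now - timedelta(days=days), now


def filter_events(events, severity=None, host=None, start=None, end=None):
    """Keep events matching every criterion that was supplied."""
    selected = []
    for event in events:
        if severity is not None and event["severity"] != severity:
            continue
        if host is not None and event["host"] != host:
            continue
        if start is not None and event["time"] < start:
            continue
        if end is not None and event["time"] > end:
            continue
        selected.append(event)
    return selected


now = datetime(2017, 1, 31, 12, 0)  # fixed timestamp so the example is repeatable
events = [
    {"time": datetime(2017, 1, 30, 9, 15), "host": "esxi01.lab.local",
     "severity": "error", "message": "example error A"},
    {"time": datetime(2017, 1, 10, 9, 15), "host": "esxi01.lab.local",
     "severity": "error", "message": "example error B"},
    {"time": datetime(2017, 1, 30, 10, 0), "host": "esxi02.lab.local",
     "severity": "error", "message": "example error C"},
    {"time": datetime(2017, 1, 30, 11, 0), "host": "esxi01.lab.local",
     "severity": "info", "message": "example info D"},
]

start, end = last_days(now, 7)
recent_errors = filter_events(events, severity="error",
                              host="esxi01.lab.local", start=start, end=end)
for event in recent_errors:
    print(event["time"].isoformat(), event["host"], event["message"])
```

A custom audit period is just a different `start` and `end` pair passed to the same function, which is more or less what the predefined 1/3/7/30 day options and the custom range give us in the UI.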

Runecast really has a very nice product here: it brings a lot of information out of our environment and puts it front and center in a very easy, simple UI.  It’s so easy to set up as well – simply deploy the OVA, point it to our vCenter, and right away we know how our environment stacks up in terms of best practices and security guidelines, and we have discovered any potential issues we may have, with all of the information on how to fix them.  All of this in about 5 minutes.  Think about the flip side of this: downloading best practices and the hardening guide and going through each line item one by one, looking up build numbers and then searching through mountains of VMware KBs – not something I want to do.

While other products providing some similar functionality, such as vROps and Log Insight, may bring us more metrics, Runecast instead displays only what we need to see to properly troubleshoot our environment, keeping the UI clean, crisp and easy to use.  Aside from that, when compared to vROps, Runecast doesn’t come with the install footprint, nor the price tag, and as far as I know is the only product on the market which parses and filters out VMware KBs for us.

As far as development goes Runecast isn’t holding back.  With a beta version set to be released soon we can see features such as multitenancy being added to the product, as well as a few more undisclosed features set to be released in Q1/Q2 of this year.  Runecast comes with a fully featured, free 30 day trial, but honestly the product gives you valuable information in the first 15 minutes, so 30 days is more than long enough to get your environment up to snuff.  That said, in order to keep your environment running at its peak performance you will want to consult Runecast often, as we all know how fast best practices and security guidelines can change in our industry.  Runecast automatically adjusts to these changes – ensuring your environment is ALWAYS compliant.

The amount of time Runecast saves you is instantly recognized, and the fact that they are constantly connected to the VMware knowledge base and hardening guides means you are always “in the know” about how your environment is configured according to the “preferred” way – even if your environment changes, or the “preferred” way changes!  If you want to try out Runecast and what it has to offer for yourself you can do so by signing up for their 30 day trial!  I guarantee you will find something in need of some attention in your environment!

Runecast – Proactive performance for your VMware environment! – Part 1 – Configuration

Have you ever opened up the VMware Hardening Guide and checked your environment against every single item listed?  How about combed through the VMware Knowledge Base looking for all KB articles that apply to the exact software builds and hardware you have?  No?  How about taken a list of industry best practices and ensured that you are indeed configured in the best possible way?  Of course we haven’t – that would certainly take a lot of time and most organizations simply don’t have the resources to throw at those types of tasks.  All that said what if I told you that there was a piece of software that could pretty much instantly tell you whether you are or are not compliant in those exact three scenarios?  Interested yet?  I thought you might be…

Enter Runecast

logo

Before writing this review I’d never heard of Runecast, so first, a little bit about the company.  Runecast was founded in 2014 in the quaint ol’ city of London in the UK.  Their goal: to provide proactive monitoring for our vSphere environments in order to save us time, prevent outages before they happen, ensure compliance at all times and simply make our environments more secure.  Now there are only four things listed there – but they are four things that Runecast does really, really well.  With that said, I could talk about how much I enjoyed doing this review forever, but it’s best just to jump right in and get monitoring…

Configuration

runecast-addvcenter

As far as installation goes Runecast comes bundled as a virtual appliance, so it’s just a matter of deploying the analyzer into our environment.  To help you get started Runecast offers a 30 day full-featured free trial that you can try out!  Configuration-wise we really only have a couple of steps to perform: pointing the Runecast Analyzer at our vCenter Server and configuring our ESXi hosts to forward their logs.  After deployment you should be brought to a screen similar to the one shown to the left.  Simply follow the ‘Settings’ link and enter your required vCenter Server information into Runecast as shown below.

runecast-vcenteradditiondetails

Remember how we mentioned that configuration is divided into two steps?  The first, connecting to our vCenter environment, is now complete.  The second, setting up the forwarding of logs, is completely optional and can be completed at any time.  We can still get valuable data from Runecast without having log forwarding set up; however, in order to achieve a more holistic view of our environment we will go ahead and set up log forwarding.

There are many ways to set up our ESXi hosts to send their logs to Runecast.  We can set them up manually, use a PowerCLI script, or enter the Runecast Analyzer information into our Host Profile.  The Runecast interface has the smarts to configure this for us as well.  This review will follow the steps to set up log forwarding from within the Runecast Analyzer UI.

Selecting the “Status” section from the Log Analysis group, and then clicking on the ‘wrench’ icon, will allow us to configure one or many of our hosts to send their log files to Runecast.  This process provides the same results as if we were to go and set the syslog advanced setting directly in the host’s configuration.  That said, utilizing Runecast for this seems like a much more automated and easier process.  As you can see below, we also have the option to send our VM log files as well, which is a good idea if you are looking for complete visibility into your virtualization stack.
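If you would rather go the manual route, the setting in question is the host's syslog log host (the `Syslog.global.logHost` advanced setting), which can also be set with `esxcli`.  The Python sketch below only builds the command strings you would run on each host; it does not connect to anything.  The analyzer hostname, the UDP protocol and port 514 are assumptions on my part, so check what your Runecast appliance actually listens on before using them.

```python
import shlex


def syslog_target(analyzer_host, port=514, protocol="udp"):
    """Build the log host URL ESXi expects, e.g. udp://host:514."""
    if protocol not in {"udp", "tcp", "ssl"}:
        raise ValueError(f"unsupported protocol: {protocol}")
    return f"{protocol}://{analyzer_host}:{port}"


def esxcli_commands(analyzer_host, port=514, protocol="udp"):
    """Return the esxcli commands to run on a host to forward its logs."""
    target = syslog_target(analyzer_host, port, protocol)
    return [
        # Point the host's syslog service at the analyzer.
        f"esxcli system syslog config set --loghost={shlex.quote(target)}",
        # Reload syslog so the new target takes effect.
        "esxcli system syslog reload",
        # Allow outbound syslog traffic through the ESXi firewall.
        "esxcli network firewall ruleset set --ruleset-id=syslog --enabled=true",
    ]


# Hypothetical analyzer address; replace with your own appliance.
commands = esxcli_commands("runecast.lab.local")
for command in commands:
    print(command)
```

Seeing the three commands side by side makes it clear why letting the Runecast UI push the setting to many hosts at once is the more comfortable option.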

runecast-logging

As far as configuration goes we are now done!  That’s it!  Two simple steps and we are ready to start detecting problems within our environment.  The process of going out and collecting data from our vCenter Server is called ‘Analyze’ within Runecast.  Our analysis can be configured to occur on a schedule by navigating to the settings page (gear icon in top right) or can be run on-demand by clicking the ‘Analyze Now’ button from any screen within the application.

runecast-analyze

How long this process takes greatly depends on the size of your environment.  My test environment, albeit simple and small, only took a couple of minutes to gather the data.  I’m sure this time would increase in a 32 host cluster with 1000 or so VMs though.  That said, for the amount of data it gathers and the number of comparisons going on behind the scenes, Runecast does a very efficient job of processing everything.

Navigating back to the ‘Dashboard’ as shown below immediately lets us start to explore the results of this analysis process.  Almost instantaneously we can see many issues and best practices that can be applied within our environment.  As you can see below I had a number of issues discovered – and I’ve only had Runecast up and running for less than 5 minutes.

runecast-dashboard

Runecast Terminology

Let’s take a minute and dig a little into the data that is displayed on the ‘Dashboard’ screen.  Almost everything that Runecast monitors and does is rolled up here, giving us an at-a-glance view of everything we need to know.  Let’s break down the items that we are seeing here…

Issues – The term “issue” within Runecast basically represents a detected problem in our infrastructure – this can come from any single or combined instance of configuration settings, log file analysis, or software and hardware versions.  Although issues may be discovered from configuration settings or log files, all issues belong to one of three categories within Runecast: Knowledge Base articles, Security Guidelines, or Best Practices, explained below…

KBs – Runecast actively combs through the vast number of VMware Knowledge Base articles and displays any that may apply to our environment based on the hardware and software versions and configuration we are running.

Best Practices – All of our inventory objects and configuration items are routinely scanned to determine whether or not they meet any best practices related to VMware.  This allows us to see if we simply Pass or Fail in terms of having our environment running in its best possible configuration.

Security Compliance – Security Compliance takes all of the items within the official VMware Security Hardening guides and compares them to the configuration of our infrastructure.  At a glance we are able to see how we stack up against the recommended security practices provided by VMware.

It’s these four items (Issues, KBs, Best Practices, and Security Compliance) that are at the core of the Runecast analytical engine.  Runecast automatically combs through all of these items and determines which ones apply to our environment, then reports back in a slick, clean UI, allowing us to see whether we are in compliance or not!  In the next part of our review we will go into each of these items in a lot more detail – explaining how to drill down, resolve, and exclude certain metrics from our dashboards.  For now, I certainly recommend checking out Runecast for yourself – as you saw, it’s a simple install that can be up and running in your environment very quickly.  So, while you wait for part 2 of the review, head on over to the Runecast page and grab yourself a free 30 day trial to start reporting on your environment.  I’m sure you will be surprised at all of the abnormalities and non-compliant configurations you find right off the hop – I know I was!  Stay tuned for part 2.