We initially installed Log Insight purely for our audit (where we must show that we monitor logs, especially for failed logins). Lately we have found more and more ways that Log Insight helps us, including saving a host from the brink of death.
Our hosts are currently on 5.0u2, with plans to upgrade over the next few months. Most hosts have been up for almost 18 months. Right before VMworld, a host in one of our clusters became disconnected from vCenter. The host was still pinging and the guests were still running, but we could not access the host via the DCUI (the password prompt would not show up).
So how do you troubleshoot without access to the console? Well, thankfully, the system was still logging via syslog to Log Insight.
In Log Insight I saw this error quite a few times:
2014-08-21T20:59:45.963Z <ScrubbedHostname> vmkernel: cpu29:5571)WARNING: VisorFSObj: 893: Cannot create file /var/log/ipmi/0/.sensor_threshold.raw for process sfcb-vmware_raw because the visorfs inode table is full.
A quick Google search for the error brought up this KB: ESXi 5.x host is disconnected from vCenter Server due to sfcbd exhausting inodes
The KB described the issue (the sfcbd service filling up the /var/run/sfcb directory and exhausting inodes) and the fix, but applying the fix required SSH or console access. The only other option at this point was to gracefully shut down the guests and then hard boot the host.
I kept searching and I found this community post:
The poster in the thread had the same issue, but found that as he shut down guests on the host, the host reconnected to vCenter. At that point he could manage the host and apply the fix from the KB.
I immediately looked for a non-critical VM and found a non-prod terminal server that no one was logged into. Once I shut down that guest, I saw the login prompt pop up in the console. I turned on the local shell, cleaned up the sfcb directory, and bounced the management services. Everything was back in action with minimal impact.
The poster from the forum wrote a cool PowerCLI script to monitor inodes and a few other things.
But the issue is fixed in 5.0u3, 5.1u2, and 5.5 (with a certain patch), so it’s not really something you would need to run all the time (just once in a while until you upgrade).
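If you would rather check the logs than the hosts, here is a minimal Python sketch that scans syslog lines for the inode warning quoted earlier. It is not the forum poster's PowerCLI script, and it does not talk to Log Insight or ESXi at all. It only assumes you have the raw lines available as text (for example, from an export). The regular expression is modelled on the single log line shown above, so other builds may format the message differently, and the names `INODE_FULL`, `InodeWarning`, and `find_inode_warnings` are made up for this example.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Pattern modelled on the one warning line quoted in this post (an assumption:
# other ESXi builds may word or lay out the message differently).
INODE_FULL = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<host>\S+)\s+vmkernel:.*"
    r"Cannot create file (?P<path>\S+) for process (?P<process>\S+) "
    r"because the visorfs inode table is full"
)


class InodeWarning(NamedTuple):
    """Fields pulled out of one matching syslog line."""
    timestamp: str
    host: str
    path: str
    process: str


def find_inode_warnings(lines: Iterable[str]) -> Iterator[InodeWarning]:
    """Yield one InodeWarning for every line that matches the pattern."""
    for line in lines:
        match = INODE_FULL.search(line)
        if match:
            yield InodeWarning(
                match["timestamp"],
                match["host"],
                match["path"],
                match["process"],
            )


if __name__ == "__main__":
    # The sample is the log line from this post, hostname already scrubbed.
    sample = (
        "2014-08-21T20:59:45.963Z <ScrubbedHostname> vmkernel: "
        "cpu29:5571)WARNING: VisorFSObj: 893: Cannot create file "
        "/var/log/ipmi/0/.sensor_threshold.raw for process sfcb-vmware_raw "
        "because the visorfs inode table is full."
    )
    for warning in find_inode_warnings([sample]):
        print(warning.host, warning.process, warning.path)
```

Feeding it a file is just `find_inode_warnings(open(path))`, since the function accepts any iterable of lines. Any hit means the host is already out of inodes, so treat it as an alert rather than an early warning.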
Without Log Insight, I wouldn’t have been able to see the error messages, which would have resulted in an outage on all of the VMs on that host, so +1 for Log Insight.