This is a landing page for all of the various queries and alerts that I have found useful during my day-to-day operations.
If anyone has other alerts that they have found useful, then let me know and I'll add them here.
-Query String: bootbank cannot be found "issue detected on"
-Comments: A bad bootbank is a definite sign that something is wrong with the local boot media. I put the host into maintenance mode right away and swap the SD card (in my case).
SCSI Errors on SD Card
-Query String: nmpdeviceattemptfailover mpx.vmhba* APD
-Notes: Typically indicates a failing SD card
-Comments: My hosts use SD cards with no other local storage. If you use local disks, this query may also match errors for those disks. I have seen a host throw these errors before hitting the bad bootbank, and I have seen hosts go straight to a bad bootbank. It probably depends on what kind of error the SD card has.
-Query String: HostStateChange::SaveToInventory changed state dead
-Notes: A host is no longer talking to its cluster. The hostname in the text of the message is a GUID, so look at the hostname that reported the error and then check that host's cluster to determine which host is dead.
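If you want to try the query strings above against a log file you have exported to disk, a keyword filter along these lines may help. This is only a rough sketch: match_all is a hypothetical helper name, it does plain case-insensitive substring matching with grep, and it is not Log Insight's own query engine, so results can differ from what the Log Insight UI returns. A wildcard term such as mpx.vmhba* is passed as the bare prefix mpx.vmhba.

```shell
# match_all (hypothetical helper): print only the stdin lines that contain
# every keyword passed as an argument, ignoring case.
# Assumption: substring matching is "close enough" to a keyword search;
# this does not reproduce Log Insight's tokenizer or wildcard handling.
match_all() {
    # No keywords left: pass the remaining lines through unchanged.
    if [ "$#" -eq 0 ]; then
        cat
        return
    fi
    keyword=$1
    shift
    # -F treats the keyword as a literal string, -i ignores case,
    # -e protects keywords that happen to start with a dash.
    grep -i -F -e "$keyword" | match_all "$@"
}
```

For example, `match_all nmpdeviceattemptfailover mpx.vmhba APD < exported.log` would list candidate SD card SCSI errors, where exported.log is a made-up file name. A quoted phrase such as "issue detected on" can be passed as a single argument.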
Log Insight being Spammed by ipmiifcselreadentry Entries
-Query String: ipmiifcselreadentry
-Notes: Log Insight is getting flooded by ipmiifcselreadentry entries
Look at the hosts that are sending these entries and do the following on each one:
Log on via SSH
localcli hardware ipmi sel clear
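When several hosts are flooding at once, the two steps above can be wrapped in a small loop run from an admin workstation. This is a sketch under assumptions, not a tested procedure: clear_sel and the DRY_RUN variable are hypothetical names, it assumes SSH is enabled on the hosts, and it assumes you log on as root.

```shell
# clear_sel (hypothetical helper): run the SEL clear command from the
# steps above on every host named on the command line.
# Assumptions: SSH is enabled on each host and the login user is root.
# Set DRY_RUN=1 to print the ssh commands instead of running them.
clear_sel() {
    for host in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "ssh root@$host localcli hardware ipmi sel clear"
        else
            ssh "root@$host" localcli hardware ipmi sel clear
        fi
    done
}
```

Running it first with DRY_RUN=1 and a couple of host names shows exactly which commands would be sent before anything is cleared.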