I recently had to change the AD password on my admin account, and of course I waited until a Friday at 3pm to do it.
Immediately after changing my password, my account started getting locked out, over and over. I had my colleague unlock the account, and we started trying to determine where the lockouts were coming from.
I first went to all of the places where I had been lazy and used my own credentials to bind to vCenter or to AD. This included my two Log Insight instances in our lab (both the vSphere Integration and the Authentication configurations). I switched both of those to a service account, but I was still getting locked out.
I then had my coworker check if I had any RDP sessions on the Windows servers I typically use (jump servers, vCenter, SRM, utility servers). No dice there.
I remembered from a few years ago that someone in the company wrote a simple program that scanned all of the domain controllers and told you whether the account was disabled or locked on each one, how many bad passwords had been submitted, and when the last one was. All of the bad passwords were occurring on two domain controllers in the same site (I will call this site A).
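If you don't have a tool like that handy, a quick script can do the same job. Here's a minimal sketch using the Python ldap3 library; the DC names, base DN, and service account are placeholders, not anything from my environment. The key detail is that badPwdCount and badPasswordTime are not replicated between domain controllers, so you have to ask each DC individually:

```python
# Sketch: query each DC separately, since badPwdCount and
# badPasswordTime are per-DC attributes and never replicate.
from ldap3 import Server, Connection, NTLM

DCS = ["dc1.domain.local", "dc2.domain.local"]  # placeholder DC list
BASE_DN = "dc=domain,dc=local"                  # placeholder base DN

for dc in DCS:
    conn = Connection(Server(dc), user="DOMAIN\\svc_lookup",
                      password="***", authentication=NTLM, auto_bind=True)
    conn.search(BASE_DN, "(sAMAccountName=mycreds)",
                attributes=["badPwdCount", "badPasswordTime", "lockoutTime"])
    for entry in conn.entries:
        print(dc, entry.badPwdCount, entry.badPasswordTime, entry.lockoutTime)
    conn.unbind()
```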
The only thing at that site I could think of was a new SRM installation. I re-installed SRM using a service account and made sure the site pairing was not using my credentials. Still no dice.
At this point I needed some help. Our AD team sends the security logs to QRadar, so they were able to tell me the exact server the lockouts were coming from: the SSO server for the vCenter in site A. Progress!!!
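If you don't have a SIEM, the same signal lives in the Windows Security log: event ID 4740 ("A user account was locked out") is recorded on the PDC emulator and names the caller computer. A rough sketch along those lines, filtering a CSV export of the log (the file name and column names here are assumptions about the export format, not a QRadar API):

```python
# Sketch: pull the source machine out of a CSV export of the
# Security log. Event ID 4740 carries the caller computer name
# in its message body.
import csv

with open("security_log_export.csv", newline="") as f:  # assumed export file
    for row in csv.DictReader(f):
        if row.get("EventID") == "4740" and "mycreds" in row.get("Message", ""):
            print(row["TimeGenerated"], row["Message"])
```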
I logged into the SSO server and looked at the following log:
2014-11-17 16:26:10,350 INFO [IdentityManager] Authentication failed for user [mycreds@domain] in tenant [vsphere.local] in  milliseconds
That was super informative.
I then went to the vCenter and looked at the vpxd.log and saw this:
2014-11-17T16:06:00.351-06:00 [07992 info '[SSO]' opID=1b1c80cb] [UserDirectorySso] Authenticate(domain\mycreds, "not shown")
2014-11-17T16:06:00.382-06:00 [07992 error '[SSO]' opID=1b1c80cb] [UserDirectorySso] AcquireToken exception: class SsoClient::InvalidCredentialsException(Authentication failed: Invalid credentials)
2014-11-17T16:06:00.382-06:00 [07992 error 'authvpxdUser' opID=1b1c80cb] Failed to authenticate user <domain\mycreds>
Again, super informative.
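The one genuinely useful field in those lines is the opID, which ties the related entries together. A quick sketch (assuming a local copy of vpxd.log) that collects the opIDs of failed authentications and then prints every line sharing them, for context:

```python
# Sketch: find the opIDs of failed authentications in vpxd.log,
# then print every line belonging to those operations.
import re

with open("vpxd.log") as f:
    lines = f.readlines()

failed_opids = set()
for line in lines:
    if "Failed to authenticate user" in line:
        m = re.search(r"opID=(\w+)", line)
        if m:
            failed_opids.add(m.group(1))

for line in lines:
    m = re.search(r"opID=(\w+)", line)
    if m and m.group(1) in failed_opids:
        print(line.rstrip())
```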
I found this post by William Lam in the forums where he talks about tracing back a rogue logon: https://communities.vmware.com/thread/296871
He also has a full blog post: http://www.virtuallyghetto.com/2010/12/how-to-identify-origin-of-vsphere-login.html
Unfortunately I don’t believe his steps 1) apply to vCenter with SSO, or 2) really deal with a failed login.
One of the steps he mentions is using netstat to see which servers have connections to the vCenter. This worked reasonably well: I started digging in and looking at each server, eliminating everything connected over 443, which included the SRM servers and a few others. In the end I had eliminated everything except the vCenter itself, which I knew wasn’t running under my credentials.
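Rather than eyeballing raw netstat output, something like this psutil sketch (run on the vCenter server itself, with admin rights so all connections are visible) gives you a clean list of remote hosts to rule in or out:

```python
# Sketch: list the distinct remote hosts with established TCP
# connections to this machine, grouped by local port, so each
# peer can be checked and eliminated one by one.
import psutil
from collections import defaultdict

peers = defaultdict(set)
for conn in psutil.net_connections(kind="tcp"):
    if conn.status == psutil.CONN_ESTABLISHED and conn.raddr:
        peers[conn.laddr.port].add(conn.raddr.ip)

for port, hosts in sorted(peers.items()):
    print(f"local port {port}: {sorted(hosts)}")
```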
At that moment I noticed that there was a NetApp icon on the desktop for the VSC…
I know I installed the VSC earlier in the year, and it may have been registered with my credentials, though I had been through 2-3 password changes since. I wasn’t using the VSC in this environment, so I uninstalled it. I checked the vCenter and SSO logs, and no more failed logins! I then had my AD account unlocked, and it has stayed unlocked with no bad passwords.
The main lesson learned is to always use a service account to run stuff, join stuff, register stuff, etc. Also, use a separate account per application, and possibly per application instance, so one password change doesn’t take down everything. Having the right tools helps a lot as well; QRadar narrowed things down immensely. After that, eliminate, eliminate, eliminate, and don’t dismiss the obvious answers.
Note: I could have also tried to re-register the VSC by going to the VSC registration page on the server where the VSC was installed (in this case the vCenter).