I recently had to change my AD password for my admin credentials, and of course I waited until a Friday at 3pm to do it.
Immediately after changing my password, I started getting locked out very rapidly. I had my colleague unlock my account, and we started trying to determine where it was happening.
I first went to all of the places where I had been lazy and used my own credentials to bind to vCenter or to AD. This included my two Log Insight instances in our lab (both the vSphere integration and authentication). I switched both of those to a service account, but I was still having issues.
I then had my coworker check if I had any RDP sessions on the Windows servers I typically use (jump servers, vCenter, SRM, utility servers). No dice there.
I remembered that a few years ago someone in the company wrote a simple program that scanned all of the domain controllers and reported whether the account was disabled or locked on each one, how many bad passwords had been submitted, and when the last one occurred. All of the bad passwords were occurring on two domain controllers in the same site (I will call it site A).
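That internal tool isn't public, but its core logic is easy to sketch: `badPwdCount` and `badPasswordTime` are non-replicated attributes, so every domain controller has to be queried individually, and the DCs with the highest counts point at the site where the bad passwords are arriving. The DC names and counts below are hypothetical stand-ins for real LDAP query results.

```python
# Sketch of the per-DC bad-password scan. Since badPwdCount is NOT
# replicated between domain controllers, each DC must be queried on its
# own; a real tool would gather these values over LDAP from every DC.

def worst_offenders(dc_bad_counts):
    """Sort DCs by bad-password count, highest first."""
    return sorted(dc_bad_counts.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical per-DC badPwdCount values for one account
counts = {
    "dc1.site-a.example.com": 14,
    "dc2.site-a.example.com": 9,
    "dc1.site-b.example.com": 0,
}

for dc, bad in worst_offenders(counts):
    print(f"{dc}: {bad} bad passwords")
```

With real data, the top entries clustering in one site is exactly the signal described above.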
The only thing at that site I could think of was possibly a new SRM installation. I re-installed SRM using a service account and ensured that the sites were connected not using my credentials. Still no dice.
At this point I needed some help. Our AD team sends the security logs to QRadar so they were able to tell me the exact server that was locking me out. The server that was locking me out was the SSO server for the vCenter in site A. Progress!!!
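QRadar did the correlation for us, but the underlying data is the Windows Security log: event 4740 ("a user account was locked out") carries a "Caller Computer Name" field identifying the machine that submitted the bad passwords. A minimal sketch of pulling that field out of an event body, using a trimmed, hypothetical 4740 event:

```python
import re

def caller_computer(event_text):
    """Extract the Caller Computer Name field from a 4740 event body."""
    m = re.search(r"Caller Computer Name:\s*(\S+)", event_text)
    return m.group(1).lstrip("\\") if m else None

# Hypothetical, trimmed body of a Windows Security event 4740
event = """A user account was locked out.
Account That Was Locked Out:
    Account Name: mycreds
Additional Information:
    Caller Computer Name: \\\\SSO-A
"""
print(caller_computer(event))
```

In my case, that field is how the SSO server in site A was identified.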
I logged into the SSO server and looked at the following log:
C:\ProgramData\VMware\CIS\logs\vmware-sso\vmware-sts-idmd.log
2014-11-17 16:26:10,350 INFO [IdentityManager] Authentication failed for user [mycreds@domain] in tenant [vsphere.local] in [10] milliseconds
That was super informative.
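If you need to sift a large vmware-sts-idmd.log for lines like that, a small parser that counts failures per user saves a lot of scrolling. The regex below matches the line format shown above; the sample lines are hypothetical.

```python
import re
from collections import Counter

# Matches the "Authentication failed" line format from vmware-sts-idmd.log
FAIL_RE = re.compile(
    r"Authentication failed for user \[([^\]]+)\] in tenant \[([^\]]+)\]"
)

def failed_logins(lines):
    """Count failed-authentication log lines per user."""
    counts = Counter()
    for line in lines:
        m = FAIL_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts

# Hypothetical log lines in the format shown above
sample = [
    "2014-11-17 16:26:10,350 INFO [IdentityManager] Authentication failed "
    "for user [mycreds@domain] in tenant [vsphere.local] in [10] milliseconds",
    "2014-11-17 16:27:12,101 INFO [IdentityManager] Authentication failed "
    "for user [mycreds@domain] in tenant [vsphere.local] in [8] milliseconds",
]
print(failed_logins(sample))
```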
I then went to the vCenter and looked at the vpxd.log and saw this:
2014-11-17T16:06:00.351-06:00 [07992 info '[SSO]' opID=1b1c80cb] [UserDirectorySso] Authenticate(domain\mycreds, "not shown")
2014-11-17T16:06:00.382-06:00 [07992 error '[SSO]' opID=1b1c80cb] [UserDirectorySso] AcquireToken exception: class SsoClient::InvalidCredentialsException(Authentication failed: Invalid credentials)
2014-11-17T16:06:00.382-06:00 [07992 error 'authvpxdUser' opID=1b1c80cb] Failed to authenticate user <domain\mycreds>
Again, super informative.
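One genuinely useful thing in vpxd.log is the opID, which ties related lines together: the Authenticate call and its failure share the same operation ID. A quick sketch of grouping lines by opID (sample lines are hypothetical but follow the format shown above):

```python
import re
from collections import defaultdict

# Matches the opID tag in vpxd.log lines, e.g. "opID=1b1c80cb]"
OPID_RE = re.compile(r"opID=(\w+)\]")

def group_by_opid(lines):
    """Group vpxd.log lines by their operation ID."""
    groups = defaultdict(list)
    for line in lines:
        m = OPID_RE.search(line)
        if m:
            groups[m.group(1)].append(line)
    return groups

# Hypothetical lines following the vpxd.log format shown above
sample = [
    "2014-11-17T16:06:00.351-06:00 [07992 info '[SSO]' opID=1b1c80cb] "
    "[UserDirectorySso] Authenticate(domain\\mycreds, \"not shown\")",
    "2014-11-17T16:06:00.382-06:00 [07992 error 'authvpxdUser' opID=1b1c80cb] "
    "Failed to authenticate user <domain\\mycreds>",
]
for opid, group in group_by_opid(sample).items():
    print(opid, len(group))
```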
I found this post by William Lam in the forums where he talks about tracing back a rogue logon: https://communities.vmware.com/thread/296871
He also has a full blog post: http://www.virtuallyghetto.com/2010/12/how-to-identify-origin-of-vsphere-login.html
Unfortunately, I don't believe his steps 1) apply to vCenter with SSO, or 2) really deal with a failed login.
One of the steps he mentions is to use netstat to see which servers have connections to the vCenter. This worked reasonably well; I started digging in and looking at each server, eliminating all of the servers connected via 443, which included the SRM servers and a few others. In the end, I had eliminated everything except the vCenter itself, which I knew wasn't running under my credentials.
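The elimination step above can be sketched as a small parser over `netstat -ano` output: keep only established TCP connections and list the distinct remote hosts, then work through that list. The sample output lines are hypothetical.

```python
def remote_hosts(netstat_lines):
    """Return the distinct remote hosts with ESTABLISHED TCP connections,
    parsed from `netstat -ano`-style output lines."""
    hosts = set()
    for line in netstat_lines:
        parts = line.split()
        # netstat -ano columns: Proto, Local Address, Foreign Address, State, PID
        if len(parts) >= 4 and parts[0] == "TCP" and parts[3] == "ESTABLISHED":
            remote = parts[2].rsplit(":", 1)[0]  # strip the port
            hosts.add(remote)
    return sorted(hosts)

# Hypothetical netstat -ano output from the vCenter server
sample = [
    "  TCP    10.0.0.5:443     10.0.0.20:51515   ESTABLISHED     1234",
    "  TCP    10.0.0.5:443     10.0.0.21:51516   ESTABLISHED     1234",
    "  TCP    10.0.0.5:135     10.0.0.20:51600   ESTABLISHED     900",
]
print(remote_hosts(sample))
```

Each host on the resulting list is a candidate to check and cross off, which is exactly how the field narrowed down here.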
At that moment I noticed there was a NetApp icon on the desktop for the VSC…
I knew I had installed the VSC earlier in the year, and it may have been registered with my credentials, though I had been through 2-3 password changes since. I wasn't using the VSC for this environment, so I uninstalled it. I checked the vCenter and SSO logs: no more failed logins! I then had my AD account unlocked, and it has stayed unlocked with no bad passwords.
The main lesson learned is to always use a service account to run, join, and register things. Also, use a separate account per application, and possibly per application instance, so that one password change doesn't take down everything. Having the right tools helps a lot as well; QRadar narrowed things down immensely. From there, eliminate, eliminate, eliminate, and don't dismiss the obvious answers.
Note: I could have also tried to re-register the VSC by going to:
https://localhost:8143/Register.html
on the server where the VSC was installed (in this case the vCenter).
Thanks for this, Chris! I had the exact same issue: the VSC was set up by a previous employee and I was going nuts trying to find the culprit!
Glad it helped!
I had the same issue and found a lot of useful information, but none of it resolved the problem. Disabling the plugin didn't help either. Re-registering with a service account fixed it. Thank you, Chris.
Glad it helped!
More than two years after posting, your article remains useful. We had this issue yesterday and I was pulling my hair out trying to find the culprit. For resolution, we took the re-registration route for now; we'll eventually re-register with a service account that has a static password.
Have you found out exactly why the application behaved this way? The person who installed it said he only installed it; he didn't know it had been registered with his credentials.
In any event, a good call on your part in suspecting the NetApp plug-in. Thank you.
Glad it helped! I think people install the plugin using a set of creds without realizing those creds are then used for the connection. It would be better if it registered as a solution and created an SSO user.