Test Connection Request while adding an endpoint fails with an exception
Unable to connect to endpoint. Timeout occurred while processing the request
Even though we verified firewall , ports , vCenter Services status still we were still encountering this exception.
When we were looking into the logs
For a working environment
Manager Service \ All.log
As seen below, when a Test Connection is initiated from vRA UI , Manager Service is responsible for initiating this request and it creates a workitem to be processed by underlying vSphereAgent
[UTC:2018-11-05 09:25:45 Local:2018-11-05 09:25:45] [Debug]: [sub-thread-Id="67" context="gjbLTAc0" token="gx3pVepj"] TestConnection Request: Name [vc.nukescloud.com], Address: [https://vc.nukescloud.com/sdk], Username: [nukescloud\nukes]
[UTC:2018-11-05 09:25:48 Local:2018-11-05 09:25:48] [Debug]: [sub-thread-Id="25" context="" token=""] TestConnection WorkItemResponse: [<?xml version="1.0" encoding="utf-16"?><workItemResponse id="9bca4157-1fd3-4e19-a7dd-117aacc5e5d6"><task type="testconnection"><result succeeded="false"><message>Test connection failed: Certificate is not trusted (RemoteCertificateChainErrors). Subject: C=US, CN=vc.nukescloud.com Thumbprint: 9C949FAEB87CBF97419CF4BFE70EB3FB8A035173</message></result><parametername="TestConnection.Endpoint.Request.ResultCode">INVALID_CERTIFICATE</parameter><parameter name="trace_id" /></task></workItemResponse>]
vSphereAgent.log
Below snippet is from vSphereAgent.log as you can see , once the workitem is created by Manager Service , vSphereAgent will process the workitem and fetch the certificate from endpoint.
2018-11-05T09:25:48.218Z IAAS75 vcac: [component="iaas:VRMAgent.exe" priority="Info" thread="4332"] [sub-thread-Id="5" context="" token=""] Starting : Processing Workitem ID [9bca4157-1fd3-4e19-a7dd-117aacc5e5d6] [testconnection]
2018-11-05T09:25:48.218Z IAAS75 vcac: [component="iaas:VRMAgent.exe" priority="Debug" thread="4332"] [sub-thread-Id="5" context="" token=""] [[]] [testconnection] TestConnection.Endpoint.Address=https://vc.nukescloud.com/sdk
2018-11-05T09:25:48.218Z IAAS75 vcac: [component="iaas:VRMAgent.exe" priority="Debug" thread="4332"] [sub-thread-Id="5" context="" token=""] [[]] [testconnection] TestConnection.Endpoint.Username=nukescloud\nukes
2018-11-05T09:25:48.218Z IAAS75 vcac: [component="iaas:VRMAgent.exe" priority="Debug" thread="4332"] [sub-thread-Id="5" context="" token=""] [[]] [testconnection] TestConnection.Endpoint.TrustThumbprint=
2018-11-05T09:25:48.218Z IAAS75 vcac: [component="iaas:VRMAgent.exe" priority="Debug" thread="4332"] [sub-thread-Id="5" context="" token=""] [[]] [testconnection] TestConnection.Endpoint.TrustAllCertificates=False
2018-11-05T09:25:48.218Z IAAS75 vcac: [component="iaas:VRMAgent.exe" priority="Debug" thread="4332"] [sub-thread-Id="5" context="" token=""] Begin test connection request....
2018-11-05T09:25:48.234Z IAAS75 vcac: [component="iaas:VRMAgent.exe" priority="Warning" thread="5364"] [sub-thread-Id="10" context="" token=""] Invalid certificate found: C=US, CN=vc.nukescloud.com, Untrusted certificate chain
2018-11-05T09:25:48.234Z IAAS75 vcac: [component="iaas:VRMAgent.exe" priority="Warning" thread="5364"] [sub-thread-Id="10" context="" token=""] Invalid certificate found: C=US, CN=vc.nukescloud.com, Untrusted certificate chain
2018-11-05T09:25:48.249Z IAAS75 vcac: [component="iaas:VRMAgent.exe" priority="Error" thread="4332"] [sub-thread-Id="5" context="" token=""] <?xml version="1.0" encoding="utf-16"?>
<boolean>false</boolean>
System.AggregateException: One or more errors occurred. ---> System.Net.WebException: The underlying connection was closed: An unexpected error occurred on a send. ---> DynamicOps.Common.Client.UntrustedCertificateException: Certificate is not trusted (RemoteCertificateChainErrors). Subject: C=US, CN=vc.nukescloud.com Thumbprint: 9C949FAEB87CBF97419CF4BFE70EB3FB8A035173
Above "Certificate is not trusted" message is not an exception but a message which states that certificate of the vCenter is not present in vRA yet.
For non-working environment
Manager Service \ All.log
For the environment where we see exception , We so see test connection request being processed but there is no workitem generated for agent to go ahead and fetch certificate
[UTC:2018-10-26 06:56:46 Local:2018-10-26 17:56:46] [Trace]: [sub-thread-Id="15" context="fW8t555E" token="3wP5GoMr"] TestConnection Request: Begin creating request record for request id [d2b48795-bdd2-4f9a-b585-2f8be4e33ac9]
[UTC:2018-10-26 06:56:46 Local:2018-10-26 17:56:46] [Trace]: [sub-thread-Id="15" context="fW8t555E" token="3wP5GoMr"] TestConnection Request: Finish creating request record for request id [d2b48795-bdd2-4f9a-b585-2f8be4e33ac9]
[UTC:2018-10-26 06:56:46 Local:2018-10-26 17:56:46] [Debug]: [sub-thread-Id="15" context="fW8t555E" token="3wP5GoMr"] TestConnection Request: Name [vcenter.nukescloud.com], Address: [https://vcenter.nukescloud.com/sdk], Username: [nukescloud\nukes]
Apart from above three lines , there was no other information in the logs to tell us why and where it's stuck.
After little bit of research found that dbo.Agent table found that there were multiple entries under AgentName for the same endpoint on the same Agent Machine , Refer the screenshot below.
To come out of this problem , we followed these steps.
Firstly , Backup IAAS database
Then RDP to IAAS Agent nodes and then uninstall proxy agent for the endpoint which cannot be added
Once , agent's are uninstalled , browse to the previously installed location and remove any existing traces ( there might be fe log files left )
Now , get back to SQL Database and then execute following query to remove any traces of the endpoint where it's failing to add from dbo.Agent table
In my scenario
vCenter was vcenter.nukescloud.com
dbo.Agent table had following AgentID's for my endpoint vcenter.nukescloud.com
AAAAA , CCCCC and DDDDD
Now , we need to go ahead and delete these entries from Database, we may use below SQL query to do this
delete from dbo.Agent where AgentID in ('AAAAA','CCCCC','DDDDD');
Post deletion , go ahead and reinstall the agent
Once done your expected to see one entry for this endpoint and the agent in dbo.Agent table and it's AgentAlive value would be set to 1
As you can see above if it is not working it's AgentAlive status would be set to 0
Now adding endpoint through vRA UI , it should work as expected
Reason why this exception occurs is very simple , when an agent is removed , it does not remove it from IaaS database. When we add an endpoint now , it always looks for the top entry for the agent in the table to assign a workitem. So it hit's the one which is not a proper proxy agent for this endpoint it would not be able to process it further causing a timeout in the UI
So when Proxy Agent is uninstalled , ensure you perform a check in dbo.Agent table under IAAS database if it's completely removed and there isn't any stale entry
|| Happy Learning ||