
  • Shared swap vMotion of a fully reserved VM with swap file fails due to failure to extend swap file

I was trying to migrate a virtual machine with around 60 GB of memory to a host that had no virtual machines registered on it. The migration failed with the following exception in hostd.log:

2019-03-28T05:17:28.943Z info hostd[11B81B70] [Originator@6876 sub=Vcsvc.VMotionDst (2076941261877375970)] ResolveCb: VMX reports needsUnregister = true for migrateType MIGRATE_TYPE_VMOTION
2019-03-28T05:17:28.943Z info hostd[11B81B70] [Originator@6876 sub=Vcsvc.VMotionDst (2076941261877375970)] ResolveCb: Failed with fault: (vim.fault.GenericVmConfigFault) {
--> faultCause = (vmodl.MethodFault) null,
--> faultMessage = (vmodl.LocalizableMessage) [
--> (vmodl.LocalizableMessage) {
--> key = "msg.checkpoint.destination.resume.fail",
--> arg = (vmodl.KeyAnyValue) [
--> (vmodl.KeyAnyValue) {
--> key = "1",
--> value = "msg.vmk.status.VMK_MEM_ADMIT_FAILED"
--> }
--> ],
--> message = "Failed to resume destination VM: Admission check failed for memory resource.
--> "
--> },
--> (vmodl.LocalizableMessage) {
--> key = "vob.vmotion.swap.extend.failed.status",
--> arg = (vmodl.KeyAnyValue) [
--> (vmodl.KeyAnyValue) {
--> key = "1",
--> value = "-1407197683"
--> },
--> (vmodl.KeyAnyValue) {
--> key = "2",
--> value = "2076941261877375970"
--> },
--> (vmodl.KeyAnyValue) {
--> key = "3",
--> value = "536870912"
--> },
--> (vmodl.KeyAnyValue) {
--> key = "4",
--> value = "Admission check failed for memory resource"
--> }
--> ],
--> message = "vMotion migration [ac1fde0d:2076941261877375970] failed to extend swap file to 536870912 KB: Admission check failed for memory resource.
--> "
--> }
--> ],
--> reason = "Failed to resume destination VM: Admission check failed for memory resource.
--> "
--> msg = "Failed to resume destination VM: Admission check failed for memory resource.
--> vMotion migration [ac1fde0d:2076941261877375970] failed to extend swap file to 536870912 KB: Admission check failed for memory resource."
This virtual machine had 64 GB of memory, fully reserved, with memory hot plug enabled. The failure was related to swap file growth: the file could not be expanded. Ideally, when a VM has a 100% memory reservation, its swap file should be 0 KB. In my scenario the swap file was the same size as the memory assigned to the VM, which is not a normal situation. The reason you still see a swap file the size of the VM's memory is that the memory reservation was made while the VM was powered on. Avoid reserving the memory of a virtual machine while it is powered on; it is not a best practice.

Now, let me explain the exception in detail. This is a bug identified in versions 6.0 and 6.5 (and possibly earlier versions, if anyone is still using them). The below conditions all have to be met to encounter this bug:
- The VM is fully reserved after it is powered on
- The VM is configured with more than 56 GB of memory
- The VM has memory hot-plug enabled
- The hosts are in a DRS cluster

This behavior is fixed in version 6.7 due to changes made in the code, but not in 6.5 and 6.0. There is no workaround, as we cannot delete the swap file while the VM is powered on. The only way to fix it is to take proper downtime for the virtual machine: shut it down and bring it back up. The swap file should reset to 0 KB, and after that vMotion should work. Here's a small video recording where I reproduced the bug, if you would like to watch it. !! Happy Learning !!
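The swap sizing rule behind this post can be sketched as a quick calculation. This is a minimal illustration of the rule (ESXi sizes the .vswp file as configured memory minus the memory reservation), not VMware code:

```python
def expected_swap_kb(configured_mem_kb: int, reservation_kb: int) -> int:
    """ESXi sizes a VM's .vswp file as configured memory minus the
    memory reservation, so a 100% reserved VM should have a 0 KB swap file."""
    if reservation_kb > configured_mem_kb:
        raise ValueError("reservation cannot exceed configured memory")
    return configured_mem_kb - reservation_kb

# 64 GB VM, fully reserved: swap file should be 0 KB
print(expected_swap_kb(64 * 1024 * 1024, 64 * 1024 * 1024))

# The buggy state described above behaves as if no reservation existed,
# leaving the swap file at the full configured-memory size (67108864 KB)
print(expected_swap_kb(64 * 1024 * 1024, 0))
```

This is why a fully reserved VM whose swap file still matches its configured memory is a red flag worth investigating.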

  • Unable to delete a vRO endpoint

Are you trying to delete a vRO endpoint and getting an exception like this?

Error: Unable to delete a vCO endpoint of type 'AD'. Reason 'TypeError: Cannot read property "id" from undefined (Workflow:Remove an Active Directory server / Scriptable task (item1)#2)'

In our case it was an Active Directory endpoint that was throwing the exception. Endpoints created on this pane are vRO endpoints and are stored in a table called public.asd_endpoint. If there is a discrepancy in the id, the UI will not allow you to delete the endpoint. In that scenario, removing it from the database is the only option. Before removing the entry from the database, make sure you are removing the right one: compare that "Name" and "rootobjectid" are the same in both the DB and the UI, which gives you a clue.

Deletion from the database (ensure a Postgres database backup is taken before you start):
1. Log in to vRA's Postgres database:
   su - postgres
   /opt/vmware/vpostgres/current/bin/psql vcac
2. Enable extended display: \x
3. Capture the ID from public.asd_endpoint for the endpoint you want to remove:
   select id from public.asd_endpoint where name = 'nukescloud';
4. Using the above ID, go ahead and execute the delete statement:
   delete from public.asd_endpoint where id = '45702e71-3549-410c-95b3-993b77750e49';
5. Once done, refresh the page in the UI; the endpoint should no longer appear.
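The select-then-delete flow above is worth rehearsing against a scratch database before touching vPostgres. The sketch below uses SQLite purely for illustration; the table and column names mirror public.asd_endpoint, and the row data is made up:

```python
import sqlite3

# Stand-in for vRA's public.asd_endpoint table (illustrative data only)
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE asd_endpoint (id TEXT PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO asd_endpoint VALUES "
           "('45702e71-3549-410c-95b3-993b77750e49', 'nukescloud')")

# Step 3: capture the id for the endpoint you want to remove
(endpoint_id,) = db.execute(
    "SELECT id FROM asd_endpoint WHERE name = ?", ("nukescloud",)
).fetchone()

# Step 4: delete by that id, never by name alone
db.execute("DELETE FROM asd_endpoint WHERE id = ?", (endpoint_id,))
db.commit()

# Step 5: the endpoint is gone
print(db.execute("SELECT COUNT(*) FROM asd_endpoint").fetchone()[0])
```

Deleting by the captured id (rather than by name) is the safety measure: it guarantees the row you verified is the row you remove.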

  • Reconfigure actions on a Managed VM when triggered using API / Powershell / Workflow fails

When you trigger a reconfigure request via API / PowerShell / custom workflow in an environment that was recently patched with any of the released vRealize Automation 7.4 patches, the request fails. The exception is as follows:

Error Message: [Error code: 42300 ] - [Error Msg: Infrastructure service provider error: A server error was encountered. Value cannot be null. Parameter name: value] dynamicops.api.client.ApiClientException
at dynamicops.api.client.ClientResponseHandler.handleResponse(BaseHttpClient.java:316) ~[iaas-api-client-7.4.0-SNAPSHOT.jar:?]
at dynamicops.api.client.BaseHttpClient$1.handleResponse(BaseHttpClient.java:164) ~[iaas-api-client-7.4.0-SNAPSHOT.jar:?]
at org.apache.http.client.fluent.Response.handleResponse(Response.java:90) ~[fluent-hc-4.5.5.jar:4.5.5]
at org.apache.http.client.fluent.Async$ExecRunnable.run(Async.java:81) [fluent-hc-4.5.5.jar:4.5.5]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_161]

Though the request shows as failed on the vRA console, the reconfigure operation is still performed on the VM. You would experience this issue only if there is an encrypted custom property in the blueprint. This bug is being fixed in an upcoming 7.4 patch release.

  • Unable to stop Tomcat instance that was started as a foreground process

I was implementing HF 10 today in one of our environments to fix the health broker service, which was broken. Multiple bugs w.r.t. the health broker are fixed in HF 10, which prompted me to do so. While installing, it failed at the point where it was unable to stop the Tomcat instance for horizon-workspace. Before we started the HF 10 installation, every prerequisite check had been done. As the exception clearly stated that it was unable to stop the Tomcat instance responsible for horizon-workspace, I went ahead and verified whether the status was actually "RUNNING" or in a different state. I am not sure why the status of these two services was "UNKNOWN", but a quick restart of the services:

service vco-configurator restart
service horizon-workspace restart

did bring them back into the "RUNNING" state. After that, the vRA 7.4 HF10 installation was successful.

Moral of the story: do not just go by the status of VAMI service registrations; quickly cross-check that the underlying application services are in the "RUNNING" state.

  • Endpoint with id [XXXXX-XXXXX-XXXXX-XXXXX] is not found in SQL Server on IAAS endpoint

Recently, I was looking at a problem where a user was unable to save an NSX endpoint. When we edit the endpoint and click "Test Connection", it succeeds, but as soon as we click Save, we get the below exception in /var/log/vmware/vcac/catalina.out:

[UTC:2019-02-26 04:12:41,054 Local:2019-02-26 15:12:41,054] vcac: [component="cafe:iaas-proxy" priority="ERROR" thread="tomcat-http--3" tenant="nukescloud" context="Ge5uipgR" parent="Ge5uipgR" token="iVIMcVWX"] com.vmware.vcac.iaas.controller.endpointconfiguration.EndpointController.update:121 - Endpoint update failed: Endpoint with id [XXXXX-XXXXX-XXXXX-XXXXX] is not found in SQL Server on IAAS endpoint.

We definitely knew there was a problem w.r.t. this endpoint in the SQL database, but where was the question. I created an NSX endpoint in my lab; creating one updates both the vPostgres DB for vRA and the SQL DB for IaaS.

As a first step, let's look into vRA's Postgres database. Log in to the Postgres database:
su - postgres
/opt/vmware/vpostgres/current/bin/psql vcac
Enable expanded display:
vcac=# \x
Expanded display is on.
Then review the contents of this table:
vcac=# select * from public.epconf_endpoint;
This is how the output looks: two endpoints are visible, one for vCenter and the other for NSX.

- id displayed in the above table is the NSXEndpointId that SQL refers to in its IaaS database
- type_id is the type of endpoint you create
- name and description are the descriptors you give while creating an endpoint
- extension_data is the data it fetches from the endpoint w.r.t. certificate thumbprint, username and password
- created_date and last_updated are self-explanatory

Now let's compare this with the SQL table that holds this configuration. The table we are interested in is [dbname].[DynamicOps.VCNSModel].[VCNSEndpoints], as shown in the below screenshot. As I stated earlier, Id in vRA's Postgres database should be the same as NSXEndpointId in the IaaS database. In the environment where the failure was observed, NSXEndpointId was set to NULL.

Now that we have established the relationship by understanding how this works from a DB perspective, it was easy to fix the problematic environment. All we had to do was execute an update statement to replace the NULL value with the appropriate ID captured from the Postgres database. Example:

update [vradistrib].[DynamicOps.VCNSModel].[VCNSEndpoints] set NSXEndpointId = '79fc5423-089b-4b4a-8a7a-b416f68e70bb' where Id = '3';

!! Hope this helps !!
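The Postgres-to-IaaS relationship described above boils down to a simple cross-check: every VCNSEndpoints row should carry an NSXEndpointId that exists as an id in public.epconf_endpoint. A small sketch of that check, with made-up sample rows standing in for the real tables:

```python
# Illustrative cross-check between vRA's Postgres public.epconf_endpoint ids
# and the IaaS [DynamicOps.VCNSModel].[VCNSEndpoints] rows (sample data only).
postgres_ids = {"79fc5423-089b-4b4a-8a7a-b416f68e70bb"}  # id column from Postgres

# NSXEndpointId should hold one of those ids; NULL (None here) is the broken state
iaas_rows = [{"Id": "3", "NSXEndpointId": None}]

broken = [row["Id"] for row in iaas_rows
          if row["NSXEndpointId"] not in postgres_ids]

for row_id in broken:
    print(f"VCNSEndpoints row Id={row_id} has a NULL or unmatched NSXEndpointId "
          f"and needs an UPDATE with the matching Postgres id")
```

Any row this flags is a candidate for the kind of update statement the post demonstrates.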

  • Selecting a Network Profile unavailable while creating a blueprint

A network profile essentially provides your VM with information such as IP / netmask / gateway, and vRA also keeps a record of the IPs used from the pool. I was working in one of the environments where, when the user was creating a blueprint, the pane where he/she has to select a network profile was blank. It did not look like a bug, as it was working perfectly in my lab. From the logs (/var/log/vmware/vcac/catalina.out), with debug logging enabled:

[UTC:2019-02-17 23:40:01,192 Local:2019-02-18 12:40:01,192] vcac: [component="cafe:iaas-proxy" priority="DEBUG" thread="tomcat-http--46" tenant="" context="WgxltG49" parent="" token="WgxltG49"] com.vmware.vcac.platform.trace.TraceRequestUtil.startTraceRequest:33 - Trace started
[UTC:2019-02-17 23:40:01,308 Local:2019-02-18 12:40:01,308] vcac: [component="cafe:iaas-proxy" priority="DEBUG" thread="tomcat-http--46" tenant="vsphere.local" context="WgxltG49" parent="" token="WgxltG49"] com.vmware.vcac.platform.service.rest.config.RestRequestMappingHandlerMapping.getHandlerInternal:317 - Returning handler method [public org.springframework.data.domain.Page com.vmware.vcac.iaas.controller.fabric.NetworkProfilesController.listForTenant(com.vmware.vcac.platform.service.rest.PageAndSortRequest)]
[UTC:2019-02-17 23:40:01,309 Local:2019-02-18 12:40:01,309] vcac: [component="cafe:iaas-proxy" priority="DEBUG" thread="tomcat-http--46" tenant="vsphere.local" context="WgxltG49" parent="" token="WgxltG49"] com.vmware.vcac.platform.service.rest.init.RestWebApplicationInitializer$RestServlet.doDispatch:955 - Last-Modified value for [/iaas-proxy-provider/api/network/profiles/tenant] is: -1
[UTC:2019-02-17 23:40:01,309 Local:2019-02-18 12:40:01,309] vcac: [component="cafe:iaas-proxy" priority="INFO" thread="tomcat-http--46" tenant="vsphere.local" context="WgxltG49" parent="" token="WgxltG49"] com.vmware.vcac.iaas.controller.fabric.NetworkProfilesController.listForTenant:197 - Looking up network profiles
* * * *
[UTC:2019-02-17 23:40:01,469 Local:2019-02-18 12:40:01,469] vcac: [component="cafe:iaas-proxy" priority="INFO" thread="tomcat-http--46" tenant="vsphere.local" context="WgxltG49" parent="" token="WgxltG49"] com.vmware.vcac.iaas.controller.fabric.NetworkProfilesController.listForTenant:203 - Finished looking up network profiles
[UTC:2019-02-17 23:40:01,473 Local:2019-02-18 12:40:01,473] vcac: [component="cafe:iaas-proxy" priority="DEBUG" thread="tomcat-http--46" tenant="vsphere.local" context="WgxltG49" parent="" token="WgxltG49"] com.vmware.vcac.platform.service.rest.init.RestWebApplicationInitializer$RestServlet.processRequest:1000 - Successfully completed request
[UTC:2019-02-17 23:40:01,474 Local:2019-02-18 12:40:01,474] vcac: [component="cafe:iaas-proxy" priority="DEBUG" thread="tomcat-http--46" tenant="" context="WgxltG49" parent="" token="WgxltG49"] com.vmware.vcac.platform.trace.TraceRequestUtil.stopTraceRequest:84 - Trace stopped

Absolutely no errors or exceptions in the logs. Then we realized that we were missing something very simple. While creating a reservation in vRA, you select the mapping between the network adapter (vSphere network) and the network profile. If by mistake you leave this mapping blank, as shown in the below screenshot, you end up in this situation where the network profile pane is blank or does not show up. The moment you make the change and select the network profile, it populates the profile on the pane. !! Hope this helps !!

  • Implementing vRA 7.4 Patch HF8 on top of HF3 ( deep-dive )

There was a requirement for us to implement HF8 in an environment that was running HF3. I created a run-book to explain the procedure and thought of sharing it with everyone, as it may be useful.

Pre-Requisites
For successful patch deployments, perform these prerequisite steps on the target vRealize Automation cluster:
- Ensure the service account running the 'VMware vCloud Automation Center Management Agent' meets the following requirements:
  - The account must be part of the local Administrators group.
  - The account must have 'Log on as a service' enabled in the Local Security Policies.
  - The account must be formatted as domain\username.
- Remove old / obsolete nodes from the Distributed Deployment Information Table. For detailed steps, see the "Remove a Node from the Distributed Deployment Information Table" section of the vRealize Automation documentation.
- Ensure that the Management Agent on all IaaS machines is the latest (7.4) version.
- On the vRA virtual appliance nodes, open the /etc/hosts file and locate the IPv4 loopback entry (127.0.0.1). Ensure that the fully qualified domain name of the node is added immediately after 127.0.0.1 and before localhost. For example: 127.0.0.1 FQDN_HOSTNAME_OF_NODE localhost
- Take snapshots / backups of all nodes in your vRealize Automation installation.
- If your environment uses load balancers for HA, disable traffic to secondary nodes and disable service monitoring until after installing or removing patches and all services show REGISTERED.
- Obtain the files below and copy them to a file system available to the browser you use for the vRealize Automation appliance management interface.

Files needed to install HF8
The following files are needed to install HF8 on a vRealize Automation 7.4 environment:
- vRA-7.4-HF8-patch
- self-patch.zip
- patchscript.sh
All three files are available under KB: 56618

Implementing vRA 7.4 HF8
As a first step, ensure all the prerequisites are met. These are mandatory and cannot be skipped.
Now, copy self-patch.zip and patchscript.sh to /tmp on the master or primary vRealize Automation appliance. Once they are copied, give the new file the required permissions:
chmod +x patchscript.sh
Then run patchscript.sh. After the script execution completes, it prints a message stating "Self-Patch successfully applied".

Note: **Ensure the prerequisite script has run prior to running the below procedure to implement the actual patch!**

1. Log in to the vRealize Automation appliance management interface (https://vrealize-automation-appliance-FQDN:5480) as root. This has to be your primary or master node if it is a distributed vRA instance.
2. Click vRA Settings > Patches.
3. Under Patch Management, click the option that you need, and follow the prompts to install a new patch.

Once you click INSTALL, it starts implementing the patch. First it creates a local patch repo:

2019-01-18T05:08:04.157975+00:00 nukesvra01.nukescloud.com vcac-config: INFO com.vmware.vcac.cli.configurator.commands.cluster.patch.util.PatchUtil.getAllEligiblePatchesAndCreatePatchRepo:48 - Creating local patch repo

Then it identifies that it has to install HF8:

2019-01-18T05:11:31.584608+00:00 nukesvra01.nukescloud.com vcac-config: INFO com.vmware.vcac.cli.configurator.commands.cluster.patch.PatchDeployCommand.installPatch:129 - Installing the patch 7342927e-7099-4d8a-bc6b-8ca77c5a876b

It starts applying HF8 after extraction.
It also identifies that a previous patch is installed, which as we know is HF3, with patch ID 58ec2da5-823b-440e-b918-fbdf6ff7166f:

2019-01-18T05:11:33.705636+00:00 nukesvra01.nukescloud.com vcac-config: INFO com.vmware.vcac.cli.configurator.commands.cluster.patch.util.PatchUtil.getPatchLocation:112 - Patch location: /usr/lib/vcac/patches/repo/contents/vRA-patch/45994cb81454cba76ebe347e9e149e3a2253d74f889b5b667d117e438cbac4/patch-vRA-7.4.10980652.10980652-HF8
2019-01-18T05:11:33.884824+00:00 nukesvra01.nukescloud.com vcac-config: INFO com.vmware.vcac.cli.configurator.services.cluster.patch.PatchHistoryRepository.getLastAppliedPatch:249 - The last applied patch 58ec2da5-823b-440e-b918-fbdf6ff7166f
2019-01-18T05:11:33.885757+00:00 nukesvra01.nukescloud.com vcac-config: INFO com.vmware.vcac.cli.configurator.commands.cluster.patch.PatchDeployCommand.applyPatch:258 - Applying the patch 58ec2da5-823b-440e-b918-fbdf6ff7166f-Reverse

This is when you see that HF3 is being uninstalled. It initiates HF3 reverse patching:

2019-01-18T05:15:05.134068+00:00 nukesvra01.nukescloud.com vcac-config: INFO com.vmware.vcac.cli.configurator.commands.cluster.patch.util.PatchUtil.publishInstallBundlesForDownloading:190 - Created cafe.patch in:
2019-01-18T05:15:05.633471+00:00 nukesvra01.nukescloud.com vcac-config: INFO com.vmware.vcac.cli.configurator.commands.cluster.patch.util.PatchUtil.publishInstallBundlesForDownloading:195 - Created iaas.patch in:
2019-01-18T05:15:05.634676+00:00 nukesvra01.nukescloud.com vcac-config: INFO com.vmware.vcac.cli.configurator.commands.cluster.patch.PatchExecutor.run:147 - 1. Initiate patching...
2019-01-18T05:15:05.634676+00:00 nukesvra01.nukescloud.com vcac-config: INFO com.vmware.vcac.cli.configurator.commands.cluster.patch.PatchExecutor.initiatePatching:226 - Starting :: initiate patching
2019-01-18T05:15:05.738027+00:00 nukesvra01.nukescloud.com vcac-config: INFO com.vmware.vcac.cli.configurator.services.cluster.patch.PatchHistoryRepository.addPatch:75 - Adding patch 58ec2da5-823b-440e-b918-fbdf6ff7166f-Reverse to history::

It identifies the nodes where the HF3 reverse patch has to be applied:

2019-01-18T05:15:05.975140+00:00 nukesvra01.nukescloud.com vcac-config: INFO com.vmware.vcac.cli.configurator.commands.cluster.patch.util.PatchUtil.getPatchLocation:112 - Patch location: /usr/lib/vcac/patches/repo/contents/vRA-patch/9a909a94eb9cb15199c686e2e29d8fc83ea5fe3460426e340476544b211dc/patch-vRA-7.4.8182598.8182598-HF3-Reverse
2019-01-18T05:15:06.133201+00:00 nukesvra01.nukescloud.com vcac-config: INFO com.vmware.vcac.cli.configurator.services.cluster.patch.PatchHistoryRepository.syncPatchHistory:287 - Queueing command update-patch-history
2019-01-18T05:15:06.190063+00:00 nukesvra01.nukescloud.com vcac-config: INFO com.vmware.vcac.cli.configurator.services.cluster.impl.ClusterNodeServiceImpl.run:332 - Notifying node with hostname [nukesvra01.nukescloud.com] for command process-cmd...
2019-01-18T05:15:06.190344+00:00 nukesvra01.nukescloud.com vcac-config: INFO com.vmware.vcac.cli.configurator.services.cluster.impl.ClusterNodeServiceImpl.run:332 - Notifying node with hostname [nukesvra02.nukescloud.com] for command process-cmd...
2019-01-18T05:15:06.192237+00:00 nukesvra01.nukescloud.com vcac-config: INFO com.vmware.vcac.cli.configurator.services.cluster.impl.ClusterNodeServiceImpl.run:332 - Notifying node with hostname [nukesvra03.nukescloud.com] for command process-cmd...
It starts a thread to stop services on all of the nodes:

2019-01-18T05:15:06.314525+00:00 nukesvra01.nukescloud.com vcac-config: INFO com.vmware.vcac.platform.rest.client.impl.HttpClientFactory$IdleConnectionEvictor.start:370 - Starting thread Thread[Connection evictor-57136495-4bad-4414-a279-000ae3c34a54,5,main]
2019-01-18T05:15:06.637921+00:00 nukesvra01.nukescloud.com vcac-config: INFO com.vmware.vcac.cli.configurator.services.cluster.impl.ClusterNodeCommunicatorImpl.notifyNode:121 - Notifying an existing cluster node with url: [https://nukesvra01.nukescloud.com:5480/config/process-cmd] for configuration changes.
2019-01-18T05:15:06.646167+00:00 nukesvra01.nukescloud.com vcac-config: INFO com.vmware.vcac.cli.configurator.services.cluster.impl.ClusterNodeCommunicatorImpl.notifyNode:121 - Notifying an existing cluster node with url: [https://nukesvra03.nukescloud.com:5480/config/process-cmd] for configuration changes.
2019-01-18T05:15:06.649133+00:00 nukesvra01.nukescloud.com vcac-config: INFO com.vmware.vcac.cli.configurator.services.cluster.impl.ClusterNodeCommunicatorImpl.notifyNode:121 - Notifying an existing cluster node with url: [https://nukesvra02.nukescloud.com:5480/config/process-cmd] for configuration changes.

It finishes patch initiation:

2019-01-18T05:15:23.913716+00:00 nukesvra01.nukescloud.com vcac-config: INFO com.vmware.vcac.cli.configurator.services.cluster.patch.PatchHistoryRepository.currentPatch:115 - The current patch is 58ec2da5-823b-440e-b918-fbdf6ff7166f-Reverse
2019-01-18T05:15:23.913716+00:00 nukesvra01.nukescloud.com vcac-config: INFO com.vmware.vcac.cli.configurator.commands.cluster.patch.PatchExecutor.initiatePatching:242 - Finished :: Initiate patching

It starts patch discovery:

2019-01-18T05:15:23.913738+00:00 nukesvra01.nukescloud.com vcac-config: INFO com.vmware.vcac.cli.configurator.commands.cluster.patch.PatchExecutor.run:153 - 2. Patch discovery...
2019-01-18T05:15:23.913738+00:00 nukesvra01.nukescloud.com vcac-config: INFO com.vmware.vcac.cli.configurator.commands.cluster.patch.PatchExecutor.discovery:247 - Starting :: component discovery
2019-01-18T05:15:23.924128+00:00 nukesvra01.nukescloud.com vcac-config: INFO com.vmware.vcac.cli.configurator.commands.cluster.patch.PatchExecutor.isAllCommandExecuted:775 - Checking if all commands are executed
2019-01-18T05:15:38.925206+00:00 nukesvra01.nukescloud.com vcac-config: INFO com.vmware.vcac.cli.configurator.commands.cluster.patch.PatchExecutor.validateCommandStatusForFinishLine:788 - Starting:: Command validation
2019-01-18T05:15:38.935453+00:00 nukesvra01.nukescloud.com vcac-config: INFO com.vmware.vcac.cli.configurator.commands.cluster.patch.PatchExecutor.validateCommandStatusForFinishLine:793 - Command status for update-patch-history: COMPLETED

It finishes the HF3 reverse patch installation:

2019-01-18T05:58:04.331648+00:00 nukesvra01.nukescloud.com vcac-config: INFO com.vmware.vcac.cli.configurator.commands.cluster.patch.PatchExecutor.finalizePatch:739 - Starting :: Finlaize patch
2019-01-18T05:58:04.331758+00:00 nukesvra01.nukescloud.com vcac-config: INFO com.vmware.vcac.cli.configurator.services.cluster.patch.PatchHistoryRepository.currentPatch:115 - The current patch is 58ec2da5-823b-440e-b918-fbdf6ff7166f-Reverse
2019-01-18T05:58:04.336476+00:00 nukesvra01.nukescloud.com vcac-config: INFO com.vmware.vcac.cli.configurator.services.cluster.patch.PatchHistoryRepository.finishPatch:122 - Finishing patch :: 58ec2da5-823b-440e-b918-fbdf6ff7166f-Reverse

Having finished uninstalling HF3, it now starts installing HF8:

2019-01-18T05:58:34.296019+00:00 nukesvra01.nukescloud.com vcac-config: INFO com.vmware.vcac.cli.configurator.commands.cluster.patch.PatchDeployCommand.applyPatch:258 - Applying the patch 7342927e-7099-4d8a-bc6b-8ca77c5a876b
2019-01-18T06:01:49.098522+00:00 nukesvra01.nukescloud.com vcac-config: INFO com.vmware.vcac.cli.configurator.commands.cluster.patch.PatchExecutor.run:147 - 1. Initiate patching...
2019-01-18T06:01:49.098920+00:00 nukesvra01.nukescloud.com vcac-config: INFO com.vmware.vcac.cli.configurator.commands.cluster.patch.PatchExecutor.initiatePatching:226 - Starting :: initiate patching
2019-01-18T06:01:49.175530+00:00 nukesvra01.nukescloud.com vcac-config: INFO com.vmware.vcac.cli.configurator.services.cluster.patch.PatchHistoryRepository.addPatch:75 - Adding patch 7342927e-7099-4d8a-bc6b-8ca77c5a876b to history::
2019-01-18T06:01:49.829562+00:00 nukesvra01.nukescloud.com vcac-config: INFO com.vmware.vcac.cli.configurator.commands.cluster.patch.util.PatchUtil.getPatchLocation:112 - Patch location: /usr/lib/vcac/patches/repo/contents/vRA-patch/45994cb81454cba76ebe347e9e149e3a2253d74f889b5b667d117e438cbac4/patch-vRA-7.4.10980652.10980652-HF8
2019-01-18T06:02:07.223205+00:00 nukesvra01.nukescloud.com vcac-config: INFO com.vmware.vcac.cli.configurator.services.cluster.patch.PatchHistoryRepository.currentPatch:115 - The current patch is 7342927e-7099-4d8a-bc6b-8ca77c5a876b

It finalizes HF8 as its installation completes:

2019-01-18T07:19:58.675075+00:00 nukesvra01.nukescloud.com vcac-config: INFO com.vmware.vcac.cli.configurator.commands.cluster.patch.PatchExecutor.finalizePatch:739 - Starting :: Finlaize patch
2019-01-18T07:19:58.675240+00:00 nukesvra01.nukescloud.com vcac-config: INFO com.vmware.vcac.cli.configurator.services.cluster.patch.PatchHistoryRepository.currentPatch:115 - The current patch is 7342927e-7099-4d8a-bc6b-8ca77c5a876b
2019-01-18T07:19:58.689236+00:00 nukesvra01.nukescloud.com vcac-config: INFO com.vmware.vcac.cli.configurator.services.cluster.patch.PatchHistoryRepository.finishPatch:122 - Finishing patch :: 7342927e-7099-4d8a-bc6b-8ca77c5a876b
2019-01-18T07:19:58.689439+00:00 nukesvra01.nukescloud.com vcac-config: INFO com.vmware.vcac.cli.configurator.services.cluster.patch.PatchHistoryRepository.finishPatch:173 - Set last applied patch to 7342927e-7099-4d8a-bc6b-8ca77c5a876b

Since the application of the patch is now finished, it starts the services on all of the nodes:

2019-01-18T07:20:05.468675+00:00 nukesvra01.nukescloud.com vcac-config: INFO com.vmware.vcac.platform.rest.client.impl.HttpClientFactory$IdleConnectionEvictor.start:370 - Starting thread Thread[Connection evictor-8a21cda8-1edd-4b4e-85d5-82587ad602f6,5,main]

As a final step, enable the secondary nodes on the load balancer and ensure all health checks on the environment pass.
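When walking logs like the excerpts above, it helps to pull out the patch IDs to reconstruct the install order. This is a small illustrative parser, not a supported VMware tool; the two sample lines are taken from the excerpts above:

```python
import re

# Sample "Applying the patch" lines from the vcac-config log excerpts above
log = """\
PatchDeployCommand.applyPatch:258 - Applying the patch 58ec2da5-823b-440e-b918-fbdf6ff7166f-Reverse
PatchDeployCommand.applyPatch:258 - Applying the patch 7342927e-7099-4d8a-bc6b-8ca77c5a876b
"""

# A patch ID is a 36-character UUID, optionally suffixed with -Reverse
pattern = re.compile(r"Applying the patch ([0-9a-f-]{36}(?:-Reverse)?)")
applied = pattern.findall(log)
print(applied)
```

The order of the results mirrors the install sequence: the HF3 reverse (uninstall) patch first, then HF8.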

  • Docker Containers vs Virtual Machines

Note: Docker containers are not virtual machines.

What is a Virtual Machine?
Let's explain each layer from the bottom up.
- Everything starts with INFRASTRUCTURE. This could be a laptop, a dedicated server running in a datacenter, or a server on AWS or Google Cloud.
- On top of it runs an operating system (Windows, a distribution of Linux, or macOS). In a virtualized stack this is commonly labelled the host operating system.
- Then comes your hypervisor. There are two types of hypervisors: Type 1 hypervisors run directly on the system hardware, while Type 2 hypervisors run on a host operating system that provides virtualization services.
- After the hypervisor comes our guest OS. For example, if we have to spin up three applications, then we need three guest OS virtual machines controlled by our hypervisor. Each guest OS has memory, storage and CPU overhead just to run.
- On top of these we need binaries and libraries on each guest OS to support the application.
- Finally, you have your application installed. If one wants these applications to be isolated, they have to be installed on separate virtual machines.

What is a Docker Container?
Looking at the above image you see a striking difference: there is no need to run a massive guest operating system. Let's break it down again, bottom to top.
- Docker containers also need INFRASTRUCTURE to run on. This could be a laptop, a virtual machine running in a datacenter, or a server on AWS or Google Cloud.
- Then comes the HOST OPERATING SYSTEM. This can be anything capable of running Docker. All major distributions of Linux run Docker, and there are ways to install Docker on Windows and macOS as well.
- In the next layer, the DOCKER DAEMON replaces the HYPERVISOR. The Docker daemon is a service that runs in the background on your host operating system and manages everything required to run and interact with Docker containers.
- Next up we have our binaries and libraries, just as on virtual machines. Instead of being run on a guest operating system, they get built into special packages called Docker images. The Docker daemon then runs those images.
- The last block in this building is our applications. Each application ends up running in its own Docker image and is managed independently by the Docker daemon. Typically each application and its library dependencies get packed into the same Docker image. As shown in the image, applications are still isolated.

Real-World Differences Between the Two Technologies
- There are a lot fewer moving parts with Docker: no need for a hypervisor or a virtual machine.
- The Docker daemon communicates directly with the host operating system and knows how to distribute resources for running Docker containers. It is also an expert at ensuring each container is isolated from both the host OS and other containers.
- If you want to start an application running on a virtual machine, you have to wait for the operating system to boot, which typically takes a minute or two. A Docker container starts in milliseconds.
- You save on storage, memory and CPU, as there is no need to run a bulky guest OS for each application you run.
- There is also no hardware virtualization needed with Docker, since it runs directly on the host OS.
- Both technologies are good at what they do best: virtual machines are very good at isolating system resources and entire working environments, while Docker's philosophy is to isolate individual applications, not entire systems.

!!! Stay Tuned for more on Docker !!!

  • RabbitMQ in vRealize Automation

    To understand the role of RabbitMQ in vRealize Automation, let's first figure out what RabbitMQ is.

    RabbitMQ is a messaging broker: it gives applications a common place to send and receive messages, and a safe place for messages to live until they are delivered. Centralized messaging enables software applications to connect as components of a larger application. Applications no longer need to track what state the other applications are in, which allows workload to be distributed across multiple systems for performance and reliability.

    RabbitMQ Architecture

    The basic architecture of a message queue is simple: client applications called producers create messages and deliver them to the broker (the message queue). Other applications, called consumers, connect to the queue and subscribe to the messages to be processed. A piece of software can be a producer, a consumer, or both. Messages placed onto the queue are stored until a consumer retrieves them.

    RabbitMQ's usage in vRealize Automation

    It is used to keep clustered appliances in sync, to make sure only one appliance takes action on a given message (which prevents race conditions), and to power the event-broker-service (EBS). All of this is done through a series of queues, one for each action that has to be kept in sync between the appliances. A few example queues:
    ebs.com.vmware.csp.iaas.blueprint.service.machine.lifecycle.active__
    ebs.com.vmware.csp.iaas.blueprint.service.machine.lifecycle.provision__
    vmware.vcac.core.software-service.taskRequestSubmitted
    vmware.vcac.core.iaas-proxy-provider.catalogRequestSubmitted
    vmware.vcac.core.catalog-service.requestSubmitted
    vmware.vcac.core.event-broker-service.publishReplyEvent

    RabbitMQ and vRA Clustering Pre-Requisites

    Host short names and FQDNs must resolve among all the appliances being clustered. This DNS requirement is mandatory because RabbitMQ uses the short name in its node naming convention.
    Ports 4369, 5672 and 25672 must be open between appliances:
    4369 is used by the RabbitMQ peer discovery service
    5672 is used by AMQP
    25672 is used for inter-node and CLI communication (the Erlang distribution server port)

    When RabbitMQ is configured in a cluster, unlike other clustering applications there is no master-slave relationship. The last node to receive a message is considered the "Leading Cluster Node". The only time this becomes an issue is when all nodes in the vRA cluster have to be stopped: shut down all nodes apart from one, then restart that last node, which ensures it has all the latest messages from the queues, and only then bring back the nodes that were stopped.

    Listing Message Queues

    Message queues are used to ensure multiple clustered vRealize Automation appliances are kept in sync and to power EBS. From an SSH session, running rabbitmqctl list_queues will show all currently configured queues. Two pieces of data are returned by default: the queue name and the number of messages in the queue. In the output of this command:
    Queues starting with ebs.com.vmware.* are used by the event-broker-service
    Queues starting with vmware.vcac.core.* are used for other vRealize Automation functions

    Configuration Files

    RabbitMQ uses two main configuration files to set the required variables; both are stored under /etc/rabbitmq
    /etc/rabbitmq/rabbitmq.config: SSL information, TCP listening ports, connection timeouts, heartbeat interval
    /etc/rabbitmq/rabbitmq-env.conf: NODENAME=
    Note: If we change USE_LONGNAME to true, RabbitMQ would use the FQDN to name the cluster node.

    RabbitMQ Server Service

    RabbitMQ is controlled by the service rabbitmq-server

    Log Locations and their usage

    All RabbitMQ logs are stored under /var/log/rabbitmq/*. The main operational log is /var/log/rabbitmq/rabbit@.log, which contains messages about startup, shutdown, plugin information and queue sync. RabbitMQ is only a broker; it does not have information on what other systems are doing with the messages. It will only show content about messages received or processed.

    Command-Line Options for Troubleshooting

    From an SSH session on a vRA appliance, the rabbitmqctl command can be used to control the RabbitMQ system. Options commonly used in troubleshooting:
    rabbitmqctl cluster_status gives the definitive RabbitMQ clustering status; the running nodes line in its output should contain all the nodes that are part of the cluster
    rabbitmqctl list_policies lists all currently enforced policies; in vRA only one policy should be returned, ha-all

    Re-Join a node to the RabbitMQ cluster

    If cluster_status is returning fewer nodes than expected, we can re-join a node to the RabbitMQ cluster through the VAMI. The node being joined to the cluster is reset, which removes all messages and metadata on that node. Since the ha-all policy is set, as discussed above, all messages and metadata are replicated on the other nodes, which means that even though the node is reset, once it is back in the cluster the metadata and messages are copied back to it.

    Reset RabbitMQ

    As a last resort we can reset RabbitMQ to a default state by clicking Reset Rabbitmq Cluster in the VAMI. This clears all messages out of the queues and destroys all historical data, so it should only be used when nothing else works.

    !! Happy Learning !!
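    The cluster_status check described above can be wrapped in a small script. This is a minimal sketch, not part of the appliance: the node names and the sample status string are hypothetical, and on a real vRA appliance the status text would come from running rabbitmqctl cluster_status over SSH.

```shell
#!/bin/sh
# Minimal sketch: verify that every expected node appears in the
# running_nodes section of `rabbitmqctl cluster_status` output.
# On a real appliance you would capture it with:
#   status=$(rabbitmqctl cluster_status)

check_running_nodes() {
  status="$1"; shift
  for node in "$@"; do
    case "$status" in
      *"$node"*) ;;                          # node found in the status text
      *) echo "MISSING: $node"; return 1 ;;
    esac
  done
  echo "all nodes running"
}

# Hypothetical sample, shaped like the running_nodes line of cluster_status
sample="{running_nodes,[rabbit@vra-node1,rabbit@vra-node2,rabbit@vra-node3]}"
check_running_nodes "$sample" rabbit@vra-node1 rabbit@vra-node2 rabbit@vra-node3
```

    If a node is missing from the output, that is the node to re-join through the VAMI as described above.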

  • Reverse Proxy in vRealize Automation

    Understanding Reverse Proxy (haproxy)

    On the vRealize Automation appliance we have multiple services running simultaneously. An open-source application called HAProxy is used to provide a reverse proxy on port 443, routing traffic to each service appropriately. Each incoming request is analyzed to determine where it should be routed. Routing is determined by the configuration files under /etc/haproxy/conf.d/, which contain a set of rules. These rules outline:
    Which URL should be directed where
    Any checks that need to be done before routing traffic
    What to do by default

    To understand how the reverse proxy functions in vRealize Automation, let's look at the configuration files. When haproxy starts, each *.cfg file is loaded in alphanumeric order; the default and recommended naming convention is to start each file name with a two-digit number. These files define what to do with each incoming request. Looking at /etc/haproxy/conf.d/20-vcac.cfg, we can learn how routing is determined. The snippet defines requests for Orchestrator and health checks, then states what to do with them. Lines 25 to 29, starting with acl, state which URI to look for, for example /vco or /vcac/services/api/health:
    If /vco is found, backend-vro is used (line 33)
    If /vcac/services/api/health is found, backend-vra-health is used (line 34)
    If none of the conditions are met, the default backend is used, which is backend-vra

    Further down the configuration file we can find the respective backend definitions as well: line 66 shows backend-vra-health and line 80 shows backend-vro. In this example, if backend-vro is used then the traffic is routed to port 8280 on the localhost.

    Let's now see how the reverse proxy is used in vRealize Automation with an example. When a user navigates to https:///vcac/ a check is performed for a valid SAML token.
    In this case, if the user has not logged into vRA for a while and does not have a valid token, the reverse proxy forwards the connection to vIDM for authentication. When authentication is complete, a valid SAML token is cached in the browser. The browser sends the login request again to the reverse proxy, and since a valid SAML token is now present, it forwards the connection to vRA CAFE. If a valid SAML token had been found in the first instance, the reverse proxy would have forwarded the connection to vRA CAFE directly. Some requests, like vRO API calls, are made through the URI: if a request comes in like https:///vco , the reverse proxy forwards it to vRO directly.

    Reverse Proxy Logging

    By default, reverse proxy (haproxy) logging is disabled. In my experience I have never come across issues with this component of vRealize Automation, but if you still want to enable a stats view for haproxy, it can be done in /etc/haproxy/haproxy.cfg
    Note: Back up /etc/haproxy/haproxy.cfg before making any configuration changes to it.
    To enable the stats view we need to add the following lines to haproxy.cfg and then restart the service:
    stats enable
    stats uri /stats
    stats realm Strictly\ Private
    stats auth admin:VMware123!
    The first line enables the stats view
    The second line sets the access URI for stats, meaning a new view is available at https:///stats
    The third line sets access to private
    The fourth line sets the username and password (you may set the password as you want)
    After these additions to the file, restart the haproxy service:
    service haproxy restart

    To confirm that the reverse proxy is functional we can:
    Check its service status: service haproxy status
    Find the listening ports and running process: netstat -plnt | grep haproxy and ps -ef | grep haproxy
    Check the haproxy stats page
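    The ACL-and-backend routing described above can be sketched as a haproxy configuration fragment. This is a hedged illustration, not the appliance's actual file: the frontend name, certificate path and exact syntax are assumptions, while the ACL paths, backend names and the vRO port 8280 follow the description of 20-vcac.cfg above.

```
frontend https-in
    bind :443 ssl crt /etc/ssl/vra.pem        # listener; cert path is hypothetical
    acl is-vco        path_beg /vco
    acl is-vra-health path_beg /vcac/services/api/health
    use_backend backend-vro        if is-vco
    use_backend backend-vra-health if is-vra-health
    default_backend backend-vra               # anything else goes to vRA CAFE

backend backend-vro
    server vro-local 127.0.0.1:8280           # vRO traffic routed to localhost:8280
```

    Reading the real 20-vcac.cfg on an appliance is the best way to see the full rule set, including the health checks each backend performs.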

  • Unable to connect to endpoint. Timeout occurred while processing the request

    A Test Connection request while adding an endpoint fails with the exception: Unable to connect to endpoint. Timeout occurred while processing the request. Even though we verified the firewall, the ports and the vCenter service status, we were still encountering this exception, so we looked into the logs.

    For a working environment

    Manager Service \ All.log
    As seen below, when a Test Connection is initiated from the vRA UI, the Manager Service is responsible for initiating the request and creates a workitem to be processed by the underlying vSphereAgent.
    [UTC:2018-11-05 09:25:45 Local:2018-11-05 09:25:45] [Debug]: [sub-thread-Id="67" context="gjbLTAc0" token="gx3pVepj"] TestConnection Request: Name [vc.nukescloud.com], Address: [https://vc.nukescloud.com/sdk], Username: [nukescloud\nukes]
    [UTC:2018-11-05 09:25:48 Local:2018-11-05 09:25:48] [Debug]: [sub-thread-Id="25" context="" token=""] TestConnection WorkItemResponse: [Test connection failed: Certificate is not trusted (RemoteCertificateChainErrors). Subject: C=US, CN=vc.nukescloud.com Thumbprint: 9C949FAEB87CBF97419CF4BFE70EB3FB8A035173INVALID_CERTIFICATE]

    vSphereAgent.log
    The below snippet is from vSphereAgent.log. As you can see, once the workitem is created by the Manager Service, the vSphereAgent processes the workitem and fetches the certificate from the endpoint.
2018-11-05T09:25:48.218Z IAAS75 vcac: [component="iaas:VRMAgent.exe" priority="Info" thread="4332"] [sub-thread-Id="5" context="" token=""] Starting : Processing Workitem ID [9bca4157-1fd3-4e19-a7dd-117aacc5e5d6] [testconnection] 2018-11-05T09:25:48.218Z IAAS75 vcac: [component="iaas:VRMAgent.exe" priority="Debug" thread="4332"] [sub-thread-Id="5" context="" token=""] [[]] [testconnection] TestConnection.Endpoint.Address=https://vc.nukescloud.com/sdk 2018-11-05T09:25:48.218Z IAAS75 vcac: [component="iaas:VRMAgent.exe" priority="Debug" thread="4332"] [sub-thread-Id="5" context="" token=""] [[]] [testconnection] TestConnection.Endpoint.Username=nukescloud\nukes 2018-11-05T09:25:48.218Z IAAS75 vcac: [component="iaas:VRMAgent.exe" priority="Debug" thread="4332"] [sub-thread-Id="5" context="" token=""] [[]] [testconnection] TestConnection.Endpoint.TrustThumbprint= 2018-11-05T09:25:48.218Z IAAS75 vcac: [component="iaas:VRMAgent.exe" priority="Debug" thread="4332"] [sub-thread-Id="5" context="" token=""] [[]] [testconnection] TestConnection.Endpoint.TrustAllCertificates=False 2018-11-05T09:25:48.218Z IAAS75 vcac: [component="iaas:VRMAgent.exe" priority="Debug" thread="4332"] [sub-thread-Id="5" context="" token=""] Begin test connection request.... 2018-11-05T09:25:48.234Z IAAS75 vcac: [component="iaas:VRMAgent.exe" priority="Warning" thread="5364"] [sub-thread-Id="10" context="" token=""] Invalid certificate found: C=US, CN=vc.nukescloud.com, Untrusted certificate chain 2018-11-05T09:25:48.234Z IAAS75 vcac: [component="iaas:VRMAgent.exe" priority="Warning" thread="5364"] [sub-thread-Id="10" context="" token=""] Invalid certificate found: C=US, CN=vc.nukescloud.com, Untrusted certificate chain 2018-11-05T09:25:48.249Z IAAS75 vcac: [component="iaas:VRMAgent.exe" priority="Error" thread="4332"] [sub-thread-Id="5" context="" token=""] false System.AggregateException: One or more errors occurred. 
    ---> System.Net.WebException: The underlying connection was closed: An unexpected error occurred on a send. ---> DynamicOps.Common.Client.UntrustedCertificateException: Certificate is not trusted (RemoteCertificateChainErrors). Subject: C=US, CN=vc.nukescloud.com Thumbprint: 9C949FAEB87CBF97419CF4BFE70EB3FB8A035173

    The "Certificate is not trusted" message above is not a failure; it simply states that the certificate of the vCenter is not yet present in vRA.

    For a non-working environment

    Manager Service \ All.log
    For the environment where we see the exception, we do see the Test Connection request being processed, but no workitem is generated for the agent to go and fetch the certificate.
    [UTC:2018-10-26 06:56:46 Local:2018-10-26 17:56:46] [Trace]: [sub-thread-Id="15" context="fW8t555E" token="3wP5GoMr"] TestConnection Request: Begin creating request record for request id [d2b48795-bdd2-4f9a-b585-2f8be4e33ac9]
    [UTC:2018-10-26 06:56:46 Local:2018-10-26 17:56:46] [Trace]: [sub-thread-Id="15" context="fW8t555E" token="3wP5GoMr"] TestConnection Request: Finish creating request record for request id [d2b48795-bdd2-4f9a-b585-2f8be4e33ac9]
    [UTC:2018-10-26 06:56:46 Local:2018-10-26 17:56:46] [Debug]: [sub-thread-Id="15" context="fW8t555E" token="3wP5GoMr"] TestConnection Request: Name [vcenter.nukescloud.com], Address: [https://vcenter.nukescloud.com/sdk], Username: [nukescloud\nukes]

    Apart from the above three lines, there was no other information in the logs to tell us why and where it was stuck. After a little research we found that the dbo.Agent table had multiple entries under AgentName for the same endpoint on the same agent machine. To come out of this problem, we followed these steps.
    Firstly, back up the IaaS database.
    Then RDP to the IaaS Agent nodes and uninstall the proxy agent for the endpoint which cannot be added.
    Once the agents are uninstalled, browse to the previous install location and remove any remaining traces (there might be a few log files left).
    Now go back to the SQL database and execute a query to remove any traces of the failing endpoint from the dbo.Agent table.
    In my scenario the vCenter was vcenter.nukescloud.com, and the dbo.Agent table had the following AgentIDs for the endpoint vcenter.nukescloud.com: AAAAA, CCCCC and DDDDD. We need to delete these entries from the database, which can be done with the SQL query below:
    delete from dbo.Agent where AgentID in ('AAAAA','CCCCC','DDDDD');
    After the deletion, reinstall the agent. Once done, you should see a single entry for this endpoint and agent in the dbo.Agent table, with its AgentAlive value set to 1. When it is not working, the AgentAlive status is set to 0.
    Now adding the endpoint through the vRA UI should work as expected.

    The reason this exception occurs is simple: when an agent is uninstalled, it is not removed from the IaaS database. When we then add an endpoint, vRA always looks at the top entry for the agent in the table to assign a workitem. If it hits an entry which is not a proper proxy agent for this endpoint, the workitem cannot be processed further, causing a timeout in the UI.
    So whenever a proxy agent is uninstalled, check the dbo.Agent table in the IaaS database to confirm it is completely removed and there is no stale entry.

    || Happy Learning ||
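    Before deleting anything, it can help to confirm which rows are actually duplicated. A minimal sketch, assuming the dbo.Agent table exposes the AgentID, AgentName and AgentAlive columns mentioned above (the real table has additional columns):

```sql
-- List agent names that have more than one row in dbo.Agent
-- (duplicate rows are what cause the stuck Test Connection workitems)
SELECT AgentName, COUNT(*) AS Entries
FROM dbo.Agent
GROUP BY AgentName
HAVING COUNT(*) > 1;

-- Inspect the duplicates before deleting, noting which AgentID is alive
SELECT AgentID, AgentName, AgentAlive
FROM dbo.Agent
ORDER BY AgentName;
```

    Only the AgentID values returned for the problem endpoint should go into the delete statement, and only after the database has been backed up.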

  • CryptographicException - Keyset does not exist

    Recently I stumbled upon a problem where the "Last Connected" status of IaaS nodes under the VAMI was way off. Ideally the "Last Connected" status of any IaaS node should be under 10 to 15 minutes. The first step in diagnosing the problem was to look into the Management Agent/All.log of that node, where I found the following exception (it was the only one inside these All.log files):

    Exception:
    [UTC:2018-10-28 23:45:35 Local:2018-10-29 10:45:35] [Error]: [sub-thread-Id="7" context="" token=""] Microsoft.Practices.Unity.ResolutionFailedException: Resolution of the dependency failed, type = "VMware.Cafe.IManagementEndpointClient", name = "(none)". Exception occurred while: Calling constructor VMware.Cafe.ManagementEndpointClient(System.Uri baseAddress, VMware.Cafe.ManagementEndpointSecurityContext authenticationContext, VMware.Cafe.TrustedCertificatePredicate trustCertificatePredicate). Exception is: CryptographicException - Keyset does not exist
    -----------------------------------------------
    At the time of the exception, the container was:
    Resolving VMware.Cafe.ManagementEndpointClient,(none) (mapped from VMware.Cafe.IManagementEndpointClient, (none))
    Calling constructor VMware.Cafe.ManagementEndpointClient(System.Uri baseAddress, VMware.Cafe.ManagementEndpointSecurityContext authenticationContext, VMware.Cafe.TrustedCertificatePredicate trustCertificatePredicate)
    ---> System.Security.Cryptography.CryptographicException: Keyset does not exist

    The Management Agent does not function properly when running under a non-administrative account; it needs to be started using an administrative account. Even though you configure this during installation, if the account's administrative rights are removed later you end up with this problem.
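    A quick way to spot this condition is to check which account the Management Agent Windows service runs under. The sketch below is illustrative only: the service name hinted at in the comment and the sample sc qc output line are hypothetical, and on a real IaaS node you would feed the function the actual SERVICE_START_NAME line from sc qc.

```shell
#!/bin/sh
# Minimal sketch: flag a Management Agent service that is not running
# under an administrative account. The SERVICE_START_NAME sample below
# is hypothetical; on the IaaS node it would come from something like:
#   sc qc "<ManagementAgentServiceName>"   (service name assumed)

check_service_account() {
  line="$1"
  case "$line" in
    *SERVICE_START_NAME*Administrator*|*SERVICE_START_NAME*SYSTEM*)
      echo "service account looks administrative" ;;
    *)
      echo "WARNING: management agent is not running as an administrative account" ;;
  esac
}

sample="SERVICE_START_NAME : NT AUTHORITY\\NetworkService"
check_service_account "$sample"
```

    If the warning fires, reconfigure the service to start with the administrative account used during installation and restart it.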
