
Search Results
- vRA Migration fails with Exception: Invalid data format
I was involved in a vRealize Automation 7.x migration project where the migration failed at the point where it tries to back up the vRA license. Here are the relevant snippets from the migration logs:

[2020-07-21 02:22:24.523079] [e87799dbb0c348cf8e37f98861c3c135:Pending] Obtain migration package from the source vRealize Automation appliance.
[2020-07-21 02:22:24.523216] [3ad82c24b47944ce8e9d44309ab719d3:Pending] Back up vRealize Automation license.
[2020-07-21 02:22:24.523273] [3b568e500a5940f19a4758377dd31185:Pending] Stop vRealize Automation services on cluster node (sevenvra.prem.com).
[2020-07-21 02:22:24.523333] [2fc1c91db692483c806eef983f64f36b:Pending] Stop vRealize Automation services.
[2020-07-21 02:22:24.523395] [5d593b9319624e26a0614ebc08ce4888:Pending] Prepare vRealize Automation appliance for database migration.
[2020-07-21 02:22:24.523457] [cf3c61d6eb464388abaf5114658d6c3b:Pending] Migrate vRealize Automation database.
[2020-07-21 02:22:24.523511] [6650bbaa755047bfabb70e3abcdcd7c0:Pending] Re-encrypt sensitive vRealize Automation configuration.
[2020-07-21 02:22:24.523561] [1fd72029646b4f8f8f0f93b87889eb53:Pending] Upgrade migrated vRealize Automation database.
[2020-07-21 02:22:24.523609] [e5ab28549aa64aec9374296a1bcef976:Pending] Reconfigure vRealize Automation database failover service.
[2020-07-21 02:22:24.523656] [99542458f82c4feb8da2dc3bce027461:Pending] Reconfigure vRealize Automation messaging service.
[2020-07-21 02:22:24.523701] [14a8d96417d845d5a951903a237c68c4:Pending] Reconfigure Containers Management service.
[2020-07-21 02:22:24.523746] [8120bdad9ec04974b9a95d010d7cac18:Pending] Reconfigure vRealize Health Broker service.
[2020-07-21 02:22:24.523795] [2e4528d38bc849259fe1cf29603b5cae:Pending] Reconfigure default vRealize Automation tenant.
[2020-07-21 02:22:24.523844] [ec5681ef323d4c3e9d8b3492a009cb76:Pending] Migrate embedded vRealize Orchestrator
[2020-07-21 02:22:24.523892] [539ed8b0e9dd4534a8a6bc3a20de440b:Pending] Start vRealize Automation services.
[2020-07-21 02:22:24.523939] [bb8b9eb8abbd490bb00f84eee747e2e1:Pending] Reconfigure cluster node (sevenvra.prem.com).
[2020-07-21 02:22:24.523984] [96f1c4ce28e14f1a9ae8011407fc2283:Pending] Restart vRealize Automation services.
[2020-07-21 02:22:24.524030] [d5d7431d4e91461a878f14a33b53a7d5:Pending] Migrate infrastructure node (sevenvra.prem.com).
[2020-07-21 02:22:24.524075] [acdd2284cabb4a9abfbfd4e2b4106673:Pending] Restart vRealize Automation services.
[2020-07-21 02:22:24.524120] [b42778bdc8c1462ead6c8b983a7f3ef5:Pending] Restore vRealize Automation license.
[2020-07-21 02:22:24.524165] [347c7a7a30bc44f0b40c834160c7d013:Pending] Finalize migration.
[2020-07-21 02:22:24.541154] Sequence initialized
[2020-07-21 02:22:24.541232] Sequence state changed to [migration.ready]
[2020-07-21 02:22:24.541287] Sequence execution started
[2020-07-21 02:22:24.543532] Sequence state changed to [migration.progress]
[2020-07-21 02:22:24.543614] [e87799dbb0c348cf8e37f98861c3c135:Running] Obtain migration package from the source vRealize Automation appliance.
[2020-07-21 02:22:24.545893] Invoke script /usr/lib/vcac/tools/migration/sequence/migration/scripts/M00-log-environment
[2020-07-21 02:22:27.423438] Script invocation completed with code 0
[2020-07-21 02:22:27.423532] Invoke script /usr/lib/vcac/tools/migration/sequence/migration/scripts/M02-get-migration-package
[2020-07-21 02:25:24.183529] Script invocation completed with code 0
[2020-07-21 02:25:24.183719] [e87799dbb0c348cf8e37f98861c3c135:Completed] Obtain migration package from the source vRealize Automation appliance.
[2020-07-21 02:25:24.528063] [3ad82c24b47944ce8e9d44309ab719d3:Running] Back up vRealize Automation license.
[2020-07-21 02:25:24.530730] Back up license serial key(s)
[2020-07-21 02:25:32.820798] Traceback (most recent call last):
  File "/usr/lib/vcac/tools/migration/framework/mcore.py", line 470, in __execute
    task.execute(self.__context)
  File "/usr/lib/vcac/tools/migration/sequence/migration/execute", line 137, in execute
    for li in mutil.invokeConfigurator(['/usr/sbin/vcac-config', 'license-info']):
  File "/usr/lib/vcac/tools/migration/framework/mutil.py", line 61, in invokeConfigurator
    result = parseConfiguratorResult(errors, errorMessage)
  File "/usr/lib/vcac/tools/migration/framework/mutil.py", line 56, in parseConfiguratorResult
    raise ex
Exception: Invalid data format.
[2020-07-21 02:25:32.820925] [3ad82c24b47944ce8e9d44309ab719d3:Failed] Back up vRealize Automation license.
[2020-07-21 02:25:32.825554] Sequence execution finished
[2020-07-21 02:25:32.827844] Sequence state changed to [migration.failed]
[2020-07-21 02:25:32.827923] Sequence has a task execution error. Cancel pending tasks
[2020-07-21 02:25:32.827984] [3b568e500a5940f19a4758377dd31185:Cancelled] Stop vRealize Automation services on cluster node (sevenvra.prem.com).
[2020-07-21 02:25:32.830210] [2fc1c91db692483c806eef983f64f36b:Cancelled] Stop vRealize Automation services.
[2020-07-21 02:25:32.832655] [5d593b9319624e26a0614ebc08ce4888:Cancelled] Prepare vRealize Automation appliance for database migration.

As the log shows, the "Back up vRealize Automation license" task failed and every pending task after it was cancelled. Let's look at what happens at this point.
When the license backup task runs, it executes the following command:

[master] sevenvra:~ # /usr/sbin/vcac-config license-info
---BEGIN---
[{"licenseInfo":{"@type":"SerialKeyLicenseInfo","serialKeys":["YYYYYY-YYYYYY-YYYYYY-YYYYYYY"],"expiration":null,"restrictions":[{"product":{"name":"VMware vRealize Automation Enterprise","editionKey":"vac.enterprise.serverVm","suiteName":null,"family":{"name":"VMware vCloud Automation Center","version":"7.0"},"id":"VMware vCloud Automation Center7.0vac.enterprise.serverVmserverVm"},"costUnitLimits":[{"enforcementType":"hardEnforced","unit":{"id":"serverVm"},"value":25}],"licenseProductCapabilities":[{"version":"7.5.0.0","features":[{"id":"vac"},{"id":"vdc"}],"keyValues":null}]}],"name":"vRA Standalone License"},"id":"urn:vri:com.vmware.license.license:XXXXXX-XXXXXX-XXXXXXX-XXXXXXX-XXXXXXX","assetId":"urn:vri:com.vmware.license.asset:comvmwarevcacstandalone"}]
---END---

The command returns the serial keys and other parameters that are later applied to the destination server we are migrating to. Going by the "Invalid data format" exception, the values returned by the command are corrupted in some way, so the migration framework cannot parse them.

The remediation was to remove the existing license on the source vRealize Automation server, reboot the server, and then re-apply the license. The steps to remove a license on a vRealize Automation 7.x node are documented in my previous blog article. Once the license has been removed and re-applied, retry the migration and it should work.
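Before retrying the migration, it can help to confirm that the `license-info` output is parseable again. The following is a minimal sketch, assuming the output keeps the `---BEGIN---` / `---END---` framing and the JSON layout shown above. It is not the migration framework's own parser (`mutil.parseConfiguratorResult` is not shown in this article), and the function names are made up for illustration.

```python
import json

BEGIN, END = "---BEGIN---", "---END---"


def parse_license_info(raw):
    """Extract and decode the JSON payload between the BEGIN/END markers.

    Assumption: the output is framed exactly as in the article's sample.
    """
    start = raw.find(BEGIN)
    stop = raw.find(END, start + len(BEGIN)) if start != -1 else -1
    if start == -1 or stop == -1:
        raise ValueError("BEGIN/END markers not found in license-info output")
    payload = raw[start + len(BEGIN):stop].strip()
    try:
        data = json.loads(payload)
    except json.JSONDecodeError as exc:
        raise ValueError(f"license payload is not valid JSON: {exc}") from exc
    if not isinstance(data, list):
        raise ValueError("expected a JSON list of license entries")
    return data


def serial_keys(entries):
    """Collect every serial key found under licenseInfo.serialKeys."""
    keys = []
    for entry in entries:
        keys.extend(entry.get("licenseInfo", {}).get("serialKeys", []))
    return keys


# Trimmed version of the sample output shown in the article.
SAMPLE = """---BEGIN---
[{"licenseInfo":{"@type":"SerialKeyLicenseInfo","serialKeys":["YYYYYY-YYYYYY-YYYYYY-YYYYYYY"],"name":"vRA Standalone License"}}]
---END---"""

if __name__ == "__main__":
    print(serial_keys(parse_license_info(SAMPLE)))
```

To use it against a real appliance, save the command output to a file and pass the file contents to `parse_license_info`; a `ValueError` indicates output that a JSON parser cannot read.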
- Enable TLS on Localhost Configuration as part of vRealize Automation Hardening 7.x
My peers and I were assisting on a project where vRealize Automation 7.x had to be deployed and hardened. We found several errors and omissions in certain sections of the hardening guide (version 7.6) that need to be called out. I will only cover the sections where we saw issues after implementing them; most of the others are straightforward.

Problematic sections are:
"Enable TLS on Localhost Configuration", page 22
"Verify that SSLv3, TLS 1.0, and TLS 1.1 are Disabled", page 24

Let's start with the section "Enable TLS on Localhost Configuration".

Step 1
SSH to the vRA appliance.

Step 2
Set permissions for the vcac keystore by running the following commands:

usermod -A vco,coredump,pivotal vco
chown vcac.pivotal /etc/vcac/vcac.keystore
chmod 640 /etc/vcac/vcac.keystore

Execute this as shown in the document; there are no changes to this step.

Step 3
The documentation says to update the HAProxy configuration as follows:

Open the HAProxy configuration file located at /etc/haproxy/conf.d and choose the 20-vcac.cfg service.
Locate the lines containing the following string: server local 127.0.0.1… and add the following to the end of such lines: ssl verify none

It states that the change has to be made in the following backend sections of the 20-vcac.cfg file:

backend-vrhb
backend-horizon
backend-vro
backend-vra
backend-artifactory
backend-vra-health

But when you look at the file, there is no backend-artifactory section in it.
So that's a mistake in the guide. The only backends available in the file are:

backend backend-vrhb
backend backend-horizon
backend backend-vra
backend backend-vra-health
backend backend-vro
backend backend-vco-health

Another important change that is missing from the documentation is that the backend-vro port has to be changed from 8280 to 8281.

NOTE: TAKE A BACKUP OF THE ORIGINAL FILES BEFORE MAKING CHANGES

/etc/haproxy/conf.d/20-vcac.cfg file after changes:

backend backend-horizon
    mode http
    balance leastconn
    option http-server-close
    option forwardfor
    option redispatch
    http-response replace-value Set-Cookie JSESSIONID=(.*) JSESSIONID_HZN=\1
    http-response replace-value Set-Cookie XSRF-TOKEN=(.*) XSRF-TOKEN_HZN=\1
    http-request replace-value Cookie (.*?)JSESSIONID_HZN=([^;]+)(.*?) \1JSESSIONID=\2\3
    http-request replace-value Cookie (.*?)XSRF-TOKEN_HZN=([^;]+)(.*?) \1XSRF-TOKEN=\2\3
    cookie JSESSIONID prefix
    timeout check 10s
    server local 127.0.0.1:8443 maxconn 500 ssl verify none

backend backend-vra
    mode http
    balance leastconn
    option http-server-close
    option forwardfor
    option redispatch
    http-response replace-value Set-Cookie JSESSIONID=(.*) JSESSIONID_VRA=\1
    http-response replace-value Set-Cookie XSRF-TOKEN=(.*) XSRF-TOKEN_VRA=\1
    http-request replace-value Cookie (.*?)JSESSIONID_VRA=([^;]+)(.*?) \1JSESSIONID=\2\3
    http-request replace-value Cookie (.*?)XSRF-TOKEN_VRA=([^;]+)(.*?) \1XSRF-TOKEN=\2\3
    cookie JSESSIONID prefix
    server local 127.0.0.1:8082 maxconn 1500 cookie A check ssl verify none

backend backend-vra-health
    mode http
    balance leastconn
    option http-server-close
    option log-health-checks
    option httplog
    option forwardfor
    option redispatch
    http-response replace-value Set-Cookie JSESSIONID=(.*) JSESSIONID_VRA=\1
    http-response replace-value Set-Cookie XSRF-TOKEN=(.*) XSRF-TOKEN_VRA=\1
    http-request replace-value Cookie (.*?)JSESSIONID_VRA=([^;]+)(.*?) \1JSESSIONID=\2\3
    http-request replace-value Cookie (.*?)XSRF-TOKEN_VRA=([^;]+)(.*?) \1XSRF-TOKEN=\2\3
    cookie JSESSIONID prefix
    server local 127.0.0.1:8082 cookie A check ssl verify none

backend backend-vro
    mode http
    balance leastconn
    option http-server-close
    option forwardfor
    option redispatch
    http-response replace-value Set-Cookie JSESSIONID=(.*) JSESSIONID_VRO=\1
    http-response replace-value Set-Cookie XSRF-TOKEN=(.*) XSRF-TOKEN_VRO=\1
    http-request replace-value Cookie (.*?)JSESSIONID_VRO=([^;]+)(.*?) \1JSESSIONID=\2\3
    http-request replace-value Cookie (.*?)XSRF-TOKEN_VRO=([^;]+)(.*?) \1XSRF-TOKEN=\2\3
    cookie JSESSIONID prefix
    option httpchk GET /vcac/services/api/health
    server local 127.0.0.1:8281 cookie A check ssl verify none
    # server node2 REMOTE-IP:443 cookie A check ssl verify none

backend backend-vco-health
    mode http
    option http-server-close
    option forwardfor
    option redispatch
    http-response replace-value Set-Cookie JSESSIONID=(.*) JSESSIONID_VRO=\1
    http-response replace-value Set-Cookie XSRF-TOKEN=(.*) XSRF-TOKEN_VRO=\1
    http-request replace-value Cookie (.*?)JSESSIONID_VRO=([^;]+)(.*?) \1JSESSIONID=\2\3
    http-request replace-value Cookie (.*?)XSRF-TOKEN_VRO=([^;]+)(.*?) \1XSRF-TOKEN=\2\3
    cookie JSESSIONID prefix
    server local 127.0.0.1:8280 cookie A check

Step 4
Get the password for keystorePass. Locate the property certificate.store.password in the /etc/vcac/security.properties file.
Example:

certificate.store.password=s2enc~00k52MwbaLOWSpiLLl9d2Q\=\=

The guide then asks you to decrypt the value from the security.properties file using this command:

vcac-config prop-util -d --p VALUE

The output looks like this:

[master] sbivra:~ # vcac-config prop-util -d --p s2enc~00k52MwbaLOWSpiLLl9d2Q\=\=
password[master] asbvra:~ #

So the decrypted value is a plain-text password (in this example, literally "password"). The command prints it without a trailing newline, which is why it runs straight into the next shell prompt.

Step 5
This step asks you to "Configure the vRealize Automation service". The document says to open the /etc/vcac/server.xml file and add the attributes below to the Connector tag, replacing certificate.store.password with the certificate store password value found in /etc/vcac/security.properties:

scheme="https" secure="true" SSLEnabled="true" sslProtocol="TLS" keystoreFile="/etc/vcac/vcac.keystore" keyAlias="apache" keystorePass="certificate.store.password"

If you follow this literally, you end up with the encrypted value:

scheme="https" secure="true" SSLEnabled="true" sslProtocol="TLS" keystoreFile="/etc/vcac/vcac.keystore" keyAlias="apache" keystorePass="s2enc~00k52MwbaLOWSpiLLl9d2Q\=\="

But this is wrong. You have to use the decrypted password, which in this example is simply password. The correct attributes are:

scheme="https" secure="true" SSLEnabled="true" sslProtocol="TLS" keystoreFile="/etc/vcac/vcac.keystore" keyAlias="apache" keystorePass="password"

Step 6
Here too, you have to use the decrypted password in the attribute, not the encrypted one.

Content being updated...
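Because the guide's list of backends does not match the file, it is easy to miss a `server local 127.0.0.1…` line when editing by hand. The sketch below is one way to audit a copy of the file, assuming it uses the `backend <name>` / `server local 127.0.0.1:<port> …` layout shown above. The function name is hypothetical and this is a review aid, not a replacement for reading the file.

```python
def audit_haproxy_backends(config_text):
    """Return {backend_name: [local server lines lacking 'ssl verify none']}."""
    findings = {}
    current = None
    for raw in config_text.splitlines():
        tokens = raw.split()
        if not tokens or tokens[0].startswith("#"):
            continue  # skip blank lines and commented-out entries
        line = " ".join(tokens)  # normalise whitespace before matching
        if tokens[0] == "backend" and len(tokens) > 1:
            current = tokens[1]
        elif tokens[0] in ("frontend", "listen", "global", "defaults"):
            current = None  # left the backend sections
        elif current and line.startswith("server local 127.0.0.1"):
            if "ssl verify none" not in line:
                findings.setdefault(current, []).append(line)
    return findings


# Two backends copied from the edited file shown in the article.
SAMPLE = """
backend backend-vro
    mode http
    server local 127.0.0.1:8281 cookie A check ssl verify none
    # server node2 REMOTE-IP:443 cookie A check ssl verify none

backend backend-vco-health
    mode http
    server local 127.0.0.1:8280 cookie A check
"""

if __name__ == "__main__":
    for backend, lines in audit_haproxy_backends(SAMPLE).items():
        print(backend, lines)
```

A flagged line is only a prompt for review: in the edited file above, backend-vco-health is left on port 8280 without `ssl verify none`, so the audit reports it even though it matches the article's final configuration.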
- No valid endpoints found in the Management Agent
You may encounter the following exception during vRA 7.x patching. While executing the patchscript.sh file, the following exception is seen:

__main__ - ERROR : 242 - ('Command execution result:\nCommand id: edc117f3-bd3e-4589-8891-a9c889f2262f\n Type: upgrade-management-agent\n Node id: 523C2C66-A308-43AE-8D2A-63FE41C19A9F\n Node host: seveniaas.prem.com\n Result: No valid endpoints found in the Management Agent\'s configuration.\n Result description: System.InvalidOperationException: No valid endpoints found in the Management Agent\'s configuration.\r\n at VMware.IaaS.Management.Commands.Installation.ParameterHelper.GetFirstAvailableEndpointFromContext(IExecutionContext context)\r\n at VMware.IaaS.Management.Commands.Installation.ParameterHelper.SetContextParameters(IExecutionContext context, InstallParameters installParameters)\r\n at VMware.IaaS.Management.Commands.Installation.UpgradeManagementAgentCommand.Execute(IExecutionContext context, IList`1 parameters)\n Error: {"1":[{"resultDescr":"System.InvalidOperationException: No valid endpoints found in the Management Agent\'s configuration.\\r\\n at VMware.IaaS.Management.Commands.Installation.ParameterHelper.GetFirstAvailableEndpointFromContext(IExecutionContext context)\\r\\n at VMware.IaaS.Management.Commands.Installation.ParameterHelper.SetContextParameters(IExecutionContext context, InstallParameters installParameters)\\r\\n at VMware.IaaS.Management.Commands.Installation.UpgradeManagementAgentCommand.Execute(IExecutionContext context, IList`1 parameters)","resultMsg":"No valid endpoints found in the Management Agent\'s configuration."}]}\n Status: FAILED\n\n', 'Error executing command')

This resolution works only if no patch has been applied in the environment yet, i.e. the environment is on the GA build.

Here's the resolution:

1. SSH into the virtual appliance master node and set the "isApplied" value to true by running this command (note that it replaces every occurrence of false in the file): sed -i 's/false/true/g' /usr/lib/vcac/patches/repo/contents/vRA-patch/self-patch.json
2. Copy the "vCAC-IaaSManagementAgent-Setup.msi" file from "/usr/lib/vcac/patches/repo/contents/vRA-patch" on the virtual appliance master node to all the IaaS nodes.
3. Uninstall the previously installed Management Agent and install the new one by running the new vCAC-IaaSManagementAgent-Setup.msi file.
4. After the Management Agent has been installed successfully on all IaaS nodes, verify in the Cluster tab of vRA that the Management Agent version has been updated for all the IaaS nodes.
5. Run the precheck. Once the precheck is successful, start the installation of the patch.
6. After the patch is installed successfully, SSH into the virtual appliance master node and run the self-patch again by executing this command: sh /usr/lib/vcac/patches/repo/contents/vRA-patch/patchscript.sh
7. You should no longer see the error "No valid endpoints found in the Management Agent\'s configuration" in the logs. If the command above completes successfully, the installation is complete.
- Check vRA Services status via API
Log in to the vRA appliance and execute the command below:

curl --insecure -f -s -H "Content-Type: application/json" "https://$HOSTNAME/component-registry/services/status/current?limit=200" | sed "s/}/\n/g" | grep -E -o ".serviceName.*serviceInitializationStatus.[^,]*" | sed "s/\"serviceTypeId.*,//g" | sed -e "s/\"//g" -e "s/:/=/g" -e "s/,/, /" | sed -e "s/serviceName\|serviceInitializationStatus\|=\|,\|null//g" | column -t | sort | cat -n

The output lists the services registered on this appliance together with their initialization status.
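The sed/grep pipeline above works on the raw text of the response. If Python is available, the same information can be read from the JSON directly. The sketch below is an illustration only: the exact response layout is an assumption (the article only confirms the `serviceName` and `serviceInitializationStatus` field names), so the code searches for those keys wherever they are nested, and the sample payload is made up.

```python
import json


def find_key(node, key):
    """Depth-first search for the first value stored under `key`."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def iter_services(node):
    """Yield every dict that carries a serviceName field."""
    if isinstance(node, dict):
        if "serviceName" in node:
            yield node
        else:
            for value in node.values():
                yield from iter_services(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_services(item)


def not_registered(payload):
    """Names of services whose initialization status is not REGISTERED."""
    doc = json.loads(payload)
    return sorted(
        svc["serviceName"]
        for svc in iter_services(doc)
        if find_key(svc, "serviceInitializationStatus") != "REGISTERED"
    )


# Hypothetical payload; the nesting is an assumption, not captured output.
SAMPLE = json.dumps({"content": [
    {"serviceName": "vco",
     "serviceStatus": {"serviceInitializationStatus": "REGISTERED"}},
    {"serviceName": "iaas-service",
     "serviceStatus": {"serviceInitializationStatus": "FAILED"}},
    {"serviceName": "sts-service", "serviceStatus": None},
]})

if __name__ == "__main__":
    print(not_registered(SAMPLE))
```

Feed it the body returned by the curl call (without the sed/grep stages); an empty list means every service reported REGISTERED.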
- vRealize Automation DataCollection schedules
Data collection status information is stored in the dbo.DataCollectionStatus table of the IaaS database:

select * from DataCollectionStatus

This table contains AgentID, LastCollectedTime, LastCollectedStatus, EntityID, DataCollectionStatusID, FilterSpecID and so on.

FilterSpec refers to the type of endpoint we are collecting data from. The FilterSpecID in dbo.DataCollectionStatus comes from the dbo.FilterSpec table, which contains FilterSpecName, FilterSpecGroupID and AgentCapabilityName.

Let's take only the vSphere endpoint into consideration and filter dbo.FilterSpec for this endpoint. Since we selected vSphere, the AgentCapabilityName will only be vSphereHypervisor:

select * from dbo.FilterSpec where FilterSpecName = 'vSphere';

Each FilterSpecGroupID belongs to a certain type of data collection task for a specific endpoint. This information is stored in the dbo.FilterSpecGroup table. Now let's take the FilterSpecGroupID values from the dbo.FilterSpec table and check what they refer to in the dbo.FilterSpecGroup table:

select * from FilterSpecGroup where FilterSpecGroupID in ( select FilterSpecGroupID from dbo.FilterSpec where FilterSpecName = 'vSphere')

In the result, each FilterSpecGroupID maps to a FilterSpecGroupName, which is a task under data collection.

What is the ScheduleID inside the dbo.FilterSpecGroup table, then? ScheduleID comes from dbo.CollectionSchedule, where the default collection schedules are defined, each associated with an ID. This is the ID that is present in dbo.FilterSpecGroup.

So here's the flow:

dbo.CollectionSchedule --> dbo.FilterSpecGroup --> dbo.FilterSpec --> dbo.DataCollectionStatus

To find the LastCollectedTime and LastCollectedStatus of data collection for a specific cluster from the database, use the query below:

select LastCollectedTime,LastCollectedStatus,HostName from DataCollectionStatus dc, host h where dc.EntityID=h.hostID and h.HostName='ClusterName' order by h.HostName

Note: Replace ClusterName with the name of your compute resource.
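To make the chain of tables easier to follow, here is a small self-contained sketch that rebuilds it in an in-memory SQLite database and joins it end to end. Only the table and column names come from the article; the column types, the sample rows and the group names ("Inventory", "State") are invented for illustration, and the real IaaS database runs on SQL Server with many more columns.

```python
import sqlite3

SCHEMA = """
CREATE TABLE FilterSpecGroup (FilterSpecGroupID INTEGER, FilterSpecGroupName TEXT, ScheduleID INTEGER);
CREATE TABLE FilterSpec (FilterSpecID INTEGER, FilterSpecName TEXT, FilterSpecGroupID INTEGER, AgentCapabilityName TEXT);
CREATE TABLE DataCollectionStatus (DataCollectionStatusID INTEGER, FilterSpecID INTEGER, EntityID TEXT,
                                   LastCollectedTime TEXT, LastCollectedStatus TEXT);
CREATE TABLE Host (HostID TEXT, HostName TEXT);
-- Made-up sample rows, for illustration only.
INSERT INTO FilterSpecGroup VALUES (1, 'Inventory', 10), (2, 'State', 11);
INSERT INTO FilterSpec VALUES (100, 'vSphere', 1, 'vSphereHypervisor'), (101, 'vSphere', 2, 'vSphereHypervisor');
INSERT INTO Host VALUES ('h-1', 'ClusterName');
INSERT INTO DataCollectionStatus VALUES
    (1000, 100, 'h-1', '2020-06-01 10:00:00', 'Succeeded'),
    (1001, 101, 'h-1', '2020-06-01 10:15:00', 'Failed');
"""

# Follows DataCollectionStatus -> FilterSpec -> FilterSpecGroup for one host.
QUERY = """
SELECT g.FilterSpecGroupName, g.ScheduleID, dc.LastCollectedTime, dc.LastCollectedStatus
FROM DataCollectionStatus dc
JOIN Host h             ON dc.EntityID = h.HostID
JOIN FilterSpec fs      ON dc.FilterSpecID = fs.FilterSpecID
JOIN FilterSpecGroup g  ON fs.FilterSpecGroupID = g.FilterSpecGroupID
WHERE h.HostName = ?
ORDER BY g.FilterSpecGroupName
"""


def collection_status(host_name):
    """Return one row per data collection task for the given compute resource."""
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SCHEMA)
        return con.execute(QUERY, (host_name,)).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in collection_status("ClusterName"):
        print(row)
```

The explicit JOIN form is equivalent to the comma-style join in the article's query; it just adds the two lookups that turn a FilterSpecID into a task name and schedule.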
- How to export embedded Postgres DB from vRLCM 8.x appliance
To export the embedded Postgres database from the VMware vRealize Suite Lifecycle Manager appliance:

Log in to the vRealize Suite Lifecycle Manager virtual appliance using SSH.

Change directory using this command:

cd /tmp

Run this command to create a copy of the database in /tmp:

su -m -c "/opt/vmware/vpostgres/11/bin/pg_dump -Fc vrlcm > /tmp/vrlcm.sql" - postgres

Note: The -Fc switch already produces a compressed (custom-format) file, so there is no need to bzip it.

Use SCP or WinSCP to transfer the vrlcm.sql file off the appliance.
- Healthchecks for vRealize Automation 7.x
A list of health URLs that can be used during vRealize Automation troubleshooting.

Horizon System Health
URL: https://vra-fqdn/SAAS/API/1.0/REST/system/health
RESULT: {"AnalyticsUrl":"http://localhost:8080","EhCacheClusterPeers":"","AuditPollInterval":"1000","EncryptionServiceVersion":"unknown","AnalyticsConnectionOk":"true","EncryptionServiceVerified":"Master Keystore verified","FederationBrokerStatus":"ok","ServiceReadOnlyMode":"false","AuditWorkerThreadAlive":"true","BuildVersion":"3.1.0.0 Build 12694081","AuditQueueSize":"0","DatabaseStatus":"connection successful","HostName":"sevenvra.prem.com","EncryptionStatus":"connected","FederationBrokerOk":"true","EncryptionConnectionOk":"true","EncryptionServiceImpl":"Encryption Service DB","ClusterId":"add760d8-b9cd-453d-a476-abf323758b59","EhCacheClusterDiagnostics":"","DatabaseConnectionOk":"true","StatusDate":"2020-06-11 14:19:14 UTC","ClockSyncOk":"true","MaintenanceMode":"false","MessagingConnectionOk":"true","fipsModeEnabled":"false","ServiceVersion":"3.1.0","IpAddress":"10.109.46.59","AuditDisabled":"false","AllOk":"true"}
As shown above, the "AllOk" value should be true.

Horizon Cluster Instances
URL: https://vra-fqdn/API/1.0/REST/system/clusterInstances
RESULT: [{"version":"3.1.0.0 Build 12694081","uuid":"5d22e506-0529-3111-b8ca-beb20b620da8","status":"Active","lastUpdated":1591885520764,"hostname":"sevenvra.prem.com","datacenterId":0,"ipaddress":"10.109.46.59"}]

ElasticSearch Health
URL: SSH to the vRA appliance and then execute curl -kv http://localhost:9200/_cluster/health?pretty=true
RESULT:
{
  "cluster_name" : "horizon",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 15,
  "active_shards" : 15,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0
}

vRA Service status
URL: https://va-fqdn/vcac/services/api/status
RESULT: true REGISTERED shell-ui-app cafe-7efxBfzyew 2020-05-28T01:34:51.230Z d823b6f9-688e-4819-acf7-e773060a1e64 true CN=sevenvra.prem.com,OU=PREM,O=PREM,C=SG 2025-04-27T08:16:10Z 2020-04-28T08:16:10Z CN=sevenvra.prem.com,OU=PREM,O=PREM,C=SG 8E:68:E8:F0:EE:BC:12:2B:2D:78:89:C5:F9:37:5E:7C:25:38:C0:32

vRA Component Registry
URL: https://vra-fqdn/component-registry/services/status/current?limit=200
RESULT: You get information on all services in the vRA appliance. In the sample entry below, notAvailable should always be false and serviceInitializationStatus should be REGISTERED.
2020-05-28T01:39:25.430Z https://sevenvra.prem.com/composition-service/api/status true REGISTERED composition-service cafe-7efxBfzyew 2020-05-28T01:39:29.284Z com.vmware.csp.component.cafe.composition.api ca11071c-f0aa-4407-bcc9-980102c2a239 true CN=sevenvra.prem.com,OU=PREM,O=PREM,C=SG 2025-04-27T08:16:10Z 2020-04-28T08:16:10Z CN=sevenvra.prem.com,OU=PREM,O=PREM,C=SG 8E:68:E8:F0:EE:BC:12:2B:2D:78:89:C5:F9:37:5E:7C:25:38:C0:32

Another way to check is from the VAMI. You can also check over SSH by executing the following curl command:

[master] sevenvra:~ # curl --insecure -f -s -H "Content-Type: application/json" "https://$HOSTNAME/component-registry/services/status/current?limit=200" | sed "s/}/\n/g" | grep -E -o ".serviceName.*serviceInitializationStatus.[^,]*" | sed "s/\"serviceTypeId.*,//g" | sed -e "s/\"//g" -e "s/:/=/g" -e "s/,/, /" | sed -e "s/serviceName\|serviceInitializationStatus\|=\|,\|null//g" | column -t | sort | cat -n
1 advanced-designer-service REGISTERED
2 approval-service REGISTERED
3 authentication REGISTERED
4 authorization REGISTERED
5 branding-service REGISTERED
6 catalog-service REGISTERED
7 component-registry REGISTERED
8 composition-service REGISTERED
9 config-management-service REGISTERED
10 console-proxy-service REGISTERED
11 container-service REGISTERED
12 content-management REGISTERED
13 endpoint-configuration-service REGISTERED
14 event-broker-service REGISTERED
15 eventlog-service REGISTERED
16 fabric-service REGISTERED
17 forms-service REGISTERED
18 healthbroker-proxy-server REGISTERED
19 iaas-proxy-provider REGISTERED
20 iaas-service REGISTERED
21 identity REGISTERED
22 ipam-service REGISTERED
23 licensing-service REGISTERED
24 management-service REGISTERED
25 network-service REGISTERED
26 notification-service REGISTERED
27 o11n-gateway-service REGISTERED
28 placement-service REGISTERED
29 plugin-service REGISTERED
30 portal-service REGISTERED
31 properties-service REGISTERED
32 provisioning-service REGISTERED
33 reservation-service REGISTERED
34 shell-ui-app REGISTERED
35 software-service REGISTERED
36 sts-service REGISTERED
37 vco REGISTERED
38 workitem-service REGISTERED

IaaS Web
URL: https://iaas-web-fqdn/WAPI/api/status/web

Repository
URL: https://iaas-web-fqdn/Repository/Data/MetaModel.svc

IaaS Manager
URL: https://iaas-manager/VMPSProvision

IaaS Manager
URL: https://iaas-mgr-fqdn/VMPS2

DEM Orchestrator
Log in to Tenant > Infrastructure > Monitoring > DEM Status

DEM Worker
Log in to Tenant > Infrastructure > Monitoring > DEM Status

Proxy Agents
Log in to Tenant > Infrastructure > Compute Resources > Compute Resource > View Proxy Agent

vRealize Orchestrator
URL: https://vra-va-fqdn:8283/vco-controlcenter
Click on Validate Configuration.
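The Horizon and ElasticSearch checks above return JSON, so they are easy to evaluate in a script instead of by eye. Below is a minimal sketch, assuming the responses have the same flat shape as the samples in this article. The rule "every key ending in Ok must be true" is my own reading of the sample, not a documented contract, and the function names are hypothetical.

```python
import json


def horizon_problems(payload):
    """Return the '...Ok' flags (including AllOk) that are not "true"."""
    health = json.loads(payload)
    return sorted(
        key for key, value in health.items()
        if key.endswith("Ok") and str(value).lower() != "true"
    )


def elasticsearch_is_green(payload):
    """True when the cluster health document reports status green."""
    return json.loads(payload).get("status") == "green"


# Trimmed from the sample responses shown in the article.
HORIZON_SAMPLE = (
    '{"DatabaseStatus":"connection successful","FederationBrokerOk":"true",'
    '"DatabaseConnectionOk":"true","ClockSyncOk":"true",'
    '"MessagingConnectionOk":"true","AllOk":"true"}'
)
ES_SAMPLE = '{"cluster_name": "horizon", "status": "green", "unassigned_shards": 0}'

if __name__ == "__main__":
    print("Horizon problems:", horizon_problems(HORIZON_SAMPLE))
    print("ElasticSearch green:", elasticsearch_is_green(ES_SAMPLE))
```

An empty problem list together with a green ElasticSearch status corresponds to the healthy state shown in the samples.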
- Find Java version being used inside vRO / vRA 8.x
Here's a short article on how to identify the Java version used inside vRO or vRA 8.x.

As a first step, identify the container ID. Let's first grep for the vco containers:

docker ps | grep vco

The output of the above command looks like this:

root@premvra [ ~ ]# docker ps | grep vco
cb770fa8f97d 07b9846ebc64 "/bin/bash -c './cre…" 2 weeks ago Up 2 weeks k8s_vco-controlcenter-app_vco-app-54cfbdbdc-2k7wc_prelude_473e2e29-9b70-11ea-8c90-005056a77fe1_0
d572660d055c 07b9846ebc64 "/bin/bash -c './cre…" 2 weeks ago Up 2 weeks k8s_vco-server-app_vco-app-54cfbdbdc-2k7wc_prelude_473e2e29-9b70-11ea-8c90-005056a77fe1_0
d59aebe8bf54 0ced08cb52f4 "dockerd-entrypoint.…" 2 weeks ago Up 2 weeks k8s_vco-polyglot-runner_vco-app-54cfbdbdc-2k7wc_prelude_473e2e29-9b70-11ea-8c90-005056a77fe1_0
95fbc06fa017 vmware/pause:3.1 "/pause" 2 weeks ago Up 2 weeks k8s_POD_vco-app-54cfbdbdc-2k7wc_prelude_473e2e29-9b70-11ea-8c90-005056a77fe1_0

From this output, pick the ID of the container whose Java version we want to know. We are interested in vco-server-app, which has the ID d572660d055c.

Then use the following command to find out the Java version:

root@premvra [ ~ ]# docker exec -it d572660d055c java -version
openjdk version "1.8.0-internal"
OpenJDK Runtime Environment (build 1.8.0-internal-_2019_10_29_05_18-b00)
OpenJDK 64-Bit Server VM (build 25.71-b00, mixed mode)

We can also check the vco-server-app logs under /var/log/container/, where the Java version is logged during service startup:

{"log":"2020-05-21 14:37:49.278+0000 [localhost-startStop-1] INFO {} [O11N] Sysprop: java.runtime.name = Java(TM) SE Runtime Environment\n","stream":"stdout","time":"2020-05-21T14:37:49.278367624Z"}
{"log":"2020-05-21 14:37:49.278+0000 [localhost-startStop-1] INFO {} [O11N] Sysprop: java.runtime.version = 1.8.0_221-b11\n","stream":"stdout","time":"2020-05-21T14:37:49.278453768Z"}
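If you need the version in a script (for example to compare several appliances), the `java -version` banner can be parsed with a one-line regular expression. This is a small sketch that relies only on the `version "..."` pattern visible in the output above; the function name is made up.

```python
import re


def parse_java_version(banner):
    """Return the quoted version string from a `java -version` banner, or None."""
    match = re.search(r'version "([^"]+)"', banner)
    return match.group(1) if match else None


# Banner copied from the docker exec output shown in the article.
SAMPLE = """openjdk version "1.8.0-internal"
OpenJDK Runtime Environment (build 1.8.0-internal-_2019_10_29_05_18-b00)
OpenJDK 64-Bit Server VM (build 25.71-b00, mixed mode)"""

if __name__ == "__main__":
    print(parse_java_version(SAMPLE))
```

Keep in mind that `java -version` writes its banner to standard error, so capture stderr when collecting it from a script.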
- Changing MTU value causes VMNIC to flap
This morning, I was involved in an escalation where NIC flaps were seen on 6 out of 7 hosts of a brand new VxRail cluster. It looks like it all started once the administrator started changing MTU values. The hostd logs didn't help a great deal, as they only reported that the vmnic had gone down. When we checked the vmkernel logs, the MTU was constantly flipping between 1500 and 9000:

grep -i "changing MTU" vmkernel.log
2017-12-18T02:54:18.954Z cpu19:65645)<6>ixgbe 0000:01:00.1: vmnic1: changing MTU from 9000 to 1500
2017-12-18T02:54:19.594Z cpu19:65645)<6>ixgbe 0000:01:00.0: vmnic0: changing MTU from 9000 to 1500
2017-12-18T03:02:28.343Z cpu21:65645)<6>ixgbe 0000:01:00.1: vmnic1: changing MTU from 1500 to 9000
2017-12-18T03:02:28.985Z cpu26:65645)<6>ixgbe 0000:01:00.0: vmnic0: changing MTU from 1500 to 9000
2017-12-18T03:02:59.633Z cpu25:65645)<6>ixgbe 0000:01:00.1: vmnic1: changing MTU from 9000 to 1500
2017-12-18T03:03:00.269Z cpu30:65645)<6>ixgbe 0000:01:00.0: vmnic0: changing MTU from 9000 to 1500
2017-12-18T03:08:48.374Z cpu25:65645)<6>ixgbe 0000:01:00.1: vmnic1: changing MTU from 1500 to 9000
2017-12-18T03:08:49.006Z cpu25:65645)<6>ixgbe 0000:01:00.0: vmnic0: changing MTU from 1500 to 9000

What we see in the vmkernel logs is an MTU flap. The server was using the ixgbe driver. Whenever an MTU change is made, the driver brings down the NIC, makes the necessary changes to the hardware, and then brings it back up. The link status is reported to vmkernel, and vobd records these changes.

vmkernel.log snippet:

2017-12-18T03:45:09.454Z cpu23:69310 opID=4758f26a)NetOverlay: 1107: class:vxlan is already instantiated one on depth 0
2017-12-18T03:45:09.470Z cpu20:65645)<6>ixgbe 0000:01:00.1: vmnic1: changing MTU from 1500 to 9000
2017-12-18T03:45:10.101Z cpu23:66107)vxlan: VDL2PortOutputUplinkChangeCB:649: Output Uplink change event with priority :5 was ignored for portID: 400000e.
2017-12-18T03:45:10.101Z cpu23:66107)netschedHClk: NetSchedHClkNotify:2892: vmnic1: link down notification
2017-12-18T03:45:10.101Z cpu22:65646)vdrb: VdrHandleUplinkEvent:1605: SYS:DvsPortset-0: Uplink event 2 for port 0x400000a, linkstate 0
2017-12-18T03:45:10.101Z cpu28:65645)<6>ixgbe 0000:01:00.0: vmnic0: changing MTU from 1500 to 9000
2017-12-18T03:45:10.738Z cpu2:66105)vxlan: VDL2GetlEndpointAndSetUplink:387: Now, no active uplinks in tunnel group:67108878.
2017-12-18T03:45:10.738Z cpu2:66105)netschedHClk: NetSchedHClkNotify:2892: vmnic0: link down notification
2017-12-18T03:45:10.738Z cpu2:66105)netschedHClk: NetSchedHClkDoFlushQueue:3818: vmnic0: dropping 6 packets from queue netsched.pools.persist.default
2017-12-18T03:45:10.738Z cpu2:66105)netschedHClk: NetSchedHClkDoFlushQueue:3818: vmnic0: dropping 3 packets from queue netsched.pools.persist.mgmt
2017-12-18T03:45:10.738Z cpu8:65647)vdrb: VdrHandleUplinkEvent:1605: SYS:DvsPortset-0: Uplink event 2 for port 0x4000008, linkstate 0
2017-12-18T03:45:10.739Z cpu28:69310 opID=4758f26a)VMKAPIMOD: 86: Failed to check if port is Uplink : Failure
2017-12-18T03:45:10.739Z cpu28:69310 opID=4758f26a)Team.etherswitch: TeamESLACPLAGEventCB:6277: Received a LAG DESTROY event version :0, lagId :0, lagLinkStatus :NOT USED,lagName :, uplinkName :, portLinkStatus :NOT USED, portID :0x0
2017-12-18T03:45:10.739Z cpu28:69310 opID=4758f26a)netioc: NetIOCSetRespoolVersion:245: Set netioc version for portset: DvsPortset-0 to 3,old threshold: 3
2017-12-18T03:45:10.739Z cpu28:69310 opID=4758f26a)netioc: NetIOCSetupUplinkReservationThreshold:135: Set threshold for portset: DvsPortset-0 to 75, old threshold: 75
2017-12-18T03:45:10.741Z cpu28:69310 opID=4758f26a)netioc: NetIOCPortsetNetSchedStatusSet:1207: Set sched status for portset: DvsPortset-0 to Active, old:Active
2017-12-18T03:45:10.741Z cpu28:69310 opID=4758f26a)VLANMTUCheck: NMVCDeployClear:871: can't not find psReq for ps DvsPortset-0
2017-12-18T03:45:10.784Z cpu15:69324 opID=1731b730)World: 12230: VC opID de31f2ec maps to vmkernel opID 1731b730
2017-12-18T03:45:10.784Z cpu15:69324 opID=1731b730)Tcpip_Vmk: 263: Lookup route failed
2017-12-18T03:45:13.357Z cpu19:10575834)CMMDS: AgentSendHeartbeatRequest:211: Agent requesting a reliable heartbeat from node 5a1e3e8b-2e6e-82a4-bfac-a0369fdec1c4
2017-12-18T03:45:41.383Z cpu22:66079)netschedHClk: NetSchedHClkNotify:2892: vmnic1: link down notification
2017-12-18T03:45:41.383Z cpu22:66079)netschedHClk: NetSchedHClkDoFlushQueue:3818: vmnic1: dropping 10 packets from queue netsched.pools.persist.mgmt
2017-12-18T03:45:41.383Z cpu1:65647)vdrb: VdrHandleUplinkEvent:1605: SYS:DvsPortset-0: Uplink event 2 for port 0x400000a, linkstate 0
2017-12-18T03:45:41.383Z cpu21:65645)<6>ixgbe 0000:01:00.0: vmnic0: changing MTU from 9000 to 1500

After a detailed investigation of the logs, we noticed that all DVS operations were coming from vCenter. Moreover, we saw the following error message: "The operation reconfigureDistributedVirtualSwitch on the host <hostname> disconnected the host and was rolled back"

From this message we could clearly tell that there was a connectivity issue between vCenter and the hosts. Since we did not see any driver- or hardware-related errors in the logs, we decided to increase the timeout value for network rollback under the vpxd advanced settings.

Procedure

Use the vSphere Web Client to increase the timeout for a rollback on vCenter Server. If you encounter the same problem again, increase the rollback timeout in 60-second increments until the operation has enough time to succeed.

1. On the Manage tab of a vCenter Server instance, click Settings.
2. Select Advanced Settings and click Edit.
3. If the property is not present, add the config.vpxd.network.rollbackTimeout parameter to the settings.
4. Type a new value, in seconds, for the config.vpxd.network.rollbackTimeout parameter.
5. Click OK.
6. Restart the vCenter Server system to apply the changes.
The value was changed to 600 seconds. Once done, all hosts in the cluster were in a stable state and ready for NSX configuration. It looks like vCenter Server was not able to save the changes in time. With the new timeout value, it had enough time to commit the transaction, which eventually stopped the NIC flaps. #vSphere
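To confirm that the NIC flaps have actually stopped, it can help to count the link-down notifications per uplink before and after the change. The following is only a minimal sketch: it assumes log lines shaped like the vmkernel excerpt above, and the function name and the idea of feeding it lines from a vmkernel log are illustrative, not part of any VMware tooling.

```python
import re
from collections import Counter

# Matches lines such as:
# 2017-12-18T03:45:10.101Z cpu23:66107)netschedHClk: NetSchedHClkNotify:2892: vmnic1: link down notification
# (pattern derived only from the excerpt above; other builds may log differently)
LINK_DOWN = re.compile(
    r"^\S+Z .*NetSchedHClkNotify:\d+: (vmnic\d+): link down notification"
)

def count_link_down(lines):
    """Return a Counter of link-down notifications keyed by vmnic name."""
    counts = Counter()
    for line in lines:
        match = LINK_DOWN.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Three notification lines taken from the excerpt above, plus one unrelated line.
sample = [
    "2017-12-18T03:45:10.101Z cpu23:66107)netschedHClk: NetSchedHClkNotify:2892: vmnic1: link down notification",
    "2017-12-18T03:45:10.738Z cpu2:66105)netschedHClk: NetSchedHClkNotify:2892: vmnic0: link down notification",
    "2017-12-18T03:45:10.784Z cpu15:69324 opID=1731b730)Tcpip_Vmk: 263: Lookup route failed",
    "2017-12-18T03:45:41.383Z cpu22:66079)netschedHClk: NetSchedHClkNotify:2892: vmnic1: link down notification",
]

for nic, hits in sorted(count_link_down(sample).items()):
    print(f"{nic}: {hits} link down notification(s)")
```

Run against the log before and after raising the timeout; a count that stops growing suggests the uplinks are no longer flapping.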
- Check vRO heap usage
Browse to /var/log/vmware/vco/app-server/ and run the command below:
grep heap.usage metrics.log* | grep -v non | sed 's/.*value=//g' | perl -e 'use List::Util qw(max min sum); @a=();while(<>){chomp;$sqsum+=$_*$_; push(@a,$_)};$n=@a;$s=sum(@a);$a=$s/@a;$m=max(@a);$mm=min(@a);$std=sqrt($sqsum/$n-($s/$n)*($s/$n));$mid=int @a/2;@srtd=sort {$a <=> $b} @a;if(@a%2){$med=$srtd[$mid];}else{$med=($srtd[$mid-1]+$srtd[$mid])/2;};print "records:$n\nsum:$s\navg:$a\nstd:$std\nmed:$med\nmax:$m\nmin:$mm\n";'
This prints summary statistics for the heap usage recorded on the vRO appliance: record count, sum, average, standard deviation, median, maximum and minimum.
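If the one-liner is hard to read or adapt, the same calculation can be sketched in Python. This is a rough equivalent, not a supported tool: it assumes, as the grep and sed steps imply, that the relevant lines contain "heap.usage" and end with "value=<number>". The sample lines are made up for illustration and are not real metrics.log output.

```python
import re
import statistics

# Greedy ".*" mirrors sed 's/.*value=//g': the last "value=" on the line wins.
VALUE = re.compile(r".*value=([0-9]+(?:\.[0-9]+)?)")

def heap_stats(lines):
    """Summarise heap.usage values the same way the Perl one-liner does."""
    values = []
    for line in lines:
        # Same filters as: grep heap.usage | grep -v non
        if "heap.usage" not in line or "non" in line:
            continue
        match = VALUE.match(line)
        if match:
            values.append(float(match.group(1)))
    if not values:
        return None
    return {
        "records": len(values),
        "sum": sum(values),
        "avg": statistics.fmean(values),
        # Population standard deviation, matching sqrt(E[x^2] - E[x]^2)
        "std": statistics.pstdev(values),
        "med": statistics.median(values),
        "max": max(values),
        "min": min(values),
    }

# Hypothetical lines, only to exercise the function.
sample = [
    "metric heap.usage value=0.25",
    "metric non-heap.usage value=0.90",
    "metric heap.usage value=0.50",
    "metric heap.usage value=0.75",
]

stats = heap_stats(sample)
for key, value in stats.items():
    print(f"{key}:{value}")
```

To use it on the appliance, the lines could be read from the metrics.log* files in the directory above instead of from the sample list.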
- Unchecking "Allow unlisted file name extensions" causes IAAS service registration failures
Request filters restrict the types of HTTP requests that IIS processes. By blocking specific HTTP requests, request filters help prevent potentially harmful requests from reaching the server. The request filtering module scans incoming requests and rejects those that are unwanted, based on the rules that you set up. For example, if you set the allowUnlisted attribute to false, all requests for files with extensions that are not in the list of allowed extensions are denied. When request filtering blocks an HTTP request because of a denied file name extension, IIS returns an HTTP 404 error to the client and logs the HTTP status with a unique substatus that identifies the reason the request was denied (404.7 in the log below).

When request filtering is enabled, the Infrastructure as a Service (IaaS) component of vRealize Automation 7.x needs this option to be checked (enabled). IaaS uses .jar, .dll, .aspx, .config, .workflow and many more file extensions, which keep its IIS functionality intact and let it serve its application pools as expected. This setting must not be disabled. The moment you disallow unlisted file name extensions, your Manager Service goes down, because the extensions needed to run the application are not whitelisted and are blocked.

[UTC:2020-05-21 10:22:41 Local:2020-05-21 10:22:41] [Error]: [sub-thread-Id="100" context="" token=""] falseCollectedDataImportService: Ignoring exception: System.Data.Services.Client.DataServiceQueryException: An error occurred while processing this request. ---> System.Data.Services.Client.DataServiceClientException: * * *
HTTP Error 404.7 - Not Found
The request filtering module is configured to deny the file extension.
Most likely causes: Request filtering is configured for the Web server and the file extension for this request is explicitly denied.
Things you can try: Verify the configuration/system.webServer/security/requestFiltering/fileExtensions settings in applicationhost.config and web.config.
The moment you re-enable "Allow Unlisted File Name Extensions", your Manager Service automatically starts functioning again.

[UTC:2020-05-21 10:22:41 Local:2020-05-21 10:22:41] [Error]: [sub-thread-Id="100" context="" token=""] Error occurred writing to the repository tracking log
System.Net.WebException: The remote server returned an error: (404) Not Found.
at System.Data.Services.Client.BatchSaveResult.BatchRequest()
at System.Data.Services.Client.DataServiceContext.SaveChanges(SaveChangesOptions options)
at DynamicOps.Repository.RepositoryServiceContext.SaveChanges(SaveChangesOptions options)
[UTC:2020-05-21 10:22:42 Local:2020-05-21 10:22:42] [Info]: [sub-thread-Id="7" context="" token=""] Processing ping report, report queue depth is 0
[UTC:2020-05-21 10:23:12 Local:2020-05-21 10:23:12] [Info]: [sub-thread-Id="7" context="" token=""] Processing ping report, report queue depth is 0
[UTC:2020-05-21 10:23:18 Local:2020-05-21 10:23:18] [Debug]: [sub-thread-Id="49" context="" token=""] DC: Created data collection item, WorkitemID 89c645a6-21c1-4086-857b-466d74fc32af, Task state, Agent premvc.prem.com, Entity primary, StatusID = f95f216b-ca19-41ef-9565-60326cdc94cd

VMware's IIS hardening recommendation is to contact Microsoft for the vendor's hardening guidelines. VMware does not provide a list of extensions that have to be whitelisted, and this has been the case since the vCAC 4.x days. So if you're hardening your IaaS system, make sure you do not deselect "Allow unlisted file name extensions", or you will run into this problem. I hope this helps!
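As a quick sanity check during hardening, the allowUnlisted attribute can be read straight from the IIS configuration XML. The sketch below is an assumption-laden illustration: it only looks at the configuration/system.webServer/security/requestFiltering/fileExtensions path named in the error text, treats a missing element or attribute as "allowed" (which I understand to be the IIS default), and ignores location-scoped overrides that applicationHost.config may contain. The function name and the sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET

def unlisted_extensions_allowed(config_xml):
    """Return True if unlisted file name extensions are allowed by this config."""
    root = ET.fromstring(config_xml)  # expected root element: <configuration>
    node = root.find("system.webServer/security/requestFiltering/fileExtensions")
    if node is None:
        # Assumption: with nothing configured, IIS allows unlisted extensions.
        return True
    return node.get("allowUnlisted", "true").strip().lower() == "true"

# Hypothetical web.config fragment with the problematic setting.
hardened = """
<configuration>
  <system.webServer>
    <security>
      <requestFiltering>
        <fileExtensions allowUnlisted="false">
          <add fileExtension=".aspx" allowed="true" />
        </fileExtensions>
      </requestFiltering>
    </security>
  </system.webServer>
</configuration>
"""

default = "<configuration><system.webServer /></configuration>"

print("hardened config allows unlisted:", unlisted_extensions_allowed(hardened))
print("default config allows unlisted:", unlisted_extensions_allowed(default))
```

A False result for the site that hosts the IaaS components would match the 404.7 symptom described above.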
- A Comprehensive Guide for upgrading vRA 7.x through vRSLCM 2.x
Here's a guide on how to upgrade your vRA 7.x environment using vRSLCM 2.x. For this example, I am performing an upgrade from version 7.5 HF16 to 7.6 GA.

I have a simple environment:
wpacvra.prem.com is my vRealize Automation appliance
wpaciaas.prem.com is my Infrastructure as a Service (IaaS) node
The IaaS database is installed on the same node as the IaaS services

Before we start the upgrade, we have to take a snapshot of all the components. This is a MANDATORY step. Enter a snapshot prefix and make sure the snapshot with memory box is not checked. We should not take snapshots of vRA components with memory.

Once the snapshots are completed, let's create a request to upgrade. Click the three dots on the right side of the environment and then click Upgrade. In the pane that opens, make sure to select the checkbox for IaaS Snapshot, so that if the IaaS upgrade fails for whatever reason, you can revert to the post-VA-upgrade state and retry the upgrade. In this example, I am using the vRealize Suite Lifecycle Manager repository, which contains the upgrade binaries.

When we click NEXT, we enter the Precheck phase. Here we need to click RUN-PRECHECK so that the validation request is submitted. The first phase of the precheck is data validation. The second phase is the vRealize Automation validations. Any exceptions found during this phase are reported. In my case, the exception indicated that a reboot was pending on my IaaS node, so I rebooted the IaaS node and reran the precheck. After the reboot, the precheck phase completed successfully. I have attached the PRECHECK report from my environment here for reference.

Since PRECHECK has now completed successfully, click NEXT to move on and review the SUMMARY of the request.
In the top right corner of this page, there is an option to run the precheck on submit. This is optional. If you would like to run it again, that's absolutely fine; it just takes a few more minutes. Once we click SUBMIT, the request to upgrade vRealize Automation is created. If we selected to run the precheck on submit, there are 4 steps; otherwise there are 3 steps for a successful upgrade in a simple environment.

Step 2/4 performs the vRA appliance upgrade. It has various phases:
upgradevrava
upgradevrava-com.vmware.vrealize.lcm.plugin.core.vra70.task.StartGenericVraTask
FROM STATE com.vmware.vrealize.lcm.plugin.core.vra70.task.StartGenericVraTask TO STATE com.vmware.vrealize.lcm.core.vra70.task.upgrade.VraUpgradeTask
upgradevrava-com.vmware.vrealize.lcm.core.vra70.task.upgrade.VraUpgradeTask

This is the phase where your appliance is being upgraded. There are three phases in a vRA appliance upgrade:
Pre Update
Upgrade
Post Update

Here's the list of logs to monitor during the pre-update phase:
/opt/vmware/var/log/vami/updatecli.log
/var/log/bootstrap/preupdate.log

Once the pre-update phase has succeeded, we only have to monitor /opt/vmware/var/log/vami/updatecli.log. This is where the upgrade phase is logged.
The following messages are seen once the install phase is complete and the post-install (postupdate) phase starts:
20/05/2020 10:32:26 [INFO] Update status: Done package installation
20/05/2020 10:32:26 [INFO] Update status: Running post-install scripts

The post-install progress is logged in postupdate.log. Remember that the upgrade VA task will stay at 67% for a while, until the upgrade is complete on the virtual appliance (this might differ in a distributed environment). Once the post update is complete, the final phase of the upgrade begins. After a few seconds, the whole appliance upgrade is complete and upgradevrava-com.vmware.vrealize.lcm.platform.automata.service.task.FinalTask is marked as complete.

As we selected to take a snapshot after the VA upgrade, a snapshot task is initiated now; it can be seen completing on vCenter for the IaaS node. After the snapshot task is complete, which finishes Step #3, we move on to the IaaS upgrade phase.

This is where the IaaS upgrade starts. As the first step of this phase, a reboot of the virtual appliance is initiated. Once the reboot task is complete, the new 7.6 VAMI is available. During this phase we can monitor progress in /opt/vmware/var/log/vami/upgrade-iaas.log. The upgrade of the IaaS components does not move ahead until all VA services have started. Here's the snippet which states that all services have been started:
[UTC: 2020-05-20 10:49:34.204100; Local: 2020-05-20 10:49:34.204108] [INFO]: Starting automatic IaaS upgrade.
[UTC: 2020-05-20 10:49:34.204426; Local: 2020-05-20 10:49:34.204430] [INFO]: IaaS upgrade is pending: Waiting for VA services to start...
[UTC: 2020-05-20 11:06:09.930810; Local: 2020-05-20 11:06:09.930820] [INFO]: All VA services have started.
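When an upgrade seems stuck at this point, it is useful to know how long the appliance has been waiting for its services. The sketch below measures the gap between the "Waiting for VA services to start" and "All VA services have started" entries. It assumes the timestamp layout shown in the snippet above; the function name is illustrative and the sample lines are copied from that snippet.

```python
import re
from datetime import datetime

# Captures the UTC timestamp from entries like:
# [UTC: 2020-05-20 10:49:34.204426; Local: ...] [INFO]: ...
STAMP = re.compile(r"^\[UTC: (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+);")

def va_service_wait(lines):
    """Return the time spent waiting for VA services, or None if incomplete."""
    waiting = started = None
    for line in lines:
        match = STAMP.match(line)
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        if "Waiting for VA services to start" in line and waiting is None:
            waiting = stamp
        elif "All VA services have started" in line:
            started = stamp
    if waiting is None or started is None:
        return None
    return started - waiting

sample = [
    "[UTC: 2020-05-20 10:49:34.204100; Local: 2020-05-20 10:49:34.204108] [INFO]: Starting automatic IaaS upgrade.",
    "[UTC: 2020-05-20 10:49:34.204426; Local: 2020-05-20 10:49:34.204430] [INFO]: IaaS upgrade is pending: Waiting for VA services to start...",
    "[UTC: 2020-05-20 11:06:09.930810; Local: 2020-05-20 11:06:09.930820] [INFO]: All VA services have started.",
]

wait = va_service_wait(sample)
print("Waited for VA services:", wait)
```

In this environment the wait was a little over 16 minutes, so a pause of that order before the IaaS components start upgrading is not by itself a sign of failure.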
The IaaS server components are now identified and their upgrade is initiated. As stated above, upgrade-iaas.log is one log to watch; the other is on the IaaS node itself, under <>\Program Files (x86)\VMware\vCAC\ManagementAgent\Logs\CommandOutput

The upgrade starts with the server components. Once done, it goes ahead with the DEMs and completes them. Then come the agents and a few final tasks, which complete the IaaS upgrade.

Going back to the vRSLCM UI, we can see that the final tasks are complete as well, drawing the curtain on the automated upgrade from 7.5 to 7.6. The whole request is now marked as completed. Going back to the environment, we now see the version as 7.6. We can check the same from the VAMI.

Now let's deep-dive into the various phases from a logs perspective; this will help in troubleshooting.





