Click here for detailed analysis
vRealize Automation 8.4.2 was release few days back and it had some fixes which we were looking forward to w.r.t custom resources and dynamic types.
I was testing this behavior in my lab and came across an exception which led to upgrade failure. Others might bump into this as well. So let me explain what it was...
My environment was built on vRA 8.2 , where a migration was performed , specific to Custom Resources and then we did upgrade to 8.4.1. There is a known issue in version vRA 8.4.1 with Customer Resources where we get 500 OK exception when we try and execute Day-2 actions.
After 8.4.1 upgrade , we then wanted to 8.4.2
Upgrade goes through but the exception is seen when identity-service-app is being initialized
2021-06-26T07:31:56.205Z INFO identity-service [host='identity-service-app-568d9fbbc-8zfns' thread='main' user='' org='' trace=''] com.zaxxer.hikari.HikariDataSource.getConnection:110 - HikariPool-1 - Starting...
2021-06-26T07:31:56.677Z INFO identity-service [host='identity-service-app-568d9fbbc-8zfns' thread='main' user='' org='' trace=''] com.zaxxer.hikari.HikariDataSource.getConnection:123 - HikariPool-1 - Start completed.
2021-06-26T07:31:57.474Z INFO identity-service [host='identity-service-app-568d9fbbc-8zfns' thread='main' user='' org='' trace=''] liquibase.lockservice.log:23 - Successfully acquired change log lock
2021-06-26T07:31:58.508Z INFO identity-service [host='identity-service-app-568d9fbbc-8zfns' thread='main' user='' org='' trace=''] liquibase.changelog.log:23 - Reading from public.databasechangelog
2021-06-26T07:31:58.683Z INFO identity-service [host='identity-service-app-568d9fbbc-8zfns' thread='main' user='' org='' trace=''] liquibase.lockservice.log:23 - Successfully released change log lock
2021-06-26T07:31:58.689Z INFO identity-service [host='identity-service-app-568d9fbbc-8zfns' thread='main' user='' org='' trace=''] liquibase.lockservice.log:23 - Successfully acquired change log lock
2021-06-26T07:31:58.690Z WARN identity-service [host='identity-service-app-568d9fbbc-8zfns' thread='main' user='' org='' trace=''] liquibase.hub.log:23 - Skipping auto-registration
2021-06-26T07:31:58.806Z INFO identity-service [host='identity-service-app-568d9fbbc-8zfns' thread='main' user='' org='' trace=''] liquibase.changelog.log:23 - Table identity_lock created
2021-06-26T07:31:58.814Z INFO identity-service [host='identity-service-app-568d9fbbc-8zfns' thread='main' user='' org='' trace=''] liquibase.changelog.log:23 - ChangeSet db/changelog/28-add-lock-table.xml::1619005213735-1::zshindova (generated) ran successfully in 115ms
2021-06-26T07:31:58.834Z INFO identity-service [host='identity-service-app-568d9fbbc-8zfns' thread='main' user='' org='' trace=''] liquibase.changelog.log:23 - Marking ChangeSet: db/changelog/29-update-migration-roles.xml::1622846852-1::tairskiv ran despite precondition failure due to onFail='MARK_RAN':
db/changelog/db.changelog-master.xml : SQL Precondition failed. Expected '0' got '1'
2021-06-26T07:31:58.845Z INFO identity-service [host='identity-service-app-568d9fbbc-8zfns' thread='main' user='' org='' trace=''] liquibase.changelog.log:23 - Data updated in identity_user_service_role
2021-06-26T07:31:58.847Z INFO identity-service [host='identity-service-app-568d9fbbc-8zfns' thread='main' user='' org='' trace=''] liquibase.changelog.log:23 - Data updated in identity_user_service_role
2021-06-26T07:31:58.852Z INFO identity-service [host='identity-service-app-568d9fbbc-8zfns' thread='main' user='' org='' trace=''] liquibase.changelog.log:23 - Data updated in identity_group_role
2021-06-26T07:31:58.853Z INFO identity-service [host='identity-service-app-568d9fbbc-8zfns' thread='main' user='' org='' trace=''] liquibase.changelog.log:23 - Data updated in identity_group_role
2021-06-26T07:31:58.874Z INFO identity-service [host='identity-service-app-568d9fbbc-8zfns' thread='main' user='' org='' trace=''] liquibase.lockservice.log:23 - Successfully released change log lock
2021-06-26T07:31:58.876Z WARN identity-service [host='identity-service-app-568d9fbbc-8zfns' thread='main' user='' org='' trace=''] o.s.b.w.r.c.AnnotationConfigReactiveWebServerApplicationContext.refresh:559 - Exception encountered during context initialization - cancelling refresh attempt: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'liquibase' defined in class path resource [org/springframework/boot/autoconfigure/liquibase/LiquibaseAutoConfiguration$LiquibaseConfiguration.class]: Invocation of init method failed; nested exception is liquibase.exception.LiquibaseException: liquibase.exception.MigrationFailedException: Migration failed for change set db/changelog/29-update-migration-roles.xml::1622846852-2::tairskiv:
Reason: liquibase.exception.DatabaseException: ERROR: update or delete on table "identity_service_role" violates foreign key constraint "fkhcfahltn20109qfxny3vpogm" on table "identity_group_service_role"
Detail: Key (id)=(6799b5da-19f5-4aa3-9bd3-9376b3ac4890) is still referenced from table "identity_group_service_role". [Failed SQL: (0) DELETE FROM public.identity_service_role WHERE service_definition_id = (SELECT id FROM identity_service_definition WHERE name = 'Migration')]
2021-06-26T07:31:58.876Z INFO identity-service [host='identity-service-app-568d9fbbc-8zfns' thread='main' user='' org='' trace=''] com.zaxxer.hikari.HikariDataSource.close:350 - HikariPool-1 - Shutdown initiated...
2021-06-26T07:31:58.898Z INFO identity-service [host='identity-service-app-568d9fbbc-8zfns' thread='main' user='' org='' trace=''] com.zaxxer.hikari.HikariDataSource.close:352 - HikariPool-1 - Shutdown completed.
2021-06-26T07:31:58.905Z INFO identity-service [host='identity-service-app-568d9fbbc-8zfns' thread='main' user='' org='' trace=''] org.ehcache.core.EhcacheManager.removeCache:216 - Cache 'users' removed from EhcacheManager.
2021-06-26T07:31:58.906Z INFO identity-service [host='identity-service-app-568d9fbbc-8zfns' thread='main' user='' org='' trace=''] org.ehcache.core.EhcacheManager.removeCache:216 - Cache 'user_groups' removed from EhcacheManager.
2021-06-26T07:32:03.963Z INFO identity-service [host='identity-service-app-568d9fbbc-8zfns' thread='main' user='' org='' trace=''] o.s.b.a.l.ConditionEvaluationReportLoggingListener.logMessage:136 -
Error starting ApplicationContext. To display the conditions report re-run your application with 'debug' enabled.
2021-06-26T07:32:03.993Z ERROR identity-service [host='identity-service-app-568d9fbbc-8zfns' thread='main' user='' org='' trace=''] o.s.boot.SpringApplication.reportFailure:834 - Application run failed
Note:
1. Below command will help you to connect to postgres database
vracli dev psql
\c identity-db
**** STRONGLY ENCOURAGED NOT TO CONNECT UNLESS INSTRUCTED BY VMWARE ****
2. The best way would be to take a backup , restore it on a postgres server installed on Linux or Windows VM , then execute your queries on it...
vracli db backup identity-db > /root/identity-db-<<date>>.psql
Above command will help you to take a backup.
<<date>> is a variable , substitute it accordingly.
3. To restore
psql -U postgres -d **destinationdb** < **path-to-the-sql-file**
Create the dummy database first to restore
As part of investigation connected to identity service database and then found out that this id referenced in the snippet is part of three tables
ID : 6799b5da-19f5-4aa3-9bd3-9376b3ac4890
Then went ahead and looked into each of those tables
select * from identity_service_role where id = '6799b5da-19f5-4aa3-9bd3-9376b3ac4890';
As shown in the above screenshot this belonged to service role " Migration Assistant Administrator"
The identity-service-app was stuck in "CrashLoopBackOff" state
After a while upgrade did fail with following exception
Exception is nothing different a generic one.
com.vmware.vrealize.lcm.common.exception.EngineException: Upgrade on vRealize Automation VA nbnvra.nukescloud.com failed with state fatal. To know more about the failure, run command "vracli upgrade status --details" on the vRealize Automation VA nbnvra.nukescloud.com. If user wants to revert snapshot and trigger upgrade again, click RETRY with revertSnapshotNRetryUpgrade property set to true (or) If user wants to cancel the whole uprgade and revert to the state before upgrade click RETRY with cancelUpgradeNRevertBack property set to true. If both the retry properties are set to true, revertSnapshotNRetryUpgrade property will take precedence and will be honoured at com.vmware.vrealize.lcm.plugin.core.vra80.task.VraVaUpgradeStatusCheckTask.execute(VraVaUpgradeStatusCheckTask.java:329) at com.vmware.vrealize.lcm.automata.core.TaskThread.run(TaskThread.java:45) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.base/java.lang.Thread.run(Unknown Source)
Since the logs from identity-service-app were stating that it's unable to make a update or change to postgres database to a certain ID which belongs to Migration Assistant Administrator , this clearly gave an indication to a recent change made to this service role in 8.4.1
We have Migration Assistant Administrator recently moved into Cloud Assembly role. This change was introduced in 8.4.1. And 8.4.2 was supposed to bring in some enhancements to this.
Had no choice but to revert this failed upgrade back to 8.4.1.
Then the first thing I did is to go and remove the vRA Migration Assistant Administrator role and then add the new Cloud Assembly based role in place
This is how it looks after the change
Post that another upgrade attempt has been made and it definitely succeeded
2021-06-26T12:19:16.708Z INFO identity-service [host='identity-service-app-568d9fbbc-949ft' thread='main' user='' org='' trace=''] liquibase.changelog.log:23 - ChangeSet db/changelog/28-add-lock-table.xml::1619005213735-1::zshindova (generated) ran successfully in 86ms
2021-06-26T12:19:16.724Z INFO identity-service [host='identity-service-app-568d9fbbc-949ft' thread='main' user='' org='' trace=''] liquibase.changelog.log:23 - Marking ChangeSet: db/changelog/29-update-migration-roles.xml::1622846852-1::tairskiv ran despite precondition failure due to onFail='MARK_RAN':
db/changelog/db.changelog-master.xml : SQL Precondition failed. Expected '0' got '1'
2021-06-26T12:19:16.738Z INFO identity-service [host='identity-service-app-568d9fbbc-949ft' thread='main' user='' org='' trace=''] liquibase.changelog.log:23 - Data updated in identity_user_service_role
2021-06-26T12:19:16.739Z INFO identity-service [host='identity-service-app-568d9fbbc-949ft' thread='main' user='' org='' trace=''] liquibase.changelog.log:23 - Data updated in identity_user_service_role
2021-06-26T12:19:16.744Z INFO identity-service [host='identity-service-app-568d9fbbc-949ft' thread='main' user='' org='' trace=''] liquibase.changelog.log:23 - Data updated in identity_group_role
2021-06-26T12:19:16.746Z INFO identity-service [host='identity-service-app-568d9fbbc-949ft' thread='main' user='' org='' trace=''] liquibase.changelog.log:23 - Data updated in identity_group_role
2021-06-26T12:19:16.749Z INFO identity-service [host='identity-service-app-568d9fbbc-949ft' thread='main' user='' org='' trace=''] liquibase.changelog.log:23 - Data deleted from identity_service_role
2021-06-26T12:19:16.750Z INFO identity-service [host='identity-service-app-568d9fbbc-949ft' thread='main' user='' org='' trace=''] liquibase.changelog.log:23 - Data deleted from identity_service_definition
2021-06-26T12:19:16.754Z INFO identity-service [host='identity-service-app-568d9fbbc-949ft' thread='main' user='' org='' trace=''] liquibase.changelog.log:23 - ChangeSet db/changelog/29-update-migration-roles.xml::1622846852-2::tairskiv ran successfully in 25ms
2021-06-26T12:19:16.760Z INFO identity-service [host='identity-service-app-568d9fbbc-949ft' thread='main' user='' org='' trace=''] liquibase.lockservice.log:23 - Successfully released change log lock
2021-06-26T12:19:17.653Z INFO identity-service [host='identity-service-app-568d9fbbc-949ft' thread='main' user='' org='' trace=''] o.hibernate.jpa.internal.util.LogHelper.logPersistenceUnitInformation:31 - HHH000204: Processing PersistenceUnitInfo [name: default]
2021-06-26T12:19:18.383Z INFO identity-service [host='identity-service-app-568d9fbbc-949ft' thread='main' user='' org='' trace=''] org.hibernate.Version.logVersion:44 - HHH000412: Hibernate ORM core version 5.4.30.Final
2021-06-26T12:19:19.657Z INFO identity-service [host='identity-service-app-568d9fbbc-949ft' thread='main' user='' org='' trace=''] o.hibernate.annotations.common.Version.<clinit>:56 - HCANN000001: Hibernate Commons Annotations {5.1.2.Final}
2021-06-26T12:19:20.489Z INFO identity-service [host='identity-service-app-568d9fbbc-949ft' thread='main' user='' org='' trace=''] org.hibernate.dialect.Dialect.<init>:175 - HHH000400: Using dialect: org.hibernate.dialect.PostgreSQL94Dialect
2021-06-26T12:19:21.922Z INFO identity-service [host='identity-service-app-568d9fbbc-949ft' thread='main' user='' org='' trace=''] o.h.e.t.j.p.i.JtaPlatformInitiator.initiateService:52 - HHH000490: Using JtaPlatform implementation: [org.hibernate.engine.transaction.jta.platform.internal.NoJtaPlatform]
2021-06-26T12:19:21.937Z INFO identity-service [host='identity-service-app-568d9fbbc-949ft' thread='main' user='' org='' trace=''] o.s.o.j.LocalContainerEntityManagerFactoryBean.buildNativeEntityManagerFactory:419 - Initialized JPA EntityManagerFactory for persistence unit 'default'
2021-06-26T12:19:22.052Z INFO identity-service [host='identity-service-app-568d9fbbc-949ft' thread='main' user='' org='' trace=''] c.v.i.s.i.PropertyConfigurationService.init:70 - Loaded classpath:config_service.properties
2021-06-26T12:19:22.095Z INFO identity-service [host='identity-service-app-568d9fbbc-949ft' thread='main' user='' org='' trace=''] c.v.i.config.SecurityConfiguration.readKeyStorePasswordFromFile:110 - Loaded /secrets/keystore.pass
2021-06-26T12:19:22.095Z INFO identity-service [host='identity-service-app-568d9fbbc-949ft' thread='main' user='' org='' trace=''] c.v.i.config.SecurityConfiguration.setKeyStorePassword:80 - Setting the javax.net.ssl.keyStorePassword system property from a file.
2021-06-26T12:19:23.589Z INFO identity-service [host='identity-service-app-568d9fbbc-949ft' thread='main' user='' org='' trace=''] c.v.i.service.impl.CryptoServiceImpl.loadKeystore:472 - Loaded keystore
2021-06-26T12:19:24.330Z INFO identity-service [host='identity-service-app-568d9fbbc-949ft' thread='main' user='' org='' trace=''] c.v.a.s.w.platform.server.util.AppUtil.readTrustStorePasswordFromFile:126 - Loaded /jdk/lib/security/cacerts.pass
2021-06-26T12:19:24.330Z INFO identity-service [host='identity-service-app-568d9fbbc-949ft' thread='main' user='' org='' trace=''] c.v.a.s.w.platform.server.util.AppUtil.configureTrustStorePassword:95 - Setting the javax.net.ssl.trustStorePassword system property from file.
2021-06-26T12:19:24.335Z INFO identity-service [host='identity-service-app-568d9fbbc-949ft' thread='main' user='' org='' trace=''] c.v.a.s.w.platform.server.util.AppUtil.configureTrustStorePassword:90 - The javax.net.ssl.trustStorePassword is already set.
2021-06-26T12:19:26.316Z INFO identity-service [host='identity-service-app-568d9fbbc-949ft' thread='main' user='' org='' trace=''] c.v.identity.rest.EbsRestClientFactory.<init>:77 - Extensibility is configured on 'null' URL with clientID 'prelude-user-eFWWZ4dVxU'
2021-06-26T12:19:26.426Z INFO identity-service [host='identity-service-app-568d9fbbc-949ft' thread='main' user='' org='' trace=''] o.s.b.a.s.r.ReactiveUserDetailsServiceAutoConfiguration.getOrDeducePassword:88 -
Using generated security password: 326c6151-5283-48a7-93dc-90ed354fd40e
2021-06-26T12:19:27.347Z INFO identity-service [host='identity-service-app-568d9fbbc-949ft' thread='main' user='' org='' trace=''] com.vmware.identity.rest.RestClient.lambda$logRequest$1:72 - POST https://nbnidm.nukescloud.com/SAAS/API/1.0/oauth2/token?grant_type=client_credentials
2021-06-26T12:19:27.708Z INFO identity-service [host='identity-service-app-568d9fbbc-949ft' thread='main' user='' org='' trace=''] com.vmware.identity.rest.RestClient.lambda$logRequest$1:72 - POST https://nbnidm.nukescloud.com/SAAS/API/1.0/oauth2/token?grant_type=client_credentials
2021-06-26T12:19:27.910Z INFO identity-service [host='identity-service-app-568d9fbbc-949ft' thread='main' user='' org='' trace=''] c.v.a.s.w.p.s.util.SqlRetryableUtil.<clinit>:131 - Calculated value for sql-retry-policy.max-retries: 8
2021-06-26T12:19:29.166Z INFO identity-service [host='identity-service-app-568d9fbbc-949ft' thread='reactor-http-epoll-1' user='' org='' trace=''] com.vmware.identity.rest.RestClient.lambda$logRequest$1:72 - GET https://nbnidm.nukescloud.com/SAAS/jersey/manager/api/scim/Users?filter=userName%20eq%20%22configadmin%22
Post this all the pods were initialized as expected and upgrade was successful
Lesson Learnt
Remove the standalone vRA Migration Assistant service role and then use Cloud Assembly based vRA Migration Assistant
Perform this check before you start an upgrade to vRealize Automation 8.4.2 to ensure there are no failures
As one can see , once we login into vRA UI of 8.4.2 and check the roles , there is no standalone vRA Migration assistant role anymore
Remember
vRA MIGRATION ASSISTANT role should be removed for all the AD groups added under IDENTITY & ACCESS MANAGEMENT
After upgrade you may add Cloud Assembly based Migration roles back to these groups
If you are already in a situation where identity-service has crashed and you have not read the release notes of 8.4.2 or the KB before ,then follow these steps
1. ssh root@FQDN
2. vracli dev psql
3. \c identity-db
4. copy and paste the following:
DO $$
BEGIN
IF NOT EXISTS (SELECT def.id from identity_service_role AS role INNER JOIN identity_service_definition AS def ON role.service_definition_id = def.id WHERE role.name = 'migration:admin' AND def.name = 'Cloud Assembly') THEN
INSERT INTO identity_service_role (id, is_default, created_millis, hidden, display_name, name, service_definition_id)
VALUES ((SELECT uuid_in(md5(random()::text || clock_timestamp()::text)::cstring)), TRUE, (SELECT round(date_part('epoch', now() ) * 1000 )), FALSE, 'Migration Assistant Administrator', 'migration:admin', (SELECT id from identity_service_definition where name = 'Cloud Assembly'));
INSERT INTO identity_service_role (id, is_default, created_millis, hidden, display_name, name, service_definition_id)
VALUES ((SELECT uuid_in(md5(random()::text || clock_timestamp()::text)::cstring)), FALSE, (SELECT round(date_part('epoch', now() ) * 1000 )), FALSE, 'Migration Assistant Viewer', 'migration:viewer', (SELECT id from identity_service_definition where name = 'Cloud Assembly'));
END IF;
UPDATE identity_group_service_role
SET role_id = (SELECT id FROM identity_service_role WHERE service_definition_id = (SELECT id FROM identity_service_definition WHERE name = 'Cloud Assembly') AND name = 'migration:viewer')
WHERE role_id = (SELECT id FROM identity_service_role WHERE service_definition_id = (SELECT id FROM identity_service_definition WHERE name = 'Migration') AND name = 'migration:viewer');
UPDATE identity_group_service_role
SET role_id = (SELECT id FROM identity_service_role WHERE service_definition_id = (SELECT id FROM identity_service_definition WHERE name = 'Cloud Assembly') AND name = 'migration:admin')
WHERE role_id = (SELECT id FROM identity_service_role WHERE service_definition_id = (SELECT id FROM identity_service_definition WHERE name = 'Migration') AND name = 'migration:admin');
END $$ LANGUAGE plpgsql;
The output of above query would just say
DO
Then run a command to restart the services
/opt/scripts/deploy.sh
Once all the pods are up , then perform inventory sync on the environment which should now show up as 8.4.2
If you've done this directly through vRA without completing the upgrade workflow through LCM, you have to clean the upgrade in vRA state manually with this command by running it on one of the nodes
vracli cluster exec -- bash -c 'rm -rf /data/restorepoint /var/vmware/prelude/upgrade /var/log/vmware/prelude/upgrade-report-latest*; crontab -u root -l | grep -v -F "/opt/scripts/upgrade/upg-mon.sh" | crontab -u root -'
Comments