In a vRSLCM (or VASL) environment, the product UI did not load at all after a reboot.
After checking the logs, I found that none of the application services were running as expected. Apart from bootstrap, all of them were stopped or in a "not running" state.
What could have happened? How do I resolve this?

Looking closely, the postgres service is stopped. Unless postgres is up, none of the application services will start.
So my starting point is postgres.
Running the status command on vpostgres tells the story:
root@lcm-fqdn [ /storage/db/pgdata ]# systemctl status vpostgres
* vpostgres.service - VMware Postgres database server
Loaded: loaded (/etc/systemd/system/vpostgres.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Thu 2023-06-29 15:25:54 UTC; 32s ago
Process: 27267 ExecStart=/opt/vmware/vpostgres/current/bin/pg_ctl -s -D ${VMWARE_POSTGRES_DATA} -w -t ${VMWARE_POSTGRES_PGCTL_TIMEOUT} start (code=exited, status=1/FAILURE)
Jun 29 15:25:54 lcm-fqdn systemd[1]: vpostgres.service: Service RestartSec=100ms expired, scheduling restart.
Jun 29 15:25:54 lcm-fqdn systemd[1]: vpostgres.service: Scheduled restart job, restart counter is at 5.
Jun 29 15:25:54 lcm-fqdn systemd[1]: Stopped VMware Postgres database server.
Jun 29 15:25:54 lcm-fqdn systemd[1]: vpostgres.service: Start request repeated too quickly.
Jun 29 15:25:54 lcm-fqdn systemd[1]: vpostgres.service: Failed with result 'exit-code'.
Jun 29 15:25:54 lcm-fqdn systemd[1]: Failed to start VMware Postgres database server.
root@lcm-fqdn [ /storage/db/pgdata ]# systemctl start vpostgres
Job for vpostgres.service failed because the control process exited with error code.
See "systemctl status vpostgres.service" and "journalctl -xe" for details.
root@lcm-fqdn [ /storage/db/pgdata ]#
Checking the journalctl logs shows why it was failing.
Reference command: journalctl -u vpostgres.service --since "YYYY-MM-DD"
Jun 29 12:18:22 lcm-fqdn systemd[1]: vpostgres.service: Control process exited, code=exited status=1
Jun 29 12:18:22 lcm-fqdn systemd[1]: vpostgres.service: Failed with result 'exit-code'.
Jun 29 12:18:22 lcm-fqdn systemd[1]: Failed to start VMware Postgres database server.
Jun 29 12:18:22 lcm-fqdn systemd[1]: vpostgres.service: Service RestartSec=100ms expired, scheduling restart.
Jun 29 12:18:22 lcm-fqdn systemd[1]: vpostgres.service: Scheduled restart job, restart counter is at 1.
Jun 29 12:18:22 lcm-fqdn systemd[1]: Stopped VMware Postgres database server.
Jun 29 12:18:22 lcm-fqdn systemd[1]: Starting VMware Postgres database server...
Jun 29 12:18:22 lcm-fqdn postgres[53444]: pg_ctl: directory "/var/vmware/vpostgres/current/pgdata" does not exist
Jun 29 12:18:22 lcm-fqdn systemd[1]: vpostgres.service: Control process exited, code=exited status=1
Jun 29 12:18:22 lcm-fqdn systemd[1]: vpostgres.service: Failed with result 'exit-code'.
Jun 29 12:18:22 lcm-fqdn systemd[1]: Failed to start VMware Postgres database server.
Jun 29 12:18:22 lcm-fqdn systemd[1]: vpostgres.service: Service RestartSec=100ms expired, scheduling restart.
Jun 29 12:18:22 lcm-fqdn systemd[1]: vpostgres.service: Scheduled restart job, restart counter is at 2.
Jun 29 12:18:22 lcm-fqdn systemd[1]: Stopped VMware Postgres database server.
Everything under /var/vmware/** was missing on the problematic environment.
On a working environment such as my lab, this is how the folder structure would look

Now, if there is no snapshot, how do I fix this?
Fortunately, the database itself lives under /storage/db/pgdata, and that was intact.
So the idea was to recreate the missing folders and symbolic links and see whether that helps, since the missing directory was the only thing postgres complained about during startup.
Here are the steps taken to fix the problem:
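Before rebuilding anything, it may be worth confirming that the database directory really is intact. The sketch below relies on one fact about PostgreSQL: every initialized data directory contains a PG_VERSION file. The function name looks_like_pgdata is hypothetical, and the check is only a quick sanity test, not proof that the database is healthy.

```shell
# Sketch: succeed only if DIR looks like an initialized PostgreSQL data directory.
# PG_VERSION is a small file that PostgreSQL writes into every data directory.
looks_like_pgdata() {
    dir=$1
    # the directory must exist and PG_VERSION must be present and non-empty
    [ -d "$dir" ] && [ -s "$dir/PG_VERSION" ]
}

# Example usage on the appliance:
# looks_like_pgdata /storage/db/pgdata && echo "database directory looks intact"
```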
Step-1 : Take a snapshot of the vRSLCM / VASL appliance
Step-2 : Create vpostgres directory
mkdir /var/vmware/vpostgres
Step-3 : Set appropriate permissions to vpostgres directory
chmod 755 /var/vmware/vpostgres
Step-4 : Under the vpostgres directory, create the version folder, which is 11
mkdir /var/vmware/vpostgres/11
Step-5 : Assign postgres:users ownership to the folder created
chown postgres:users /var/vmware/vpostgres/11
Step-6 : Change permissions for version directory "11"
chmod 700 /var/vmware/vpostgres/11
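Step-7 : Create the symbolic link "current" pointing to the version directory "11" (pg_ctl looks for /var/vmware/vpostgres/current/pgdata, as seen in the journal)
ln -s /var/vmware/vpostgres/11 /var/vmware/vpostgres/current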
Step-8 : Browse to the directory "11"
cd /var/vmware/vpostgres/11
Step-9 : Create directory pgdata
mkdir /var/vmware/vpostgres/11/pgdata
Step-10 : Browse to directory pgdata
cd pgdata
Step-11 : Assign postgres:users ownership to the pgdata folder
chown postgres:users /var/vmware/vpostgres/11/pgdata
Step-12 : Create symbolic link for pgdata which points to the actual db location
ln -s /storage/db/pgdata pgdata
Step-13 : Start the postgres service
systemctl start vpostgres.service
Step-14 : Restart the vRSLCM / VASL server service
systemctl stop vrlcm-server.service
systemctl start vrlcm-server.service
systemctl status vrlcm-server.service
Step-15 : Monitor /var/log/vrlcm/vmware_vrlcm.log for status. Once the services are up, the UI and the whole application should be available
After executing these steps, the vRSLCM or VASL UI is now up.
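If postgres still refuses to start after recreating the folders, one thing to check is whether the path pg_ctl complained about actually resolves to the real database location. The sketch below compares the two paths after following symbolic links. The function name resolves_to is hypothetical, and the two paths in the usage example are taken from the journal output and from the text above; this is a sanity check under those assumptions, not a confirmed description of the appliance layout.

```shell
# Sketch: check that EXPECTED (the path pg_ctl looks for) ends up at ACTUAL
# (where the database really lives) once all symbolic links are followed.
resolves_to() {
    expected=$1
    actual=$2
    # fail early if the expected path is missing or is a dangling link
    [ -e "$expected" ] || return 1
    # readlink -f prints the canonical path with every symlink resolved
    [ "$(readlink -f "$expected")" = "$(readlink -f "$actual")" ]
}

# Example usage on the appliance:
# resolves_to /var/vmware/vpostgres/current/pgdata /storage/db/pgdata && echo "link chain OK"
```

If the check fails, the names and locations of the symbolic links are the first thing to re-examine, for example with ls -l on each directory in the chain.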