
Experienced Technology Product Manager adept at steering success throughout the entire product lifecycle, from conceptualization to market delivery. Proficient in market analysis, strategic planning, and effective team leadership, utilizing data-driven approaches for ongoing enhancements.


What is VMware Identity Manager Cluster Auto-Recovery in VMware Aria Suite Lifecycle 8.14?


 

VMware Aria Suite Lifecycle 8.14 introduces an innovative capability known as "VMware Identity Manager Cluster Auto-Recovery".


Why do we need this?

The aim of the 'autorecovery' service is to reduce how often the time-consuming 'Remediate' operation has to be run from the Suite Lifecycle UI.


In greenfield deployments, this feature is enabled automatically. In brownfield deployments, it must be enabled manually after upgrading to VMware Aria Suite Lifecycle 8.14.



 

How does it work?

The 'autorecovery' service is deployed as a Linux service and runs on all three nodes of the vIDM cluster.


  • Operations such as start/restart of the pgPool service are controlled by the script individually on each node

  • The cluster-VIP or delegateIP is handled only on the node whose role is primary

  • Detachment of the VIP is done on standby nodes, based on their role as "standby"

  • Operations such as "recovery" are coordinated based on the node's status as the database "primary" or, in the one case where all nodes are in standby, as the cluster "leader"

As a result, no node triggers a duplicate operation.
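The role-based coordination described above can be pictured with a small sketch. This is not the actual autorecovery script: the `Node` type, the `is_leader` flag, and the action names are hypothetical, and it assumes the standby-side detachment refers to the cluster VIP. It only illustrates why at most one node ends up running recovery.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    role: str                 # database role: "primary" or "standby"
    is_leader: bool = False   # hypothetical cluster-leader flag


def actions_for(node: Node, cluster: List[Node]) -> List[str]:
    """Return the illustrative actions one node would take in a single pass."""
    # Every node manages its own pgPool service (start/restart).
    actions = ["manage_pgpool"]
    all_standby = all(n.role == "standby" for n in cluster)

    if node.role == "primary":
        # Only the primary handles the cluster VIP / delegateIP,
        # and only the primary drives recovery in the normal case.
        actions += ["manage_vip", "run_recovery"]
    else:
        # Standby nodes detach the VIP (assumption, see lead-in).
        actions.append("detach_vip")
        # If there is no primary at all, the cluster leader drives recovery.
        if all_standby and node.is_leader:
            actions.append("run_recovery")
    return actions


if __name__ == "__main__":
    cluster = [Node("node1", "primary"), Node("node2", "standby"), Node("node3", "standby")]
    for n in cluster:
        print(n.name, actions_for(n, cluster))
```

Because recovery is tied to a role that only one node holds at a time (the primary, or the leader when every node is a standby), the same operation is not started twice.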



 

What challenges or issues does this feature tackle?

  • Cluster-VIP loss on primary

  • Cluster issues due to network outages

  • Recovery of "down" cluster node(s)

  • Avoids the need to initiate "Remediate" from the UI in most cases, an operation that restarts node(s) and contributes to downtime

  • Eliminates the need to reboot vIDM nodes because of PostgreSQL cluster problems

  • Recovery when replication delay is significant (the threshold is configurable in bytes, for example more than 1000 bytes of lag between the primary and a secondary)

  • Recovery in the rare case where all nodes are in a 'standby' state

  • Prevention of discrepancies in /etc/hosts
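To make the replication-delay check concrete, here is a minimal sketch of a byte-threshold comparison. The setting name `auto_recovery_replication_delay_threshold` comes from a reader comment on this post. The simple key=value file format, the helper names, and the sample lag figures are assumptions for illustration, not the product's implementation.

```python
def parse_conf(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and '#' comments (assumed format)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def nodes_over_threshold(lag_bytes: dict, threshold: int) -> list:
    """Return the names of nodes whose replication lag exceeds the threshold."""
    return sorted(name for name, lag in lag_bytes.items() if lag > threshold)


if __name__ == "__main__":
    conf = parse_conf("auto_recovery_replication_delay_threshold=1000\n")
    threshold = int(conf["auto_recovery_replication_delay_threshold"])
    # Hypothetical lag values in bytes for two secondary nodes.
    print(nodes_over_threshold({"node2": 150, "node3": 4096}, threshold))
```

With a threshold of 1000 bytes, only the node lagging by 4096 bytes would be flagged; a lag of 150 bytes stays below the limit.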


 

Is there any downtime during the execution of auto-recovery?

No. There is no downtime when the auto-recovery script is triggered in the backend, whatever the reason.

 

How do I enable and disable this feature?


Once the feature is enabled, users can go to the day-2 operations pane of globalenvironment or vIDM and choose to disable it, and vice versa.




 


1 Comment

Guest
Dec 21, 2023
Rated 5 out of 5 stars.

Great article. Even with auto_recovery_replication_delay_threshold=1000 in lcm-pgpool.conf, LCM still sends critical email alerts saying "Node(s) x.x.x.x, have a replication delay with respect to the master node" when the replication delay to the master is over 100 bytes. Where do you configure the LCM alert to trigger only if the delay is greater than, say, 1000 bytes? Thanks.
