Last week I worked on two issues where the RabbitMQ configuration was broken.
The circumstances that led to these problems were entirely different, though.
In the first instance, RabbitMQ ended up in an unrecoverable state after an outage. The second occurred after a failed upgrade: an appliance was replaced with a new one that never joined the cluster properly, so its services were never registered.
In both scenarios, we had to manually break the cluster and then recreate it.
The steps below will help you break and successfully rebuild the cluster.
They assume you have two virtual appliances (VAs) in your vRA environment.
Note: Take snapshots before you perform these steps; they are a must. Ensure you have valid backups as well.
Step 1
Stop all services on both nodes using the command:
vcac-vami service-manage stop vco-server vcac-server horizon-workspace elasticsearch
Step 2
Bring down the RabbitMQ monitor by running "service cluster-rabbitmq-monitor stop" on both nodes.
Step 3
Bring down RabbitMQ by running "service rabbitmq-server stop" on both nodes.
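If you prefer to script Steps 1 to 3, a minimal sketch for a single node might look like the following. The DRY_RUN switch and the run helper are my own convention, not part of vRA: with DRY_RUN=1 the commands are only printed, so you can review them before executing them for real on each node.

```shell
#!/usr/bin/env bash
# Sketch of Steps 1-3 for ONE node; run it on each node in turn.
# DRY_RUN=1 is a hypothetical convenience switch: print commands instead of running them.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

stop_vra_stack() {
  # Step 1: stop the vRA services
  run vcac-vami service-manage stop vco-server vcac-server horizon-workspace elasticsearch || return 1
  # Step 2: stop the cluster monitor before touching RabbitMQ
  run service cluster-rabbitmq-monitor stop || return 1
  # Step 3: stop the RabbitMQ broker itself
  run service rabbitmq-server stop
}
```

For example, run "DRY_RUN=1 stop_vra_stack" first to review the commands, then "stop_vra_stack" on each node. Each step only runs if the previous one succeeded.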
Step 4
Back up the /var/lib/rabbitmq folder locally and store the copy in a safe location.
Then erase the RabbitMQ state by running "rm -rf /var/lib/rabbitmq/*" on both nodes.
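Here is a hedged sketch of Step 4 as one function: it archives the directory with tar and only wipes it if the archive was written. The default archive location under /root is an arbitrary assumption; pick your own safe location. To mirror "rm -rf /var/lib/rabbitmq/*", it removes only the non-hidden top-level entries, because the shell glob * skips dotfiles (for example an Erlang cookie file, if one is present there).

```shell
#!/usr/bin/env bash
# Sketch of Step 4: archive a RabbitMQ data directory, then empty it.
# Assumption: the default archive path below is hypothetical; pass your own as $2.

backup_and_wipe() {
  local data_dir="${1:-/var/lib/rabbitmq}"
  local archive="${2:-/root/rabbitmq-backup-$(date +%Y%m%d-%H%M%S).tar.gz}"

  # Refuse to continue if the directory is missing
  if [ ! -d "$data_dir" ]; then
    echo "no such directory: $data_dir" >&2
    return 1
  fi

  # Archive the whole directory (hidden files included), using relative paths
  tar -czf "$archive" -C "$(dirname "$data_dir")" "$(basename "$data_dir")" || return 1

  # Only wipe after the archive exists. Like "rm -rf dir/*", remove the
  # top-level entries but leave hidden ones, which the * glob does not match.
  find "$data_dir" -mindepth 1 -maxdepth 1 ! -name '.*' -exec rm -rf {} + || return 1

  # Report where the backup went
  echo "$archive"
}
```

Usage: "backup_and_wipe /var/lib/rabbitmq /root/rabbitmq-backup.tar.gz" prints the archive path on success.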
Step 5
Reboot both appliances.
Step 6
While the appliances start up, they will reach a point where RabbitMQ is started. Ensure that it starts properly.
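Rather than watching the console, you could poll for RabbitMQ once the appliance is back. This sketch retries a check command until it succeeds or a timeout expires; the 600-second timeout and 10-second interval in the usage line are arbitrary. It assumes "service rabbitmq-server status" exits non-zero while the broker is not running, as LSB-style init scripts do.

```shell
#!/usr/bin/env bash
# Sketch: poll a check command until it succeeds or a timeout (in seconds) expires.
# Usage: wait_for TIMEOUT INTERVAL COMMAND [ARGS...]

wait_for() {
  local timeout="$1" interval="$2"
  shift 2
  local waited=0
  # Keep retrying the command, discarding its output
  until "$@" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  return 0
}
```

For example: wait_for 600 10 service rabbitmq-server status && echo "RabbitMQ is up"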
Step 7
Verify that RabbitMQ is running by executing the command:
service rabbitmq-server status
Validate that RabbitMQ is running as a single, unclustered node by executing the command:
rabbitmqctl cluster_status
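If you'd rather not eyeball the output, here is a rough helper that lists and counts the entries under running_nodes. It assumes the classic Erlang-term output printed by RabbitMQ releases before 3.8 (newer releases print a different, sectioned layout), so check it against what your appliance actually prints before trusting it.

```shell
#!/usr/bin/env bash
# Sketch: list/count running nodes from "rabbitmqctl cluster_status" output.
# ASSUMPTION: classic Erlang-term format (RabbitMQ before 3.8), e.g.
#   [{nodes,[{disc,[rabbit@va1]}]},{running_nodes,[rabbit@va1]},...]
# Newer releases print a different layout and will not match.

# Read cluster_status output on stdin; print one running node per line.
running_nodes() {
  tr -d '\n' |
    grep -o '{running_nodes,\[[^]]*]}' |
    sed -e 's/^{running_nodes,\[//' -e 's/]}$//' |
    tr ',' '\n' |
    tr -d " '" |
    sed '/^$/d'
}

# Print how many nodes are listed under running_nodes.
running_node_count() {
  running_nodes | wc -l | tr -d ' '
}
```

Usage: "rabbitmqctl cluster_status | running_node_count" should print 1 on each node at this step, and 2 once both nodes are clustered in Step 11.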
Step 8
Prepare the first node for clustering by running:
vcac-vami rabbitmq-cluster-config set-cluster-node
Step 9
Run the following command on the first node to get the cluster information:
vcac-vami rabbitmq-cluster-config generate_join_cluster_variables
Note down the output; you will need it in the next step.
Step 10
Run the following command on the second node, replacing USERNAME, PASSWORD, COOKIE, HOST, and USE_LONGNAME with their values from the previous step:
vcac-vami rabbitmq-cluster-config join-cluster 'USERNAME' 'PASSWORD' 'COOKIE' 'HOST' 'USE_LONGNAME'
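Steps 8 to 10 can be wrapped in two small functions, sketched below with the same hypothetical DRY_RUN switch (print instead of execute). The sketch does not parse the output of generate_join_cluster_variables, since I'm not assuming any particular format for it; you still copy the five values by hand. Passing each value as a separate quoted argument keeps passwords with special characters intact.

```shell
#!/usr/bin/env bash
# Sketch of Steps 8-10. DRY_RUN=1 is a hypothetical switch that only prints commands.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Run on the FIRST node: prepare it, then print the join variables to note down
prepare_first_node() {
  run vcac-vami rabbitmq-cluster-config set-cluster-node || return 1
  run vcac-vami rabbitmq-cluster-config generate_join_cluster_variables
}

# Run on the SECOND node with the five values copied from the first node
join_second_node() {
  if [ "$#" -ne 5 ]; then
    echo "usage: join_second_node USERNAME PASSWORD COOKIE HOST USE_LONGNAME" >&2
    return 1
  fi
  run vcac-vami rabbitmq-cluster-config join-cluster "$1" "$2" "$3" "$4" "$5"
}
```

The argument-count check catches a missing value before anything is sent to vcac-vami.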
Step 11
Validate that RabbitMQ is now running in cluster mode on both nodes by running:
rabbitmqctl cluster_status
Step 12
Bring up the RabbitMQ monitor by running the command below on both nodes:
service cluster-rabbitmq-monitor start
Step 13
Run the command below to stop and start the services on both the MASTER and REPLICA nodes in a systematic manner. Start with the MASTER node; once its services are coming back up, move on to the REPLICA.
vcac-vami service-manage stop vco-server vcac-server horizon-workspace elasticsearch && vcac-vami service-manage start elasticsearch horizon-workspace hzn-dots vcac-server vco-server
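The same restart can be wrapped in a function that reports which phase failed. As with the && in the command above, the start phase only runs if the stop phase succeeded. Run it on the MASTER first and confirm its services are coming back before running it on the REPLICA; that check stays manual here. DRY_RUN is again a hypothetical print-only switch.

```shell
#!/usr/bin/env bash
# Sketch of Step 13 for ONE node (MASTER first, then REPLICA).
# DRY_RUN=1 is a hypothetical switch that only prints commands.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

restart_vra_services() {
  # Stop phase: if it fails, do not attempt to start anything
  if ! run vcac-vami service-manage stop vco-server vcac-server horizon-workspace elasticsearch; then
    echo "stop phase failed; not starting services" >&2
    return 1
  fi
  # Start phase, in the same order as the original command
  if ! run vcac-vami service-manage start elasticsearch horizon-workspace hzn-dots vcac-server vco-server; then
    echo "start phase failed" >&2
    return 1
  fi
}
```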
*** I'll share an example with screenshots or a recording soon. ***