vRLI cluster unresponsive because the / partition is full on one node due to multiple .hints files

Recently we saw a situation where the root partition was full on a vRLI appliance.

The appliance was part of a 3-node vRLI cluster.

When this issue occurs, the cassandra service goes into a hung state, and the problem then starts to affect the other nodes in the cluster as well.

cassandra.log shows the service becoming unresponsive because of the space issue on the root partition:

INFO  [HANDSHAKE-XXXXXXX] 2020-03-04 10:47:57,384 - Handshaking version with XXXXXXX
INFO  [RequestResponseStage-3] 2020-03-04 10:47:57,400 - InetAddress /ZZZZZZZ is now UP
INFO  [GossipStage:1] 2020-03-04 10:47:58,379 - Node /ZZZZZZZ state jump to NORMAL
ERROR [HintsWriteExecutor:1] 2020-03-04 10:48:24,194 - Exception in thread Thread[HintsWriteExecutor:1,5,main] No space left on device
        at org.apache.cassandra.hints.HintsWriteExecutor.flushInternal( ~[apache-cassandra-3.11.2.jar:3.11.2]
        at org.apache.cassandra.hints.HintsWriteExecutor.flush( ~[apache-cassandra-3.11.2.jar:3.11.2]
        at org.apache.cassandra.hints.HintsWriteExecutor.lambda$flush$1( ~[apache-cassandra-3.11.2.jar:3.11.2]
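
To confirm the same symptom on a node, one option is to check how full the root partition is and to search the log for the error string shown above. The following is a minimal Python sketch, not a VMware tool. The function names are hypothetical, and the location of cassandra.log is left as a parameter because the post does not give it.

```python
import shutil
from pathlib import Path

# Error string taken from the cassandra.log excerpt above.
ERROR_MARKER = "No space left on device"


def root_usage_percent(mount_point="/"):
    """Return the used space on mount_point as a percentage of its size."""
    usage = shutil.disk_usage(mount_point)
    return 100.0 * usage.used / usage.total


def find_space_errors(log_path):
    """Return the log lines that contain the out-of-space error marker."""
    # log_path is an assumption: pass the cassandra.log path for your appliance.
    lines = Path(log_path).read_text(errors="replace").splitlines()
    return [line for line in lines if ERROR_MARKER in line]
```

Calling root_usage_percent() on the affected node and find_space_errors() on its cassandra.log should show a nearly full / partition together with HintsWriteExecutor errors like the one above.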

The root partition was filled by a .hprof file along with multiple .hints files and .crc32 files being created in the /usr/lib/loginsight/application/lib/apache-cassandra-*/data/hints directory.
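
To see how much space each file type takes in that directory, a small script can total file sizes by extension. This is a rough Python sketch under the assumption that the hints directory matches the path pattern above. The helper name size_by_suffix is hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def size_by_suffix(directory):
    """Return {suffix: total_bytes} for regular files directly inside directory."""
    totals = defaultdict(int)
    for entry in Path(directory).iterdir():
        if entry.is_file():
            totals[entry.suffix] += entry.stat().st_size
    return dict(totals)


if __name__ == "__main__":
    import glob

    # Path pattern as given in the post; the wildcard covers the Cassandra version.
    pattern = "/usr/lib/loginsight/application/lib/apache-cassandra-*/data/hints"
    for hints_dir in glob.glob(pattern):
        for suffix, total in sorted(size_by_suffix(hints_dir).items()):
            print(f"{hints_dir} {suffix or '(none)'}: {total / 1024 ** 2:.1f} MiB")
```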

Background on hints

Hints are one of three ways to support consistency in the system. When a replica node is unavailable, the coordinator stores the mutations destined for it in temporary hint files and delivers them once the replica is available again.
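
The idea can be illustrated with a toy model. The Python sketch below is not Cassandra code and ignores real details such as hint files on disk, timeouts and consistency levels. The class and method names are made up for illustration. It only shows the pattern: buffer writes for a replica that is down, then replay them when it comes back.

```python
class ToyCoordinator:
    """Toy model of hinted handoff: buffer writes for down replicas, replay later."""

    def __init__(self, replica_names):
        self.data = {name: {} for name in replica_names}   # per-replica stored data
        self.up = {name: True for name in replica_names}   # per-replica availability
        self.hints = {name: [] for name in replica_names}  # pending writes per replica

    def write(self, key, value):
        """Apply the write to live replicas and keep a hint for each down replica."""
        for name in self.data:
            if self.up[name]:
                self.data[name][key] = value
            else:
                self.hints[name].append((key, value))

    def mark_down(self, name):
        self.up[name] = False

    def mark_up(self, name):
        """Bring a replica back and replay its hints in the order they were stored."""
        self.up[name] = True
        for key, value in self.hints[name]:
            self.data[name][key] = value
        self.hints[name].clear()
```

In this model the hints for a replica disappear once they are replayed. The problem described in this post corresponds to hint files that stay on disk and keep accumulating.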

Ideally, in all vRLI deployments, hints are configured to be deleted after the default 3 hours. For some reason this is not working in some environments, and the hint files seem to stay there forever.
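
One way to check whether a node is affected is to look for hint files that are older than that 3-hour window. The following Python sketch uses the file modification time as a stand-in for the age of a hint file, which is an assumption. The function name stale_hint_files is hypothetical.

```python
import time
from pathlib import Path

# The 3-hour default mentioned above, in seconds.
HINT_WINDOW_SECONDS = 3 * 60 * 60


def stale_hint_files(hints_dir, max_age_seconds=HINT_WINDOW_SECONDS, now=None):
    """Return .hints files in hints_dir last modified more than max_age_seconds ago."""
    # Assumption: modification time approximates when the hint file was written.
    now = time.time() if now is None else now
    stale = []
    for path in Path(hints_dir).glob("*.hints"):
        if now - path.stat().st_mtime > max_age_seconds:
            stale.append(path)
    return sorted(stale)
```

A non-empty result on a healthy cluster suggests that hints are not being cleaned up as expected.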

Repair runs automatically, which is an additional way to support consistency in the system.

Manual deletion is the solution in this situation.
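
A cautious way to script the manual deletion is to list the candidate files first and delete only on request. The Python sketch below is an illustration with a hypothetical function name, not an official procedure. It assumes the files to remove are the .hints and .crc32 files in the hints directory. The post does not say whether the service should be stopped first, so confirm that for your environment.

```python
from pathlib import Path

# Assumption: only these two file types are cleanup candidates.
HINT_SUFFIXES = (".hints", ".crc32")


def remove_hint_files(hints_dir, dry_run=True):
    """Delete .hints and .crc32 files in hints_dir and return the affected paths.

    With dry_run=True (the default) nothing is deleted; the paths are only listed.
    """
    affected = []
    for path in sorted(Path(hints_dir).iterdir()):
        if path.is_file() and path.suffix in HINT_SUFFIXES:
            if not dry_run:
                path.unlink()
            affected.append(path)
    return affected
```

Running it once with the default dry_run=True shows what would be removed. Passing dry_run=False performs the deletion.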

This is a bug and will be addressed in upcoming releases of vRLI.
