Terraform module to manage a stateful cluster and modify one node at a time

I am trying to manage the creation of stateful applications like Cassandra, MongoDB, etc. in Terraform. After the initial cluster creation, any update should happen one node at a time.

The infrastructure code should perform the node modification, wait for the node to come back up with the new configuration, and ensure that the node has rejoined the cluster and that the cluster as a whole (including the node just changed) is reported as healthy. Only then should it proceed to the next node.

Currently, I am doing this by defining a module that describes a single node of the cluster with the required configuration (plus parameters the calling code can override). The calling code then has to call the module n times to form a cluster of n nodes. This makes it possible to define dependencies between the module instances and ensure a rolling update (similar to how a StatefulSet works in Kubernetes).
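Roughly, the calling code looks something like this (a simplified sketch; the module source and argument names are just placeholders):

    module "node_1" {
      source = "./modules/cassandra-node"

      node_index = 1
      # ...other node settings...
    }

    module "node_2" {
      source = "./modules/cassandra-node"

      node_index = 2

      # The calling code has to remember to add this for every node so
      # that Terraform only processes one node at a time.
      depends_on = [module.node_1]
    }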

The downside of this approach is that the dependencies between the node modules have to be defined by the calling code, which is easy to miss.

I would like to define the module so that I can specify n as the number of nodes and still have updates performed one node at a time (rather than updating all nodes at once).

Currently it seems this is not possible to achieve in Terraform: if the module needs to define n nodes where n is not known in advance (by the module), then the only way to do it is with count or for_each on the node resource.

But once count or for_each is used to define the node resource, it seems it is no longer possible to define dependencies between the individual nodes.

How can I achieve this in Terraform? Also, is this the wrong approach for managing stateful applications? If people are managing stateful services in a different way, please let me know as well.

Hi @kalkar.prashant,

Terraform itself does not address this sort of highly-orchestrated rollout process, so situations which require this will typically use Terraform only as a building block and not as the entire solution.

There are a few ways to use Terraform as part of a solution to this problem, but the essence of all of them is to run Terraform multiple times with a slightly different configuration each time, and then after each run perform some actions outside of Terraform to decide when to run the next step.

It sounds like you are already doing something like this but are perhaps doing so manually rather than with automation. My recommendation would be to aim to turn your manual process into an automated process which wraps around running terraform apply in a loop until you reach a new stable state with everything rolled out.

There are two main approaches to that:

  • Write a configuration which has root module input variables describing the intended cluster topology, and then use that data structure with module for_each to dynamically pivot from one configuration to another by changing node counts and then finally removing the old module instance altogether.

    Here’s a possible input variable data structure for that:

    variable "node_sets" {
      type = map(object({
        node_count = number
        # (whatever other settings you need to be
        # able to change during a rolling update)
      }))
    }
    
    module "node_sets" {
      source   = "../modules/node-set"
      for_each = var.node_sets
    
      node_count = each.value.node_count
      # (and the other arguments you need to
      # describe what's different between the
      # node sets.)
    }
    

    The idea here then would be that your deployment process generates a fresh node set key for each rolling deploy and runs Terraform with a gradually decreasing node_count for the old node set and a gradually increasing node_count for the new node set, until the old node set reaches zero nodes, at which point you can remove it from the input variable entirely and stabilize on only one node set again.

    If you follow this strategy then I would recommend generating a .tfvars.json file to set the value for this variable, because that means you can use a standard JSON library available in whatever language you prefer to use and not have to worry about generating valid Terraform native language syntax.
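    For example, a generated .tfvars.json file for one intermediate step of a rollout might look roughly like this (the node set keys are made up for illustration, assuming the node_sets variable declared above):

    {
      "node_sets": {
        "abc123": { "node_count": 3 },
        "def456": { "node_count": 2 }
      }
    }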

  • Have your automation generate its own root module configuration tailored to each iteration. This can potentially use the same “node set” module I mentioned in the previous point, but instead of using a single module block with for_each you’d generate two module blocks for each of the intermediate steps, again decreasing and increasing the node counts to gradually rebalance onto the new module block.

    This is more fiddly than the previous approach because it requires generating configuration rather than just generating input variable values, but it comes with the advantage that each of your node sets can potentially have different source and version arguments, which means you can also use this technique to migrate between versions of the node set module itself.

    module "node_set_abc123" {
      source  = "registry.example.com/foo/node-set/aws"
      version = "1.0.0"
    
      node_count = 4
    }
    
    module "node_set_def456" {
      source  = "registry.example.com/foo/node-set/aws"
      version = "1.1.0"
    
      node_count = 1
    }
    

    I’ve shown this using the Terraform native syntax above because it’s easier to read, but if it’ll be machine-generated then I’d suggest using JSON Configuration Syntax instead, because that’s typically easier to generate from arbitrary languages that happen to have JSON libraries available.
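    For example, the two module blocks above could be generated as a .tf.json file shaped roughly like this (just a sketch of the equivalent JSON form):

    {
      "module": {
        "node_set_abc123": {
          "source": "registry.example.com/foo/node-set/aws",
          "version": "1.0.0",
          "node_count": 4
        },
        "node_set_def456": {
          "source": "registry.example.com/foo/node-set/aws",
          "version": "1.1.0",
          "node_count": 1
        }
      }
    }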

Both of these cases require your automation to track some state itself, outside of Terraform:

  • The current node set key (which is either a key in var.node_sets or a suffix on the module names, depending on which strategy you follow).
  • The new node set key describing the node set you’re currently transitioning to.
  • The total number of nodes that should be present in the cluster.
  • The number of those nodes that are currently assigned to the “new” node set, as opposed to the current node set.

Once the number of nodes in the new node set matches the total number of nodes (which by definition means that the “current” node set has zero nodes), you’d pivot to treating the new node set key as the current node set key and discard the previous node set key, so that the next change will generate a different new node set key in order to repeat the process.
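
For example, that external state could be as simple as a small JSON document that your automation reads and updates between Terraform runs; the field names here are just illustrative:

    {
      "current_node_set": "abc123",
      "new_node_set": "def456",
      "total_nodes": 5,
      "nodes_in_new_node_set": 2
    }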