How do you automate scaling up/down during the day/night to save resources?

When using TF Cloud and a git repo, how is this possible?

If you use the cloud CLI on a schedule, you get drift from the TF state.

Thanks,
Gunnar

Terraform is really designed for situations where it is completely in control of resources, rather than ones where something else is making changes it knows nothing about. One thing that might help is the ignore_changes meta-argument.
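
e.g. a minimal sketch, using a hypothetical resource type and attribute, of telling Terraform not to try to undo an attribute that something else changes outside of it:

resource "somekindof_vm" "example" {
  // hypothetical attribute that a scheduler changes outside Terraform
  instance_count = 2

  lifecycle {
    ignore_changes = [
      instance_count,
    ]
  }
}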

Another approach would be to do the scaling up/down via Terraform itself, by using count or for_each settings, with values that refer to Terraform input variables.

e.g.

resource "somekindof_vm" {
  count = var.vm_scale
  // other settings here
}

variable "vm_scale" {
  type = number
}
terraform apply -auto-approve -var vm_scale=5
# time goes by
terraform apply -auto-approve -var vm_scale=1
# time goes by
terraform apply -auto-approve -var vm_scale=5

Ah nice, thanks! 🙂
An input variable seems like the way to go.

Yet another option (there are several!) is to use some mechanism provided by your cloud platform to create and destroy the instances and then have Terraform configure that mechanism rather than configuring the instances directly. In this case Terraform doesn’t actually track the individual instances and so only changes to the settings for whatever system is managing the instances will show as changes in Terraform.

For example, in AWS you can use EC2 autoscaling to indirectly manage a collection of EC2 instances. If you manage an autoscaling group with Terraform then the Terraform configuration describes the launch template (the parameters to use when creating an instance) but does not describe the individual instances. The autoscaling system then itself creates a suitable number of instances to meet the autoscaling group’s “desired count”, “min count”, and “max count”.

That can be combined with scheduled scaling to automatically change the desired count at particular times, which will then in turn trigger scaling actions that could create or destroy instances using the template.

For AWS autoscaling in particular, the provider design is such that the desired count change will be reflected in the state during refresh. But that’s at least just a change to one particular attribute, and one whose change should be intuitive to anyone who is aware that the desired count is being changed on a schedule. (You should use ignore_changes on desired_capacity to avoid a Terraform run then trying to undo the scheduled change.)
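
For illustration, a rough sketch of that pattern with the AWS provider. The names, sizes, and schedules here are placeholder assumptions rather than anything from your setup:

resource "aws_launch_template" "app" {
  name_prefix   = "app-"
  image_id      = var.ami_id
  instance_type = "t3.micro"
}

resource "aws_autoscaling_group" "app" {
  min_size            = 1
  max_size            = 5
  desired_capacity    = 5
  vpc_zone_identifier = var.subnet_ids

  launch_template {
    id      = aws_launch_template.app.id
    version = "$Latest"
  }

  lifecycle {
    # Let the scheduled actions own desired_capacity, so Terraform
    # doesn't try to undo the scheduled change on its next run.
    ignore_changes = [desired_capacity]
  }
}

# Scale down every evening and back up every morning (times in UTC).
resource "aws_autoscaling_schedule" "scale_down" {
  scheduled_action_name  = "scale-down-night"
  autoscaling_group_name = aws_autoscaling_group.app.name
  recurrence             = "0 20 * * *"
  min_size               = 1
  max_size               = 5
  desired_capacity       = 1
}

resource "aws_autoscaling_schedule" "scale_up" {
  scheduled_action_name  = "scale-up-morning"
  autoscaling_group_name = aws_autoscaling_group.app.name
  recurrence             = "0 6 * * *"
  min_size               = 1
  max_size               = 5
  desired_capacity       = 5
}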

@maxb
When using TF Cloud, you’re suggesting the CLI-driven workflow, right?

This wouldn’t be possible when using the version control workflow (git repo), right? If so, what’s the best way to do this with the version control workflow?

@stuart-c
That seems like overkill. From your link:

ignore_changes meta-argument specifies resource attributes that Terraform should ignore when planning updates to the associated remote object.

I might want to update the configuration of that object in the future, and it would be nice for Terraform to reflect that in the state and keep everything in sync. To me, ignore_changes is for pre-existing resources that were created before Terraform.

@apparentlymart

Yet another option (there are several!) is to use some mechanism provided by your cloud platform to create and destroy the instances and then have Terraform configure that mechanism rather than configuring the instances directly.

Wouldn’t that affect the availability of the resource?

I’m not sure what exactly you mean by “availability” in this context. My understanding of your problem statement is that you want to intentionally make some of the resources unavailable (not existing at all) during a low-demand period, and so I was answering on that assumption.

@apparentlymart

I’m not sure what exactly you mean by “availability” in this context. My understanding of your problem statement is that you want to intentionally make some of the resources unavailable (not existing at all) during a low-demand period, and so I was answering on that assumption.

If you destroy the instance and create it again (using TF), won’t your workload suffer from downtime?

Not necessarily.

Variables can be defined in TF Cloud workspaces.

These are made available to all runs, including ones triggered by the version-control workflow.

It is also possible to trigger extra runs via the TF Cloud API or UI whilst using the version-control workflow.

So, scaling could be done by updating the variables in the TF Cloud workspace and triggering extra runs to apply just the variable changes, whilst still using the version-control workflow for actual changes to the Terraform code.
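
Roughly, a scheduled job could do something like the sketch below. The workspace and variable IDs are placeholders, and this assumes the standard TF Cloud v2 API endpoints for workspace variables and runs:

# Update the workspace variable that drives the instance count
curl -s \
  --header "Authorization: Bearer $TFC_TOKEN" \
  --header "Content-Type: application/vnd.api+json" \
  --request PATCH \
  --data '{"data":{"type":"vars","id":"var-XXXXXX","attributes":{"value":"1"}}}' \
  https://app.terraform.io/api/v2/workspaces/ws-XXXXXX/vars/var-XXXXXX

# Queue a run in the workspace to apply just that variable change
curl -s \
  --header "Authorization: Bearer $TFC_TOKEN" \
  --header "Content-Type: application/vnd.api+json" \
  --request POST \
  --data '{"data":{"type":"runs","attributes":{"message":"Scheduled scale-down"},"relationships":{"workspace":{"data":{"type":"workspaces","id":"ws-XXXXXX"}}}}}' \
  https://app.terraform.io/api/v2/runs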

If you change the desired count to be zero then indeed there will be no instances serving requests and so the overall system will be down.

I was imagining instead merely reducing the desired count to a smaller value still greater than zero, which would mean that there are still some instances running and serving requests, but you are no longer wasting overprovisioned capacity for the current demand level.

This thread started with a very abstract scenario so I can’t possibly guess what the availability requirements are for your specific system. Of course there will be situations where an autoscaling-like approach isn’t appropriate, but there were no explicit requirements given so I shared a pattern that I’ve seen people employ successfully in situations similar to the one I described above.