Grouping providers to pool resources

Hello,

Let’s say I have a group of hypervisors that are managed independently. In my case these are libvirt hypervisors that can be managed by independent libvirt provider instances.

If I want to create a VM on this cluster, I have to choose a specific instance to put it on. Also, later on, if I migrate a VM from one instance to another, at the next terraform apply the VM will be deleted from the second instance and re-created on the first one.

I wish there were a way to define a group of provider instances so that resources on them are considered part of the group rather than attached to a single instance: a way to tell Terraform to ignore where a resource lives (except maybe when a specific instruction is given on it) as long as it exists somewhere in the group. As a bonus, there could be different strategies for resource creation, like “create the VM on a random host” or “create the VM on the currently least loaded host”.

I hope my explanation makes sense.

Hi @raspbeguy!

What should happen when a resource becomes associated with a different provider configuration is, today, ultimately decided by the provider itself, but in most providers I’m familiar with (which does not include libvirt) I would expect the behavior to be as you described:

If you associate an existing resource with a different provider configuration then on the next plan Terraform will ask the new provider configuration to “refresh” the resource instances. If the new provider configuration reports that the object exists then Terraform should just plan changes against it as normal, without proposing to replace it.

Some providers alter this behavior with additional logic. For example, in a provider for a remote system that has multiple independent regions, the most ideal behavior is for the provider to remember, for each individual resource instance, which region it belonged to at the time it was created. Then, if the instance later becomes associated with a provider configuration for a different region, the provider developer can choose either to ignore that change and just keep managing the object in the old region, or to propose replacing the object, deleting the one in the old region and creating a new one in the new region.
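
For illustration, here is a rough sketch of what that looks like in configuration, using the AWS provider purely as an example of a multi-region provider (the alias names, AMI ID, and resource are made up):

provider "aws" {
  alias  = "us"
  region = "us-east-1"
}

provider "aws" {
  alias  = "eu"
  region = "eu-west-1"
}

resource "aws_instance" "example" {
  # Changing this from aws.us to aws.eu only changes which provider
  # configuration manages the resource; whether that means "keep managing
  # the existing object in the old region" or "replace it in the new
  # region" is up to the provider.
  provider      = aws.us
  ami           = "ami-00000000000000000" # placeholder AMI ID
  instance_type = "t3.micro"
}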

I’m not sure what behavior the libvirt provider implements because I’ve not used libvirt with Terraform before, but if you can show some specific examples of the behavior you’ve seen that might help to understand whether the change you’re hoping for would be a change to Terraform itself or a change to the libvirt provider in particular.

Thanks!

Hi @apparentlymart,

If you associate an existing resource with a different provider configuration then on the next plan Terraform will ask the new provider configuration to “refresh” the resource instances. If the new provider configuration reports that the object exists then Terraform should just plan changes against it as normal, without proposing to replace it.

Not sure if I understand correctly, but you’re talking about switching the provider instance in the Terraform configuration, right? What I was talking about isn’t quite that.

I wish to be able to provision a VM on a cluster (which, in the case of libvirt, is modeled as a set of different instances of the same provider, because there is no cluster management in this provider). During the cluster’s lifetime, it sometimes happens that the distribution of VMs is re-balanced (automatically or not), for example for a hypervisor maintenance, and that happens outside of Terraform. I wish that Terraform didn’t care about that (except maybe if a specific instruction is given in the resource configuration) and left it untouched on the next apply; it shouldn’t even show up as a change, the resource should appear untouched.

I think this feature could be useful for any provider that does not manage clusters/multi-zones.

I think a practical demonstration should help. Let’s say I have 2 hypervisors:

provider "libvirt" {
  alias = "host-1"
  uri = "qemu+ssh://host-1.dns/system"
}

provider "libvirt" {
  alias = "host-2"
  uri = "qemu+ssh://host-2.dns/system"
}

As far as I know, with that configuration you have to attach VMs to one of those providers, for instance host-1:

resource "libvirt_domain" "myvm" {
  provider = libvirt.host-1
  ...
}

Let’s say myvm is later re-distributed (live migrated) to host-2 because of load balancing of some sort. After all, functionally, myvm has no need to be specifically on one host or the other.

Then the next time I run Terraform, myvm will of course be deleted from host-2 and recreated on host-1.

The behaviour I’m looking for is that Terraform leaves it untouched. That’s why I’m looking for an equivalent to this imaginary syntax:

provider "libvirt" {
  alias = "host-1"
  uri = "qemu+ssh://host-1.dns/system"
}

provider "libvirt" {
  alias = "host-2"
  uri = "qemu+ssh://host-2.dns/system"
}

provider_group "libvirt_cluster" {
  members = [ libvirt.host-1, libvirt.host-2 ]
  # Let's say we can choose how new resources are distributed
  creation_distribution = "round-robin"
  ...
}

resource "libvirt_domain" "myvm" {
  provider = libvirt_cluster
  # if we want the resource to be specifically attached to a special host:
  provider_attached = libvirt.host-1
  ...
}

Thank you for the extra information.

The best, or possibly only, way of obtaining the behaviour you describe within the existing architecture of Terraform would be to change the libvirt provider so that it could directly interface with a set of hypervisors from a single provider instance.

That way, the single instance of the modified libvirt provider would be free to map the concept of a Terraform resource to VMs located on any hypervisor in the set, and to implement whatever libvirt-specific cluster management behaviours it needs internally.
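
To sketch what that might look like, here is some purely hypothetical configuration (the current libvirt provider has no uris argument or placement setting; those names are made up for illustration):

provider "libvirt" {
  # Hypothetical: one provider instance that knows about every hypervisor
  uris = [
    "qemu+ssh://host-1.dns/system",
    "qemu+ssh://host-2.dns/system",
  ]
  # Hypothetical placement policy used when a domain is first created
  placement = "least-loaded"
}

resource "libvirt_domain" "myvm" {
  # No provider alias needed; the provider decides (and remembers)
  # which hypervisor actually hosts the domain
  ...
}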

Indeed… also, some other systems solve this problem by offering a real feature in the remote system that is responsible for managing a collection of VMs as a single unit, where the individual VMs are an implementation detail, such as AWS EC2 Auto Scaling, GCP instance groups, etc.

That sort of design typically works out better because the separation of concerns is clear: Terraform is responsible for managing the overall group, the “template” which describes what all of the VMs have in common, and so on, while the remote system is responsible for the ongoing work of maintaining the requested level of service, including migrating workloads to different locations when hardware fails.
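
As a rough sketch of that separation using the AWS provider (arguments trimmed; the AMI and subnet IDs are placeholders):

resource "aws_launch_template" "vm" {
  name_prefix   = "example-"
  image_id      = "ami-00000000000000000" # placeholder
  instance_type = "t3.micro"
}

resource "aws_autoscaling_group" "vms" {
  desired_capacity    = 3
  min_size            = 1
  max_size            = 5
  vpc_zone_identifier = ["subnet-00000000000000000"] # placeholder

  launch_template {
    id      = aws_launch_template.vm.id
    version = "$Latest"
  }

  # AWS, not Terraform, decides which individual instances exist and where
  # they run; Terraform only manages the group and its template.
}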

I assume libvirt does not have an equivalent abstraction and so you are forced to try to emulate that sort of behavior entirely within Terraform. Unfortunately Terraform isn’t really designed to play this role because doing it properly requires a long-running agent that can monitor the system state and react quickly if the situation changes, whereas Terraform is more like a batch process tool that runs only occasionally when you are intentionally changing the desired state of the system.

I expect there is a way to approximate what you described with changes to the provider itself, such as what @maxb described. But you described something else in your system modifying the allocation of workloads dynamically at runtime, and so I think the more promising direction would be to find some way to make that component responsible for the entire management of the individual VMs, and either skip using Terraform entirely for this part of the problem or find some way to use Terraform to manage the system that’s doing the management, similar to how Terraform can manage EC2 Auto Scaling configuration without directly managing the individual VMs.