Hello there,
I have a problem I haven't been able to solve. I've done a lot of experimenting and found a lot of information, but I feel like I'm going around in circles, so I wanted to consult you.
I use Google Cloud and manage my resources entirely with Terraform. I have added my Terraform configuration below.
What I want to do:
1- Create my disks and server (that’s OK),
2- Run my commands with “remote-exec” after my resources are created (this is OK),
3- Automatically repeat the same operations in a disaster scenario (this is where the problem is).
My problem is:
When I apply the configuration below for the first time, there is no problem and I get a clean installation. But when I delete the instance and boot disk and create them again, the “remote-exec” commands in my “null_resource” do not run again.
My goal is to cover a disaster scenario. I add the boot and data disks to my server as separate resources: Linux and the application services are on the boot disk, and the application data is on the data disk. I configure the disks so that they are not deleted when the instance is deleted, so in any case my disks survive.
So far I don’t have any problems.
But in a failure scenario, when something goes wrong with Linux or the application, I want to run the Terraform code again. When I re-run it and the boot disk is rebuilt, I want the “remote-exec” commands in the “null_resource” to run again automatically.
I’m testing this by manually deleting my instance and boot disk via the Google Cloud console. After they are deleted, I check with “terraform plan” and see that the boot disk and instance will be recreated without any problems. But the “null_resource” and the “remote-exec” commands in it are not run again, so I cannot automatically configure my Linux server and install my applications.
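As far as I know I could force them by hand before the next apply with something like:
terraform taint 'null_resource.run_commands[0]'
but I don’t want a manual step; it should happen automatically whenever the boot disk is recreated.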
I found and tried many resources about null_resource on Google, but I did not get successful results. I guess either I haven’t learned how to use this resource properly or I’m confusing something.
If I run the Terraform code again after my boot disk has been deleted, I want the null_resource remote-exec commands to run again, together with the new boot disk and instance. If the boot disk has not been recreated, the null_resource should stay as it was after the first run; even if I add a different resource to the Terraform code or change the instance, the null_resource should remain as it was and not be recreated.
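Is something like this the right direction for the triggers? This is only a sketch of what I think I need; I’m assuming that creation_timestamp and instance_id get new values when the disk and instance are recreated, unlike the disk name I’m using now:

resource "null_resource" "run_commands" {
  count      = var.instance_count["xxxapplication"]
  depends_on = [google_compute_attached_disk.my_test_servers_instance_attach_data_disk]

  triggers = {
    # sketch: creation_timestamp should change each time the boot disk is rebuilt
    boot_disk_created = element(google_compute_disk.my_test_servers_instance_boot_disk.*.creation_timestamp, count.index)
    # sketch: instance_id is the server-assigned id, so it should change when the VM is recreated
    instance_id = element(google_compute_instance.my_test_servers_instance.*.instance_id, count.index)
  }

  # ... same remote-exec provisioner as in my configuration below ...
}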
Normally I set up 3 servers with “count” (for example, an Elasticsearch cluster), but for now I have reduced the count to 1 for testing. After seeing that I can do this on 1 server, I will increase the count to 3 and do the same for my other servers…
Sorry for the long explanation of my problem. I’m open to suggestions on how to do this.
variable "regions" {
default = "europe-north1"
}
variable "zones" {
default = ["europe-north1-a", "europe-north1-b", "europe-north1-c"]
}
variable "instance_name" {
default = {
"xxxapplication" = "example-apps-server"
}
}
variable "instance_count" {
default = {
"xxxapplication" = "1"
}
}
variable "internal_ip_pools" {
default = ["192.168.1.25", "192.168.1.26", "192.168.1.27"]
}
### Boot Disk
resource "google_compute_disk" "my_test_servers_instance_boot_disk" {
  name  = "${var.instance_name["xxxapplication"]}${format("%02d", count.index + 1)}-boot-disk"
  zone  = var.zones[count.index % length(var.zones)]
  image = "debian-latest"
  count = var.instance_count["xxxapplication"]
  type  = "pd-ssd"
  size  = "50"
}
### Data Disk
resource "google_compute_disk" "my_test_servers_instance_data_disk" {
  depends_on = [google_compute_instance.my_test_servers_instance]
  name       = "${var.instance_name["xxxapplication"]}${format("%02d", count.index + 1)}-data-disk"
  zone       = var.zones[count.index % length(var.zones)]
  count      = var.instance_count["xxxapplication"]
  type       = "pd-ssd"
  size       = "100"
}
### Attach Disk
resource "google_compute_attached_disk" "my_test_servers_instance_attach_data_disk" {
  depends_on = [google_compute_disk.my_test_servers_instance_data_disk]
  mode       = "READ_WRITE"
  zone       = var.zones[count.index % length(var.zones)]
  count      = var.instance_count["xxxapplication"]
  disk       = element(google_compute_disk.my_test_servers_instance_data_disk.*.name, count.index)
  instance   = element(google_compute_instance.my_test_servers_instance.*.self_link, count.index)
}
resource "google_compute_instance" "my_test_servers_instance" {
depends_on = [google_compute_disk.my_test_servers_instance_boot_disk]
name = "${var.instance_name["xxxapplication"]}${format("%02d", count.index+1)}"
hostname = "${var.instance_name["xxxapplication"]}${format("%02d", count.index+1)}.odeeontechnology.com"
machine_type = var.gcp_instance_size["4cpu-22mem"]
zone = var.zones[count.index % length(var.zones)]
count = var.instance_count["xxxapplication"]
tags = ["ssh-access"]
boot_disk {
auto_delete = false
source = element(google_compute_disk.my_test_servers_instance_boot_disk.*.name, count.index)
}
lifecycle {
ignore_changes = [attached_disk]
}
metadata = {
serial-port-enable = "true"
}
network_interface {
network_ip = var.internal_ip_pools[count.index % length(var.internal_ip_pools)]
network = google_compute_network.master.self_link
subnetwork = google_compute_subnetwork.secondary.name
access_config {
nat_ip = element(google_compute_address.my_test_servers_public_ip.*.address, count.index)
}
}
}
resource "null_resource" "run_commands" {
count = var.instance_count["xxxapplication"]
depends_on = [google_compute_attached_disk.my_test_servers_instance_attach_data_disk]
triggers = {
disklist = element(google_compute_disk.my_test_servers_instance_boot_disk.*.name, count.index)
}
provisioner "remote-exec" {
on_failure = continue
connection {
type = "ssh"
user = "testuser"
host = element(google_compute_address.my_test_servers_public_ip.*.address, count.index)
agent = false
password = "1234555"
}
inline = [
"sudo ls -lah /etc >> /tmp/list.txt "
...
]
}
}