Lifecycle of kubernetes_job with ttl_seconds_after_finished = 0

Hello Terraform community!

I have the following setup:
A kubernetes_job that runs some commands against my database to set up some configuration when the database is created. I didn't want the job to stay around in the cluster after it finishes, so I set its spec.ttl_seconds_after_finished = 0.

The problem is that even with the lifecycle set to replace_triggered_by = [mydatabase.id], every time I run terraform plan it creates and runs the job again. I think this is because Terraform checks the cluster and, since ttl_seconds_after_finished = 0, the job is no longer there. Is there a way to prevent Terraform from recreating this job at all?

I know I could use count with a variable set to true or false, but I wanted something more automatic that ties into the database lifecycle.
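For reference, the count workaround I mean would look roughly like this (the variable name here is just a placeholder, not from my real config):

# Hypothetical sketch of the count-based workaround.
variable "run_setup_job" {
  type    = bool
  default = false
}

resource "kubernetes_job" "setup_db" {
  # Job exists only while the variable is true.
  count = var.run_setup_job ? 1 : 0
  # ... rest of the job spec ...
}

But this requires someone to flip the variable manually, which is what I'd like to avoid.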

Here is approximately the kubernetes_job resource I'm using. I've replaced the sensitive names with mock names.

resource "kubernetes_job" "setup_db" {
  lifecycle {
    replace_triggered_by = [
      mydatabase.id,
    ]
  }
  metadata {
    name = "setup-db"
  }
  spec {
    ttl_seconds_after_finished = 0
    template {
      metadata {}
      spec {
        container {
          name    = "setup-db"
          image   = "mycustomimage"
          command = ["/bin/sh"]
          args = [
            "-c",
            "mysql -h mydb -u myuser -pmypassword < mycommands.sh",
          ]
        }
        restart_policy = "Never"
      }
    }
    backoff_limit = 0
  }
  wait_for_completion = false
}

Hi @MateusDadalto,

I'm not sure how to achieve what you want using the kubernetes_job resource. Do you know how the provider tracks the job lifecycle in this case? That is, had you set wait_for_completion to true, would the provider consider the job finished and not try to recreate it (even though it no longer exists as an API object)?

I don't know the answer, but that seems like something the provider could do (there are many reasons one would want to remove finished Job objects from the API, in which case a lot of the provider's users would hit the same problem).

In any case, we chose to use Lambda functions for our DB initialization. Maybe that's something you can consider as well. Any kind of supported serverless function would do (it doesn't need to be AWS Lambda), as long as the Terraform provider has an invocation resource and guarantees it will only run once (there is a workaround if you want to force a re-run, but generally speaking, it only runs once).
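On AWS, for example, that would use the aws_lambda_invocation resource. A rough sketch only (the function and database references are placeholders, not from a real config):

# Hypothetical sketch: invoke a Lambda once at apply time to initialize the DB.
resource "aws_lambda_invocation" "setup_db" {
  function_name = aws_lambda_function.setup_db.function_name

  input = jsonencode({
    db_host = mydatabase.address
  })

  # Re-run the invocation only when the database is replaced.
  triggers = {
    db_id = mydatabase.id
  }
}

Because the invocation result lives in state rather than in a cluster object, Terraform doesn't try to re-run it just because nothing exists on the remote side.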

Hi @macmiranda,

I don't know much about the provider internals. I looked briefly at its code, but my lack of Go knowledge didn't help :sweat_smile:. I tried setting wait_for_completion to true and it did not work.

It's really intriguing that no one has complained about this already. I set up a local project (code below) to check that it wasn't something else I was doing, and it seems this is indeed the behavior.

The suggestion about Lambdas is really good; I've been messing with k8s for so long that I'd forgotten they exist :smile:. Thanks!

terraform {
  required_providers {
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = ">= 2.16.0"
    }
  }

  required_version = ">= 1.1.0"
}

provider "kubernetes" {
  config_path = "~/.kube/config"
}

variable "test" {
  type    = number
  default = 1
}

resource "terraform_data" "kubernetes_job_control" {
  input = var.test
}

resource "kubernetes_job" "mock_job" {
  lifecycle {
    replace_triggered_by = [
      terraform_data.kubernetes_job_control
    ]
  }
  metadata {
    name = "mock-job"
  }
  spec {
    ttl_seconds_after_finished = 1
    template {
      metadata {}
      spec {
        container {
          name  = "mock-job"
          image = "hello-world"
        }
        restart_policy = "Never"
      }
    }
    backoff_limit = 0
  }
  wait_for_completion = true
}