Within group communication not working / best networking pattern for this use case

bgalvao · November 28, 2021, 2:29pm

I have two tasks running in the same group. One is the front end (mlflow ui) and the other is the backend (mlflow db).

Never mind the commented-out "--backend-store-uri" argument in the job file below. I open a bash shell in the "mlflow" task and try to connect to the database using the $NOMAD_ADDR_mlflow_db variable:

root@5ce82f892731:/# mlflow server --default-artifact-root /home --backend-store-uri postgresql://$NOMAD_ADDR_mlflow_db
2021/11/28 12:16:47 WARNING mlflow.store.db.utils: SQLAlchemy engine could not be created. The following exception is caught.
(psycopg2.OperationalError) connection to server at "127.0.0.1", port 24788 failed: Connection refused
        Is the server running on that host and accepting TCP/IP connections?

(Background on this error at: https://sqlalche.me/e/14/e3q8)

I checked that the Postgres server is running in the "mlflow-db" task, so why isn’t it accepting TCP connections on that address?

As I am a noob, I’d like to ask if this is even the best way to do this, or whether I should split the frontend and the backend into different groups.

mlflow.hcl

job "mlflow-test" {
  datacenters = ["dc1"]  # default when running dev mode
  type = "service"

  group "mlflow_group" {
    
    count = 1

    network {
      port "mlflow_ui" {}
      port "mlflow_db" {}
    }

    task "mlflow" {
      driver = "docker"
      config {
        image = "bgalvao/nomad-mlflow"
        ports = ["mlflow_ui"]
        # entrypoint = ["bash"]
        entrypoint = ["mlflow", "server"]
        args = [
          "--host", "0.0.0.0",
          "-p", "${NOMAD_PORT_mlflow_ui}",
          # "--backend-store-uri", "postgresql://postgres@${NOMAD_ADDR_mlflow_db}/postgres"
        ]
      }
      resources {
        cpu = 2000
        memory = 2000
      }

    }

    task "mlflow-db" {
      driver = "docker"
      config {
        image = "postgres"  # https://hub.docker.com/_/postgres
        ports = ["mlflow_db"]
      }
      env {
        POSTGRES_PASSWORD = "use_vault"
        # psql --username=spec_user -d mlflow_db
        # postgresql://spec_user:use_vault@localhost:${NOMAD_PORT_mlflow_db}/mlflow_db
        # for debugging purposes
      }
      resources {
        cpu = 2000
        memory = 2000
      }
      lifecycle {
        hook = "prestart"
        # set sidecar = true
        # if you want the job to run for the duration of
        # the allocation
        sidecar = true
      }
    }
  }
}
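
One possible cause, offered as a guess rather than a confirmed diagnosis: the postgres image listens on port 5432 inside its container by default, while `port "mlflow_db" {}` forwards the dynamic host port to the same port number inside the container, where nothing is listening. Separately, 127.0.0.1 seen from inside the "mlflow" container is that container's own loopback, not the host. A minimal sketch of a network block that addresses both, assuming the client runs on Linux with the CNI plugins installed so that bridge mode is available:

```hcl
# Sketch only: a replacement for the network block in group "mlflow_group".
network {
  # Assumption: bridge mode is supported on this client.
  # In bridge mode the tasks of a group share one network namespace.
  mode = "bridge"

  port "mlflow_ui" {}

  # Forward the dynamic host port to 5432, the default listening
  # port of the postgres image.
  port "mlflow_db" {
    to = 5432
  }
}
```

Under these assumptions the "mlflow" task should be able to reach the database at localhost:5432, for example with a backend store URI of the form postgresql://postgres:<password>@localhost:5432/postgres, while $NOMAD_ADDR_mlflow_db remains the address for clients outside the allocation.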
