Is anyone running MPI jobs (Open MPI, etc.) on Nomad, and if so, how do you accommodate MPI's prescriptive model of how the tasks within a job must be initialized?
MPI is picking up some interest outside the HPC and supercomputing communities now, thanks to its use for scale-out training of neural networks with TensorFlow. The Kubeflow project introduced an MPI Operator to abstract away all the MPI fun and games. Where's our Nomad equivalent?
One of my colleagues has developed a solution that is sufficient for our current needs (not neural net training). If there is interest in generalizing it, we might be able to seed an open source project with it. Other folks would have to lead the work to adapt it for their use cases, but hopefully we can end up with something widely usable.