Throttled CPU = Stuck Periodic Jobs

Nomad: 0.9.4

We’re using Nomad’s periodic cron job configuration to schedule database backups, and we run t3.medium instances in an auto-scaling group for cost and availability since the backups only fire at a random interval every hour. I noticed that the instances’ CPU credits were getting exhausted, which in turn caused our periodic jobs to hang.

We have the following config to avoid concurrent backups:

"Periodic": {
    "Enabled": true,
    "Spec": "14 * * * *",
    "SpecType": "cron",
    "ProhibitOverlap": true,
    "TimeZone": "UTC"
  },

Is there a way to enforce a timeout at the Nomad job level for periodic jobs, or to constrain jobs so they’re only placed on nodes that have enough CPU capacity available?

There doesn’t seem to be a Nomad-exposed attribute for tracking compute credits per se.
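The closest approximation I can think of is constraining on the instance type fingerprint rather than on the credits themselves — a rough sketch in the same JSON job format (the instance families in the regexp are just placeholders for non-burstable types):

"Constraints": [
    {
      "LTarget": "${attr.platform.aws.instance-type}",
      "RTarget": "^(m5|c5)",
      "Operand": "regexp"
    }
],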

Hi @mikeblum! There’s no way to specify a timeout for the job at the Nomad level, but a timeout inside the application container would probably do the job for your case.
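For example, if the backup task runs under the docker driver, wrapping the command with GNU coreutils’ timeout would kill a throttled run instead of letting it hang — a rough sketch in the same JSON task format (the image, script path, and 45m limit are placeholders):

"Config": {
    "image": "postgres:11",
    "command": "timeout",
    "args": ["--signal=TERM", "45m", "/usr/local/bin/backup.sh"]
},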

There doesn’t seem to be a Nomad-exposed attribute for tracking compute credits

As far as I know there isn’t a way for an EC2 instance to track its own compute credits without talking to CloudWatch directly; the timeslicing the hypervisor gives the guest OS isn’t exposed. You may want to look into using unlimited mode, or giving the workload a smaller CPU reservation so the task doesn’t burn through all of its instance’s CPU credits.
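The reservation side is just the task’s Resources block — a minimal sketch; the 200 MHz / 256 MB numbers are placeholders you’d size so the task stays under the instance’s CPU-credit baseline:

"Resources": {
    "CPU": 200,
    "MemoryMB": 256
},

Unlimited mode, by contrast, is set on the instance or launch template itself rather than anywhere in the Nomad job.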

Just a side note… t3a would save some more money! :slight_smile: