Stop operation / graceful shutdown is not working as expected

When terraform apply is interrupted with Ctrl+C, I expect it to stop the operation. Instead, I had to interrupt once again before it actually stopped.

This is intermittent: sometimes a single interrupt is enough to stop the operation, while other times it takes more than one.

Is this expected? Why and when do we actually have to interrupt more than once to stop the operation?

module.vpc_ocp_cluster.ibm_container_vpc_cluster.cluster: Still creating... [1h22m52s elapsed]
module.vpc_ocp_cluster.ibm_container_vpc_cluster.cluster: Still creating... [1h23m2s elapsed]
module.vpc_ocp_cluster.ibm_container_vpc_cluster.cluster: Still creating... [1h23m12s elapsed]
module.vpc_ocp_cluster.ibm_container_vpc_cluster.cluster: Still creating... [1h23m22s elapsed]
module.vpc_ocp_cluster.ibm_container_vpc_cluster.cluster: Still creating... [1h23m32s elapsed]
module.vpc_ocp_cluster.ibm_container_vpc_cluster.cluster: Still creating... [1h23m42s elapsed]
module.vpc_ocp_cluster.ibm_container_vpc_cluster.cluster: Still creating... [1h23m52s elapsed]
module.vpc_ocp_cluster.ibm_container_vpc_cluster.cluster: Still creating... [1h24m2s elapsed]
^CStopping operation...

Interrupt received.
Please wait for Terraform to exit or data loss may occur.
Gracefully shutting down...

module.vpc_ocp_cluster.ibm_container_vpc_cluster.cluster: Still creating... [1h24m12s elapsed]
module.vpc_ocp_cluster.ibm_container_vpc_cluster.cluster: Still creating... [1h24m22s elapsed]
module.vpc_ocp_cluster.ibm_container_vpc_cluster.cluster: Still creating... [1h24m32s elapsed]
module.vpc_ocp_cluster.ibm_container_vpc_cluster.cluster: Still creating... [1h24m42s elapsed]
module.vpc_ocp_cluster.ibm_container_vpc_cluster.cluster: Still creating... [1h24m52s elapsed]
module.vpc_ocp_cluster.ibm_container_vpc_cluster.cluster: Still creating... [1h25m2s elapsed]
module.vpc_ocp_cluster.ibm_container_vpc_cluster.cluster: Still creating... [1h25m12s elapsed]
module.vpc_ocp_cluster.ibm_container_vpc_cluster.cluster: Still creating... [1h25m22s elapsed]
module.vpc_ocp_cluster.ibm_container_vpc_cluster.cluster: Still creating... [1h25m32s elapsed]
^C
Two interrupts received. Exiting immediately. Note that data loss may have occurred.

╷
│ Error: operation canceled
│ 

For comparison, here is a run where Terraform stopped after a single interrupt:

null_resource.sleep: Still creating... [10s elapsed]
null_resource.sleep: Still creating... [20s elapsed]
^CInterrupt received.
Please wait for Terraform to exit or data loss may occur.
2021/08/04 20:48:40 [WARN] terraform: Stop called, initiating interrupt sequence
2021/08/04 20:48:40 [WARN] terraform: run context exists, stopping
2021/08/04 20:48:40 [INFO] terraform: waiting for graceful stop to complete
Gracefully shutting down...
Stopping operation...

2021/08/04 20:48:40 [WARN] Early exit triggered by hook: *terraform.stopHook
2021/08/04 20:48:40 [WARN] Errors while provisioning null_resource.sleep with "local-exec", so aborting
2021/08/04 20:48:40 [WARN] Early exit triggered by hook: *terraform.stopHook
2021-08-04T20:48:40.133+0530 [WARN]  plugin.stdio: received EOF, stopping recv loop: err="rpc error: code = Unavailable desc = transport is closing"
2021-08-04T20:48:40.133+0530 [WARN]  plugin.stdio: received EOF, stopping recv loop: err="rpc error: code = Unavailable desc = transport is closing"
2021-08-04T20:48:40.135+0530 [DEBUG] plugin: plugin process exited: path=/Users/kavya/go/src/github.com/IBM-Cloud/terraform-provider-ibm/examples/aatest/rg/data/.terraform/plugins/darwin_amd64/terraform-provider-null_v3.1.0_x5 pid=12302
2021-08-04T20:48:40.135+0530 [DEBUG] plugin: plugin exited
2021-08-04T20:48:40.137+0530 [DEBUG] plugin: plugin process exited: path=/usr/local/bin/terraform12 pid=12300
2021-08-04T20:48:40.137+0530 [DEBUG] plugin: plugin exited
2021/08/04 20:48:40 [WARN] terraform: stop complete

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

A single Ctrl-C should tell the provider it needs to stop what it’s doing, but that depends on the provider actually listening for that signal and winding down. It looks like, in this case, the provider carried on trying to create the resource instead of aborting like it’s supposed to.
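
With terraform-plugin-sdk/v2, that stop signal reaches provider code as a canceled context.Context handed to the resource’s CreateContext (and the other *Context functions). As a minimal sketch (the resource and field names below are made up for illustration, not taken from the IBM provider):

```go
package example

import (
	"context"

	"github.com/hashicorp/terraform-plugin-sdk/v2/diag"
	"github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema"
)

// resourceExampleCluster is a made-up resource, used only to show where
// the stop signal surfaces in provider code.
func resourceExampleCluster() *schema.Resource {
	return &schema.Resource{
		// With SDK v2, CreateContext (rather than the legacy Create) receives a
		// context.Context that the SDK cancels when Terraform is asked to stop,
		// e.g. after a single Ctrl-C.
		CreateContext: resourceExampleClusterCreate,
		Schema:        map[string]*schema.Schema{},
	}
}

func resourceExampleClusterCreate(ctx context.Context, d *schema.ResourceData, meta interface{}) diag.Diagnostics {
	// Thread ctx through to every API call and wait loop so they can abort
	// as soon as ctx is canceled.
	if err := ctx.Err(); err != nil {
		return diag.FromErr(err)
	}
	// ... call the cloud API with ctx, then d.SetId(...) ...
	return nil
}
```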

@paddy,
Yeah, that’s what I’m wondering.
Is there anything that has to be done on the provider side?
I was under the impression that the plugin SDK would take care of this.

Thanks!

The provider should do a few things:

  1. Make sure to thread the context.Context through to any calls that accept a context.Context. This lets anything you’re calling know when to stop working.
  2. If you’re polling (it seems like you are?), periodically check Context.Done() or Context.Err() to see if the context is still open. If it’s not, stop polling and return (there’s a rough sketch of this after the list).
  3. Before making expensive or long-running calls, you can check in with Context.Done() or Context.Err(), but that is probably overkill in most situations.
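
For item 2, here’s a minimal sketch of what a cancellation-aware polling loop might look like. clusterAPI and ClusterReady are hypothetical stand-ins for whatever client the provider really calls, not the IBM provider’s actual code:

```go
package example

import (
	"context"
	"time"
)

// clusterAPI is a hypothetical stand-in for the provider's real API client.
type clusterAPI interface {
	ClusterReady(ctx context.Context, id string) (bool, error)
}

// waitForClusterReady polls until the cluster is ready, the API errors,
// or the context is canceled (e.g. Terraform was interrupted).
func waitForClusterReady(ctx context.Context, api clusterAPI, id string) error {
	ticker := time.NewTicker(10 * time.Second)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			// Terraform asked us to stop: give up instead of polling until
			// the resource finishes creating.
			return ctx.Err()
		case <-ticker.C:
			ready, err := api.ClusterReady(ctx, id)
			if err != nil {
				return err
			}
			if ready {
				return nil
			}
		}
	}
}
```

Returning ctx.Err() promptly is what lets a single interrupt surface as an “operation canceled” error instead of Terraform having to wait out the rest of the create.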

Unfortunately, the SDK has no way to wrest flow control back from the provider and force it to return early; it can only provide the tools for the provider to know when it should yield back to the SDK, and hope the provider will be well-behaved.
