Since last Wednesday, we’ve been experiencing intermittent connectivity failures from our Azure DevOps pipelines to registry.terraform.io. The failures involve two different Fastly IPv4 addresses (220.127.116.11 and 18.104.22.168) and affect, by my rough estimate, about 3-5% of our connection attempts. In practice this means at least 1-2 failures per terraform run in our environment.
This manifests in our terraform runs in a few different ways:
Initializing provider plugins...
- Finding microsoft/azuredevops versions matching "~> 0.1.1"...
...
- Installing hashicorp/null v2.1.2...
- Installing terraform-providers/bitbucket v1.2.0...
- Installed terraform-providers/bitbucket v1.2.0 (signed by HashiCorp)

Partner and community providers are signed by their developers.
If you'd like to know more about provider signing, you can read about it here:
https://www.terraform.io/docs/plugins/signing.html

Error: Failed to install provider

Error while installing hashicorp/null v2.1.2: Get "https://releases.hashicorp.com/terraform-provider-null/2.1.2/terraform-provider-null_2.1.2_linux_amd64.zip": dial tcp 22.214.171.124:443: i/o timeout
Error verifying checksum:
Error verifying checksum for provider "azurerm"
The checksum for provider distribution from the Terraform Registry
did not match the source. This may mean that the distributed files
were changed after this version was released to the Registry.

Error: unable to verify checksum
Registry service unreachable:
Initializing provider plugins...
- Checking for available provider plugins...

Registry service unreachable.

This may indicate a network issue, or an issue with the requested Terraform Registry.

Error: registry service is unreachable, check https://status.hashicorp.com/ for status updates
We have firewall logs and packet captures, and they’re pretty uninteresting. They show SYN retransmissions in the TCP handshake, usually for two connections in sequence to the same IP, then normal connectivity to that IP resumes. This usually occurs after several successful connections to the same IP.
We have investigated our environment thoroughly over the last few days on the assumption that the problem was on our side, but our firewalls show no signs of session exhaustion, and no other sites are affected. Our pipelines use Ubuntu, Docker, Kubernetes, Helm, and various container registries, including mcr.microsoft.com, all of which are working perfectly; this behaviour appears only with these two Fastly IPs.
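To reproduce the handshake failures outside of terraform, a probe along the following lines may help. It is only a minimal sketch: the function names `probe` and `failure_rate`, the 3-second timeout, and the attempt count are assumptions chosen for illustration, not something taken from our pipelines. It repeatedly opens and closes a TCP connection to port 443 and records which attempts fail, so the failure rate can be compared with the rough 3-5% estimate above.

```python
import socket
import time


def probe(host, port=443, attempts=100, timeout=3.0, delay=0.0,
          connect=socket.create_connection):
    """Open `attempts` TCP connections to host:port, one after another.

    Returns a list of (ok, elapsed_seconds, error) tuples. `connect` is
    injectable so the function can be exercised without network access.
    """
    results = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            conn = connect((host, port), timeout=timeout)
            conn.close()
            results.append((True, time.monotonic() - start, None))
        except OSError as exc:
            # Covers timeouts, refused connections and DNS failures.
            results.append((False, time.monotonic() - start, repr(exc)))
        if delay:
            time.sleep(delay)
    return results


def failure_rate(results):
    """Fraction of attempts that failed (0.0 for an empty result list)."""
    if not results:
        return 0.0
    failed = sum(1 for ok, _, _ in results if not ok)
    return failed / len(results)
```

For example, `failure_rate(probe("registry.terraform.io", attempts=200, delay=0.1))` run from a pipeline agent would give a figure to compare against the same probe run from a network outside Azure. Passing one of the affected IP addresses instead of the hostname would target a single Fastly endpoint.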
I suspect either:
- Connectivity issues between Azure (Australia East region) and Fastly, or
- Rate limiting of our public IP address by Fastly and/or HashiCorp.
How can we get through to the right operational teams to get this looked at?
Thanks in advance,