We are building a SaaS product in which each client needs its own resources (think database, custom application, etc.).
Right now the number of clients is small, so we keep them in a JSON file: we add each new client to it and run terraform apply. Fairly straightforward.
In the future we would like clients to be able to register via the website, and when they do, their resources should be created automatically. We are looking for best practices for this situation.
Our current idea is a central database that holds the client information. Every time a client registers, we add a row to a table and trigger a CI pipeline, in which the Terraform script queries this table and updates the deployment. I like that this will be easy to implement, but we will lose the full “infrastructure as code” experience, as some of the state will be kept in that database.
Another way would be a script that, every time a new client registers, modifies this JSON file and commits it to the Git repository, with a CI pipeline applying those changes. This feels a little awkward, but it is more in line with keeping all infrastructure in code.
I’m looking for some best practices for this case. Any thoughts?
Terraform does not scale well when used to manage a single large configuration containing large numbers of resources.
Also, any Terraform run that can affect your entire client base is risky: a bug or an unexpected code change could break everyone’s service at the same time.
Furthermore, the configuration styles you discussed will make it hard to move new clients onto a new version of your infrastructure definition whilst leaving existing client infrastructure as is, or to upgrade existing clients gradually.
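One way to make gradual upgrades possible, assuming each client's record stores the version of the infrastructure definition it was last deployed with (the `module_version` field below is hypothetical), is to select the clients still on an older version and upgrade them in small waves rather than all at once. A minimal sketch:

```python
from dataclasses import dataclass


@dataclass
class Client:
    client_id: str
    module_version: str  # hypothetical: version of the infra definition last applied


def plan_rollout(clients, target_version, wave_size):
    """Split clients that are not yet on target_version into waves of at most wave_size.

    Clients already on target_version are left untouched, so new clients can be
    created on the new version while existing ones are upgraded gradually.
    """
    pending = [c for c in clients if c.module_version != target_version]
    # Sort so the wave order is deterministic between runs.
    pending.sort(key=lambda c: c.client_id)
    return [pending[i:i + wave_size] for i in range(0, len(pending), wave_size)]


if __name__ == "__main__":
    clients = [
        Client("acme", "1.0"),
        Client("globex", "2.0"),
        Client("initech", "1.0"),
        Client("umbrella", "1.0"),
    ]
    for n, wave in enumerate(plan_rollout(clients, "2.0", wave_size=2), start=1):
        print(n, [c.client_id for c in wave])
```

Here `globex` is already on 2.0 and is skipped; the other three are upgraded in two waves, so a bad release only hits the first wave before you notice.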
For all these reasons you should have an entirely separate Terraform state file for each client, meaning each Terraform run only executes on one client’s resources.
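A minimal Python sketch of what one state per client can look like. It assumes the configuration lives in a hypothetical `infra` directory and declares a backend (for example `backend "s3" {}`) whose `key` is supplied at init time through partial configuration (`-backend-config`). It also assumes a hypothetical per-client variable file `clients/<id>.tfvars.json`. The function only builds the command lines and does not run Terraform. `TF_DATA_DIR` is set per client so concurrent runs do not share a `.terraform` directory. Terraform workspaces are another way to keep a separate state per client.

```python
import shlex


def client_commands(client_id, config_dir="infra"):
    """Build the Terraform commands that provision a single client.

    Each client gets its own state file (via the backend key) and its own
    variable file, so a run only ever touches that client's resources.
    """
    state_key = f"clients/{client_id}/terraform.tfstate"  # hypothetical naming scheme
    var_file = f"clients/{client_id}.tfvars.json"  # hypothetical path, relative to config_dir
    # Separate data dir per client so concurrent runs don't share .terraform/.
    env = {"TF_DATA_DIR": f".terraform-{client_id}"}
    commands = [
        ["terraform", f"-chdir={config_dir}", "init", "-input=false",
         f"-backend-config=key={state_key}"],
        ["terraform", f"-chdir={config_dir}", "apply", "-input=false",
         "-auto-approve", f"-var-file={var_file}"],
    ]
    return env, commands


if __name__ == "__main__":
    env, commands = client_commands("acme")
    for argv in commands:
        # Shell-quoted form, handy for logging what the orchestrator will run.
        print(f"TF_DATA_DIR={env['TF_DATA_DIR']}", shlex.join(argv))
```

Because the backend key is derived from the client ID, a bug in one client's run cannot overwrite another client's state.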
This does imply you will need to build custom orchestration tools to set up and manage all these Terraform runs, and store their states. Yes, that is a significant investment, but it’s a necessary one to run a business at the volume of clients that self-registration implies.
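At its core, such an orchestration tool might be a loop that provisions clients independently and isolates failures, so one broken client never blocks or breaks the others. In this sketch `run_terraform` is a hypothetical callable standing in for whatever actually executes a client's commands (for example `subprocess.run(argv, check=True, env=...)`). It is injected so the logic can be exercised without Terraform.

```python
from typing import Callable, Dict, Iterable


def provision_all(client_ids: Iterable[str],
                  run_terraform: Callable[[str], None]) -> Dict[str, str]:
    """Run each client's Terraform independently and record the outcome.

    A failure for one client is caught and recorded; the remaining clients still run.
    """
    results: Dict[str, str] = {}
    for client_id in client_ids:
        try:
            run_terraform(client_id)
            results[client_id] = "ok"
        except Exception as exc:  # isolate failures per client
            results[client_id] = f"failed: {exc}"
    return results


if __name__ == "__main__":
    def fake_run(client_id):
        # Hypothetical stand-in that simulates one client hitting a quota limit.
        if client_id == "globex":
            raise RuntimeError("quota exceeded")

    print(provision_all(["acme", "globex", "initech"], fake_run))
    # → {'acme': 'ok', 'globex': 'failed: quota exceeded', 'initech': 'ok'}
```

The loop is sequential for clarity; a real orchestrator would more likely pull jobs from a queue with several workers, but the per-client isolation is the same idea.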
As for the extent to which you use a database as a database vs. using Git as a database: surely you need a proper database of clients anyway. You need to be able to store the fact that a client has requested creation of resources, acknowledge the request to them, and start provisioning in the background. You also need to be prepared to handle failures gracefully (e.g. a cloud service outage or insufficient quota), which may require alerting an internal engineer and reprovisioning the client’s resources later.
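A minimal sketch of such a client table, using SQLite only to keep it self-contained. The column names and status values (`requested`, `provisioning`, `ready`, `failed`) are hypothetical. Registration just inserts a `requested` row and returns immediately. A background worker claims requests and records success or failure, and failed rows can be queued again once an engineer has dealt with the cause.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS clients (
    client_id TEXT PRIMARY KEY,
    status    TEXT NOT NULL
              CHECK (status IN ('requested', 'provisioning', 'ready', 'failed')),
    error     TEXT
)
"""


def register(db, client_id):
    # Called from the website: record the request and acknowledge right away.
    with db:
        db.execute("INSERT INTO clients (client_id, status) VALUES (?, 'requested')",
                   (client_id,))


def claim_next(db):
    # Called by a background worker: take one pending request, if any.
    with db:
        row = db.execute(
            "SELECT client_id FROM clients WHERE status = 'requested' "
            "ORDER BY client_id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # The status check stops two workers from claiming the same client.
        cur = db.execute(
            "UPDATE clients SET status = 'provisioning' "
            "WHERE client_id = ? AND status = 'requested'",
            row,
        )
        return row[0] if cur.rowcount == 1 else None


def finish(db, client_id, error=None):
    # Record the outcome; a failed row keeps its error for an engineer to inspect.
    status = "ready" if error is None else "failed"
    with db:
        db.execute("UPDATE clients SET status = ?, error = ? WHERE client_id = ?",
                   (status, error, client_id))


def retry(db, client_id):
    # After the cause is fixed (e.g. quota raised), queue the client again.
    with db:
        db.execute("UPDATE clients SET status = 'requested', error = NULL "
                   "WHERE client_id = ? AND status = 'failed'", (client_id,))


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    register(db, "acme")
    cid = claim_next(db)
    finish(db, cid, error="cloud outage")
    retry(db, cid)
    print(db.execute("SELECT client_id, status FROM clients").fetchall())
```

In production you would more likely use a server database; PostgreSQL's `SELECT ... FOR UPDATE SKIP LOCKED` is one common way to let several workers claim rows safely.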
You should not treat ‘the full “infrastructure as code” experience’ as axiomatically good. Sometimes it is not the right tool for the job. Even HashiCorp themselves, in Terraform Cloud/Enterprise, support running one set of Terraform code against multiple variable sets with multiple stored states.