Last week I migrated a complicated terraform project (remote backend, S3) to Terraform Cloud (referred to as TFC below) and learned a lot along the way.
In this post I’ll share:
- why you want to use TFC / benefits of TFC
- things to consider before committing to TFC
- a walkthrough on how to migrate your existing terraform projects to TFC: see part 2
Benefits of TFC
1. backend authoritative tfstate version
When a team of developers works on a terraform project without any way to enforce the local terraform CLI version, the remote state can end up being upgraded because one developer used a newer version. This happens very often during new hire onboarding. With TFC, the backend is the authority on which terraform version touches the state.
The same problem can be solved by other approaches, but it’s great to get this out of the box and not have to maintain a solution of our own.
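For comparison, one of the "other approaches" is to pin the CLI version in the configuration itself. This is a minimal sketch, not the setup from my project; the version number is only an example.

```hcl
# Refuse to run with any terraform CLI outside the 1.5.x series.
# The constraint value is illustrative; pick the version your state is on.
terraform {
  required_version = "~> 1.5.0"
}
```

With this in place, a developer on a newer CLI gets an error instead of silently upgrading the state, but every project has to carry and maintain the constraint.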
2. agent execution mode / run consistency
TFC supports configuring pools of agents to run your terraform, ensuring all the dependencies are checked in. Agent provisioning is a forcing function to explicitly call out all the previously implicit dependencies, which gives all developers a consistent experience running terraform for projects on TFC.
3. GitOps CI/CD
You can easily configure a GitHub webhook/app to use TFC as the CI/CD solution for your terraform projects. This solves the problem of “what happens after my PR is merged” and increases productivity.
Apply would still require human user approval, similar to how the terraform CLI asks for human user input.
4. Permission control down to per-workspace granularity
You can easily define who has what permission to each workspace and project if you need fine granularity in permission control.
5. Shared variable set among different workspaces/projects
You no longer need to make copies of the same variable values across many projects/workspaces (e.g. a service account token). The concept of “variable set” provides reusability for variable inputs among different projects/workspaces. This reduces the number of copies of variables/secrets and makes rotation much easier.
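As a rough sketch of what this can look like when managed as code, the snippet below uses the `hashicorp/tfe` provider, which is an assumption on my part (variable sets can also be created in the TFC UI). The organization name, workspace name, and variable key are hypothetical placeholders.

```hcl
terraform {
  required_providers {
    tfe = {
      source = "hashicorp/tfe"
    }
  }
}

# Hypothetical input holding the secret to share.
variable "service_account_token" {
  type      = string
  sensitive = true
}

# One variable set that holds the shared value.
resource "tfe_variable_set" "shared" {
  name         = "shared-service-account"
  description  = "Service account token shared across workspaces"
  organization = "example-org" # placeholder organization
}

# The secret lives in exactly one place.
resource "tfe_variable" "token" {
  key             = "service_account_token"
  value           = var.service_account_token
  category        = "terraform"
  sensitive       = true
  variable_set_id = tfe_variable_set.shared.id
}

# Look up an existing workspace (placeholder name) and attach the set to it.
data "tfe_workspace" "staging" {
  name         = "myapp-staging"
  organization = "example-org"
}

resource "tfe_workspace_variable_set" "staging" {
  variable_set_id = tfe_variable_set.shared.id
  workspace_id    = data.tfe_workspace.staging.id
}
```

Rotating the token then means updating one `tfe_variable` rather than hunting down every copy.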
6. Policy enforcement
TFC natively supports Sentinel and OPA (Open Policy Agent), and you can set up policies to enforce rules for terraform projects in your organization. The rules can be global or applied to a set of workspaces/projects for scope control. Runs that fail to comply with the policies that apply to them cannot be applied.
7. Run history: visibility and traceability
I’m not sure how long the retention is, but runs from different triggers (CLI, PR merge, etc.) are preserved on TFC for people with the right permission to view and inspect.
This added traceability can be extremely valuable in identifying what caused a mysterious issue on production.
8. Enable collaboration
The improved visibility of each run enables a team of developers to review your work and troubleshoot your issues.
9. Module Registry
TFC hosts a private module registry: you can publish your private modules on TFC and reference them in your project with a pinned version.
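Referencing a private module looks roughly like the sketch below. The organization, module name, and version are hypothetical; the source follows the `<hostname>/<organization>/<module name>/<provider>` pattern used by the private registry.

```hcl
# Pull a private module from the TFC registry at a fixed version.
# "example-org", "network", and "1.2.0" are placeholders.
module "network" {
  source  = "app.terraform.io/example-org/network/aws"
  version = "1.2.0"
}
```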
10. and more.
The above is a non-exhaustive list based on my own usage / assessment and I definitely missed some other reasons to prefer using TFC. Please leave a comment: let me know what you think and what I missed.
Things to consider
Here are a couple of things I want you to consider before jumping into migrating to TFC.
1. Productivity
You can still trigger terraform plan/apply from the local CLI, but the run will happen on TFC. A TFC agent will pick up your run and send the output back to you. Depending on the number of workspaces, the number of agents, and the number of concurrent human users, the delay between hitting "enter" and the terraform run actually starting varies from tens of seconds to maybe a few minutes.
In addition, when a VCS connection is configured (for version control), terraform apply from the CLI will be rejected. You can optionally disconnect the VCS connection and use TFC as a runner only to bypass this issue.
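For reference, pointing the local CLI at a TFC workspace is typically done with a `cloud` block like the sketch below (available in terraform 1.1 and later; older setups use `backend "remote"` instead). The organization and workspace names are placeholders, not values from my project.

```hcl
# CLI-driven runs: plan/apply typed locally are executed on TFC.
terraform {
  cloud {
    organization = "example-org" # placeholder organization

    workspaces {
      name = "myapp-dev" # placeholder workspace
    }
  }
}
```

After `terraform login` and `terraform init`, a local `terraform plan` is queued on TFC and its output is streamed back to your terminal, which is where the delay described above comes from.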
If your project is still in the stage of frequent iteration, this added overhead may be a productivity killer. When you’re just setting up something new, you may go through some refactoring and may even change which tfstate things are placed in. Using TFC will definitely slow you down in those cases.
My recommendations:
- stick with a remote backend (S3) + local terraform runs until your project hierarchy and progression are ironed out.
- use different configurations for different workspaces (environments) from the same project:
  - dev: disconnect VCS and turn on auto apply if it matters, since this is the most frequently iterated workspace
  - staging: connect VCS (GitOps) and turn on auto apply
  - production: connect VCS (GitOps) and turn on manual apply
Migrating to TFC isn’t a lot of work (if you follow my guide), so I recommend optimizing your setup for productivity first.
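The per-environment recommendations above could be expressed as code roughly as follows. This is a sketch that assumes the workspaces are managed with the `hashicorp/tfe` provider, which is my assumption here; the same settings can be toggled in the TFC UI. All names, the repository identifier, and the `oauth_token_id` variable are hypothetical.

```hcl
terraform {
  required_providers {
    tfe = {
      source = "hashicorp/tfe"
    }
  }
}

# Hypothetical ID of the OAuth token for the VCS connection.
variable "oauth_token_id" {
  type      = string
  sensitive = true
}

# dev: no VCS connection, so CLI-driven applies are allowed; auto apply on.
resource "tfe_workspace" "dev" {
  name         = "myapp-dev"
  organization = "example-org"
  auto_apply   = true
}

# staging: VCS-driven (GitOps) with auto apply.
resource "tfe_workspace" "staging" {
  name         = "myapp-staging"
  organization = "example-org"
  auto_apply   = true

  vcs_repo {
    identifier     = "example-org/myapp-infra" # placeholder <org>/<repo>
    branch         = "main"
    oauth_token_id = var.oauth_token_id
  }
}

# production: VCS-driven (GitOps); applies wait for manual approval.
resource "tfe_workspace" "production" {
  name         = "myapp-production"
  organization = "example-org"
  auto_apply   = false

  vcs_repo {
    identifier     = "example-org/myapp-infra"
    branch         = "main"
    oauth_token_id = var.oauth_token_id
  }
}
```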
2. Cost
There is no such thing as a free lunch.
TFC has a free tier plan and if it fits your organization's size, great! However, for bigger organizations, you may have to pay for TFC: more on pricing
Your mileage may vary and I won’t advise on what the cost should be. Rightsize your plan commitment/usage to what you need to get a good estimate of how much you’ll pay for the benefits.
3. +1 Point of failure
TFC becomes a single point of failure. TFC SLA on availability can be found here but it won’t be 100%.
The local CLI delegates to TFC, so if TFC goes down, there’s no easy way to run your terraform when you have something urgent. Keep on reading and I'll provide a non-trivial option to achieve this as a backdoor for admins.
Conclusion
If you find this to be helpful, give it a clap and it would mean the world to me. Please share this with whoever needs this, and I’d appreciate it if you want to buy me a coffee