-
Notifications
You must be signed in to change notification settings - Fork 58
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add replace-provider capability #145
Conversation
|
||
if !strings.Contains(planErr.Error(), expected) { | ||
t.Fatalf("expected migrator plan error to include: %s; received: %s", expected, err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
☝️ There may be a better way to author this test. In this case, because we've replaced the legitimate registry.terraform.io/hashicorp/local
provider with an illegitimate registry.tfmigrate.io/hashicorp/local
provider, plan
is expected to error, but we can verify that it does so with the expected illegitimate provider replacement, in effect validating that replace-provider
did its job.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, this test is quite tricky. I'll investigate if there is any way to test for a happy path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't find the best solution. Let me share possible options:
(1) Use a registry proxy such as https://github.com/jasonwbarnett/terraform-registry-proxy
Pros: A general solution for all providers
Cons: Require additional setup for testing
(2) Use any untrusted fork of hashicorp/null and rename to it
Pros: Easy to use
Cons: Potential supply chain attack risk
(3) Fork hashicorp/null by myself
Pros: Trust to use
Cons: Publishing a forked provider to the Terraform Registry takes a lot of work. I don't have experience in developing providers.
(4) Use a real-world example of renaming a namespace for verified providers such as hashicorp/github to integrations/github
Pros: Trust to use
Cons: Need a valid credential
The data source did not result in a plan diff even with namespace renaming, so a resource is needed. However, we do not want to create a real resource for testing. I tried to import a public repository, but it requires a valid GitHub token even a public one.
(5) Embed a legacy v0.12.x tfstate example as a test fixture and force push to remote in advance
Pros: No additional dependency
Cons: As I mentioned in #144, the test may break in the future because the behavior is undefined when an old state file format is loaded with a new version of terraform state commands.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a reasonable compromise, I am leaning towards option (5) and am trying it out, but when I load a legacy tfstate created in v0.12.31 with Terraform v1.5.5, the terraform init fails.
I'm wondering if I need to run state replace-provider before terraform init. Does it work as intended in your upgrade scenario?
a05728d678f2:/work/tmp/pr-145/dir5# cat main.tf
resource "null_resource" "foo" {}
a05728d678f2:/work/tmp/pr-145/dir5# terraform -v
Terraform v1.5.5
on linux_arm64
Your version of Terraform is out of date! The latest version
is 1.5.6. You can update by downloading from https://www.terraform.io/downloads.html
a05728d678f2:/work/tmp/pr-145/dir5# terraform init && terraform apply -auto-approve
a05728d678f2:/work/tmp/pr-145/dir5# terraform state list
null_resource.foo
a05728d678f2:/work/tmp/pr-145/dir5# terraform providers
Providers required by configuration:
.
└── provider[registry.terraform.io/hashicorp/null]
Providers required by state:
provider[registry.terraform.io/hashicorp/null]
a05728d678f2:/work/tmp/pr-145/dir5# cat legacy.tfstate
{
"version": 4,
"terraform_version": "0.12.31",
"serial": 1,
"lineage": "3cb3461c-0eab-a7b2-e2e0-57a93c3cc2a1",
"outputs": {},
"resources": [
{
"mode": "managed",
"type": "null_resource",
"name": "foo",
"provider": "provider.null",
"instances": [
{
"schema_version": 0,
"attributes": {
"id": "4107759474886503934",
"triggers": null
}
}
]
}
]
}
a05728d678f2:/work/tmp/pr-145/dir5# terraform state push -force legacy.tfstate
a05728d678f2:/work/tmp/pr-145/dir5# terraform providers
Providers required by configuration:
.
└── provider[registry.terraform.io/hashicorp/null]
Providers required by state:
provider[registry.terraform.io/-/null]
a05728d678f2:/work/tmp/pr-145/dir5# terraform init
Initializing the backend...
╷
│ Error: Invalid legacy provider address
│
│ This configuration or its associated state refers to the unqualified provider "null".
│
│ You must complete the Terraform 0.13 upgrade process before upgrading to later versions.
╵
a05728d678f2:/work/tmp/pr-145/dir5# terraform state replace-provider registry.terraform.io/-/null registry.terraform.io/hashicorp/null
Terraform will perform the following actions:
~ Updating provider:
- registry.terraform.io/-/null
+ registry.terraform.io/hashicorp/null
Changing 1 resources:
null_resource.foo
Do you want to make these changes?
Only 'yes' will be accepted to continue.
Enter a value: yes
Successfully replaced provider for 1 resources.
a05728d678f2:/work/tmp/pr-145/dir5# terraform providers
Providers required by configuration:
.
└── provider[registry.terraform.io/hashicorp/null]
Providers required by state:
provider[registry.terraform.io/hashicorp/null]
a05728d678f2:/work/tmp/pr-145/dir5# terraform init
Initializing the backend...
Initializing provider plugins...
- Reusing previous version of hashicorp/null from the dependency lock file
- Using previously-installed hashicorp/null v3.2.1
Terraform has been successfully initialized!
You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.
If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.
a05728d678f2:/work/tmp/pr-145/dir5# terraform plan
null_resource.foo: Refreshing state... [id=4107759474886503934]
No changes. Your infrastructure matches the configuration.
Terraform has compared your real infrastructure against your configuration and found no differences,
so no changes are needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if I need to run state replace-provider before terraform init. Does it work as intended in your upgrade scenario?
I don't believe you can run terraform state replace-provider
before first running terraform init
. Otherwise, you'll run into an Error loading the state
error:
terraform state replace-provider registry.terraform.io/-/null registry.terraform.io/hashicorp/null
Error loading the state: Backend initialization required, please run "terraform init": Reason: Initial configuration of th
e requested backend "s3"
In my experience, when using a Terraform version >= v0.13.0 to run terraform state replace-provider
against a legacy v0.12.31 tfstate, it's necessary to...
- First run
terraform init
; this will fail with theError: Invalid legacy provider address
error you mention above. Despite the failure, it's still necessary. - Run
terraform state replace-provider
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest the acceptance test confirm this upgrade scenario by embedding the legacy v0.12.x tfstate as a test fixture and force push to remote in advance.
@minamijoyo Thanks for the suggestion. I gave this a shot (see 4ecad8d). I have mixed feelings about some aspects of the implementation, so please let me know if you have other concerns and feedback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we need to fix OverrideBackendToLocal as well as setupWorkDir? The error occurs during rollback but does not throw the error. It works in unintended balance, but it is not so user-friendly.
6d3015a26631:/work/tmp/pr-145/dir11# cat test.hcl
migration "state" "test" {
dir = "."
actions = [
"replace-provider registry.terraform.io/-/null registry.terraform.io/hashicorp/null",
]
}
6d3015a26631:/work/tmp/pr-145/dir11# tfmigrate plan test.hcl
2023/08/28 14:52:35 [INFO] [runner] load migration file: test.hcl
2023/08/28 14:52:35 [INFO] [migrator] start state migrator plan
2023/08/28 14:52:35 [INFO] [migrator@.] terraform version: 1.5.5
2023/08/28 14:52:35 [INFO] [migrator@.] initialize work dir
2023/08/28 14:52:35 [INFO] [migrator@.] ignoring error 'Error: Invalid legacy provider address' initilizing work dir; the error is expected when using Terraform >= 0.13 with a legacy Terraform state
2023/08/28 14:52:35 [INFO] [migrator@.] get the current remote state
2023/08/28 14:52:35 [INFO] [migrator@.] override backend to local
2023/08/28 14:52:35 [INFO] [executor@.] create an override file
2023/08/28 14:52:35 [INFO] [migrator@.] creating local workspace folder in: terraform.tfstate.d/default
2023/08/28 14:52:35 [INFO] [executor@.] switch backend to local
2023/08/28 14:52:35 [INFO] [migrator@.] compute a new state
2023/08/28 14:52:36 [INFO] [migrator@.] check diffs
2023/08/28 14:52:36 [INFO] [executor@.] remove the override file
2023/08/28 14:52:36 [INFO] [executor@.] remove the workspace state folder
2023/08/28 14:52:36 [INFO] [executor@.] switch back to remote
2023/08/28 14:52:36 [ERROR] [executor@.] failed to switch back to remote: failed to run command (exited 1): terraform init -input=false -no-color -reconfigure
stdout:
Initializing the backend...
Successfully configured the backend "s3"! Terraform will automatically
use this backend unless the backend configuration changes.
stderr:
Error: Invalid legacy provider address
This configuration or its associated state refers to the unqualified provider
"null".
You must complete the Terraform 0.13 upgrade process before upgrading to
later versions.
2023/08/28 14:52:36 [ERROR] [executor@.] please re-run terraform init -reconfigure
2023/08/28 14:52:36 [INFO] [migrator] state migrator plan success!
6d3015a26631:/work/tmp/pr-145/dir11# echo $?
0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My intuition was that the error would also occur when switching the backend to local, but I wonder why it doesn't.
Switching to local with a direct aws s3 cp of a v0.12.31 tfstate doesn't cause an error, but with terraform v1.5.5, After state push, switching to local also gives an error.
Rollback to remote throws the error in both cases. I don't know why 🤷♂️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyway, it is better to fix the OverrideBackendToLocal function to suppress the legacy state error in both the switch and rollback phases, isn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Good call! I took a stab at implementing your suggestion in 37e2834. How's that look?
This addresses issue #144 by providing a `replace-provider` action. Note that, unlike other `state` actions, the `replace-provider` action does not support a `-state-out`: ``` terraform state replace-provider \ -state=terraform.tfstate \ -state-out=out.tfstate \ registry.terraform.io/hashicorp/local registy.mdb.io/hashicorp/local Error parsing command-line flags: flag provided but not defined: -state-out ``` Signed-off-by: Mike Ball <[email protected]>
@minamijoyo Addressing issue #145 and adding |
@mdb Thanks for working on this! As you said, it's more complicated than I expected, and there are some points where I need to check the Terraform CLI behavior, so please give me some time to review it. Instead of returning the review results to you in a summary, I will comment on them in order as I study the areas in my spare time. |
This corrects a typo, ensuring that StateReplaceProviderAction implements StateAction.
StateReplaceProviderAction#StateUpdate now correctly documents that the method updates a provider address in TF state.
This seeks to add additional unit tests surrounding scenarios where... * `terraform version` succeeds, but returns an unsupported version * `terraform version` exits nonzero
Thanks @minamijoyo - I believe I've addressed the concerns you've expressed so far. I'll stay tuned for any further feedback! |
* use force-pushed TF 0.12.31 state file in `state replace-provider` acceptance tests, [as suggested in code review](#145 (comment)) * allow `terraform init` to fail -- as expected -- in instances where a non-legacy Terraform is used against a legacy TF state for the purposes of `state replace-provider`-ing
tfmigrate/migrator.go
Outdated
return nil, nil, err | ||
acceptable := "Error: Invalid legacy provider address" | ||
if supportsStateReplaceProvider && strings.Contains(err.Error(), acceptable) { | ||
log.Printf("[INFO] [migrator@%s] ignoring error '%s' initilizing work dir; the error is expected when using Terraform %s with a legacy Terraform state\n", tf.Dir(), acceptable, constraints) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
☝️ Could/should this logic be improved? For example, perhaps the Error: Invalid legacy provider address
should only be permitted during state replace-provider
actions, but not other actions? Doing so would require passing the action(s) to setupWorkDir
, though. Do you have thoughts, @minamijoyo ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is safer to narrow down the special error-handling conditions.
Rather than passing actions directly to the setupWorkDir function, why not pass it as a bool argument, such as ignoreLegacyState?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@minamijoyo Yep; we can pass a bool
flag if you prefer. I opted not to, as that arguably redundantly distributes the need to correctly set this bool
value across all invocations of the function (rather than consolidated the logic to a single place, within the function). However, perhaps you're right that doing so is best in this case. Stay tuned :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If logic is duplicated, we can define some helper functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@minamijoyo I believe I've implemented your suggestion in 67ed675. Does that seem acceptable?
allow `terraform init` to fail -- as expected -- in instances where a non-legacy Terraform is used against a legacy TF state for the purposes of `state replace-provider`-ing _also_ when invoking `terraform init` from within `OverrideBackendToLocal`. This addresses the _second_ `init` error seen in the logs: ``` 2023/08/28 14:52:35 [INFO] [runner] load migration file: test.hcl 2023/08/28 14:52:35 [INFO] [migrator] start state migrator plan 2023/08/28 14:52:35 [INFO] [migrator@.] terraform version: 1.5.5 2023/08/28 14:52:35 [INFO] [migrator@.] initialize work dir 2023/08/28 14:52:35 [INFO] [migrator@.] ignoring error 'Error: Invalid legacy provider address' initilizing work dir; the error is expected when using Terraform >= 0.13 with a legacy Terraform state 2023/08/28 14:52:35 [INFO] [migrator@.] get the current remote state 2023/08/28 14:52:35 [INFO] [migrator@.] override backend to local 2023/08/28 14:52:35 [INFO] [executor@.] create an override file 2023/08/28 14:52:35 [INFO] [migrator@.] creating local workspace folder in: terraform.tfstate.d/default 2023/08/28 14:52:35 [INFO] [executor@.] switch backend to local 2023/08/28 14:52:35 [INFO] [migrator@.] compute a new state 2023/08/28 14:52:36 [INFO] [migrator@.] check diffs 2023/08/28 14:52:36 [INFO] [executor@.] remove the override file 2023/08/28 14:52:36 [INFO] [executor@.] remove the workspace state folder 2023/08/28 14:52:36 [INFO] [executor@.] switch back to remote 2023/08/28 14:52:36 [ERROR] [executor@.] failed to switch back to remote: failed to run command (exited 1): terraform init -input=false -no-color -reconfigure stdout: Initializing the backend... Successfully configured the backend "s3"! Terraform will automatically use this backend unless the backend configuration changes. stderr: Error: Invalid legacy provider address This configuration or its associated state refers to the unqualified provider "null". You must complete the Terraform 0.13 upgrade process before upgrading to later versions. 2023/08/28 14:52:36 [ERROR] [executor@.] please re-run terraform init -reconfigure 2023/08/28 14:52:36 [INFO] [migrator] state migrator plan success! ``` This seeks to address feedback from @minamijoyo: #145 (comment) Signed-off-by: Mike Ball <[email protected]>
This seeks to further reduce the acceptance of `Error: Invalid legacy provider address` errors to scenarios where tfmigrate is performing `state replace-provider`. This is inspired by code review feedback from @minamijoyo here: #145 (comment)
tfmigrate/migrator.go
Outdated
@@ -60,7 +70,7 @@ func setupWorkDir(ctx context.Context, tf tfexec.TerraformCLI, workspace string, | |||
} | |||
// override backend to local | |||
log.Printf("[INFO] [migrator@%s] override backend to local\n", tf.Dir()) | |||
switchBackToRemoteFunc, err := tf.OverrideBackendToLocal(ctx, "_tfmigrate_override.tf", workspace, isBackendTerraformCloud, backendConfig) | |||
switchBackToRemoteFunc, err := tf.OverrideBackendToLocal(ctx, "_tfmigrate_override.tf", workspace, isBackendTerraformCloud, backendConfig, supportsStateReplaceProvider) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mdb Thank you for working on this!
I think we should pass ignoreLegacyStateInitErr instead of supportsStateReplaceProvider to OverrideBackendToLocal. Because the supportStateReplaceProvider depends on the Terraform version; on the other hand, the ignoreLegacyStateInitErr depends on actions. The ignoreLegacyStateInitErr is more strict to check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@minamijoyo That sounds reasonable - I pushed a change doing so!
Per feedback from @minamijoyo... > Because the supportStateReplaceProvider depends on the Terraform version; on the other hand, the ignoreLegacyStateInitErr depends on actions. The ignoreLegacyStateInitErr is more strict to check. #145 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your great work! LGTM 👍
@minamijoyo Thanks so much for the patient reviews and great feedback! |
@mdb Thank you for your contribution! I've shipped it in v0.3.15. 🚀 |
This addresses issue #144 by providing a
replace-provider
action.Note that, unlike other
state
actions, thereplace-provider
action does notsupport a
-state-out
: