Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add replace-provider capability #145

Merged
merged 11 commits into from
Sep 4, 2023
Merged

add replace-provider capability #145

merged 11 commits into from
Sep 4, 2023

Conversation

mdb
Copy link
Contributor

@mdb mdb commented Aug 10, 2023

This addresses issue #144 by providing a replace-provider action.

Note that, unlike other state actions, the replace-provider action does not
support a -state-out:

terraform state replace-provider \
  -state=terraform.tfstate \
  -state-out=out.tfstate \
  registry.terraform.io/hashicorp/local registy.mdb.io/hashicorp/local

Error parsing command-line flags: flag provided but not defined:
-state-out

@mdb mdb changed the title WIP: add replace-provider capability add replace-provider capability Aug 15, 2023
@mdb mdb marked this pull request as ready for review August 15, 2023 10:56

if !strings.Contains(planErr.Error(), expected) {
t.Fatalf("expected migrator plan error to include: %s; received: %s", expected, err)
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

☝️ There may be a better way to author this test. In this case, because we've replaced the legitimate registry.terraform.io/hashicorp/local provider with an illegitimate registry.tfmigrate.io/hashicorp/local provider, plan is expected to error, but we can verify that it does so with the expected illegitimate provider replacement, in effect validating that replace-provider did its job.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, this test is quite tricky. I'll investigate if there is any way to test for a happy path.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't find the best solution. Let me share possible options:

(1) Use a registry proxy such as https://github.com/jasonwbarnett/terraform-registry-proxy
Pros: A general solution for all providers
Cons: Require additional setup for testing

(2) Use any untrusted fork of hashicorp/null and rename to it
Pros: Easy to use
Cons: Potential supply chain attack risk

(3) Fork hashicorp/null by myself
Pros: Trust to use
Cons: Publishing a forked provider to the Terraform Registry takes a lot of work. I don't have experience in developing providers.

(4) Use a real-world example of renaming a namespace for verified providers such as hashicorp/github to integrations/github
Pros: Trust to use
Cons: Need a valid credential
The data source did not result in a plan diff even with namespace renaming, so a resource is needed. However, we do not want to create a real resource for testing. I tried to import a public repository, but it requires a valid GitHub token even a public one.

(5) Embed a legacy v0.12.x tfstate example as a test fixture and force push to remote in advance
Pros: No additional dependency
Cons: As I mentioned in #144, the test may break in the future because the behavior is undefined when an old state file format is loaded with a new version of terraform state commands.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a reasonable compromise, I am leaning towards option (5) and am trying it out, but when I load a legacy tfstate created in v0.12.31 with Terraform v1.5.5, the terraform init fails.
I'm wondering if I need to run state replace-provider before terraform init. Does it work as intended in your upgrade scenario?

a05728d678f2:/work/tmp/pr-145/dir5# cat main.tf
resource "null_resource" "foo" {}
a05728d678f2:/work/tmp/pr-145/dir5# terraform -v
Terraform v1.5.5
on linux_arm64

Your version of Terraform is out of date! The latest version
is 1.5.6. You can update by downloading from https://www.terraform.io/downloads.html

a05728d678f2:/work/tmp/pr-145/dir5# terraform init && terraform apply -auto-approve

a05728d678f2:/work/tmp/pr-145/dir5# terraform state list
null_resource.foo

a05728d678f2:/work/tmp/pr-145/dir5# terraform providers

Providers required by configuration:
.
└── provider[registry.terraform.io/hashicorp/null]

Providers required by state:

    provider[registry.terraform.io/hashicorp/null]
a05728d678f2:/work/tmp/pr-145/dir5# cat legacy.tfstate
{
  "version": 4,
  "terraform_version": "0.12.31",
  "serial": 1,
  "lineage": "3cb3461c-0eab-a7b2-e2e0-57a93c3cc2a1",
  "outputs": {},
  "resources": [
    {
      "mode": "managed",
      "type": "null_resource",
      "name": "foo",
      "provider": "provider.null",
      "instances": [
        {
          "schema_version": 0,
          "attributes": {
            "id": "4107759474886503934",
            "triggers": null
          }
        }
      ]
    }
  ]
}

a05728d678f2:/work/tmp/pr-145/dir5# terraform state push -force legacy.tfstate

a05728d678f2:/work/tmp/pr-145/dir5# terraform providers

Providers required by configuration:
.
└── provider[registry.terraform.io/hashicorp/null]

Providers required by state:

    provider[registry.terraform.io/-/null]

a05728d678f2:/work/tmp/pr-145/dir5# terraform init

Initializing the backend...
╷
│ Error: Invalid legacy provider address
│
│ This configuration or its associated state refers to the unqualified provider "null".
│
│ You must complete the Terraform 0.13 upgrade process before upgrading to later versions.
╵
a05728d678f2:/work/tmp/pr-145/dir5# terraform state replace-provider registry.terraform.io/-/null registry.terraform.io/hashicorp/null
Terraform will perform the following actions:

  ~ Updating provider:
    - registry.terraform.io/-/null
    + registry.terraform.io/hashicorp/null

Changing 1 resources:

  null_resource.foo

Do you want to make these changes?
Only 'yes' will be accepted to continue.

Enter a value: yes

Successfully replaced provider for 1 resources.

a05728d678f2:/work/tmp/pr-145/dir5# terraform providers

Providers required by configuration:
.
└── provider[registry.terraform.io/hashicorp/null]

Providers required by state:

    provider[registry.terraform.io/hashicorp/null]

a05728d678f2:/work/tmp/pr-145/dir5# terraform init

Initializing the backend...

Initializing provider plugins...
- Reusing previous version of hashicorp/null from the dependency lock file
- Using previously-installed hashicorp/null v3.2.1

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.

a05728d678f2:/work/tmp/pr-145/dir5# terraform plan
null_resource.foo: Refreshing state... [id=4107759474886503934]

No changes. Your infrastructure matches the configuration.

Terraform has compared your real infrastructure against your configuration and found no differences,
so no changes are needed.

Copy link
Contributor Author

@mdb mdb Aug 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if I need to run state replace-provider before terraform init. Does it work as intended in your upgrade scenario?

I don't believe you can run terraform state replace-provider before first running terraform init. Otherwise, you'll run into an Error loading the state error:

terraform state replace-provider registry.terraform.io/-/null registry.terraform.io/hashicorp/null
Error loading the state: Backend initialization required, please run "terraform init": Reason: Initial configuration of th
e requested backend "s3"

In my experience, when using a Terraform version >= v0.13.0 to run terraform state replace-provider against a legacy v0.12.31 tfstate, it's necessary to...

  1. First run terraform init; this will fail with the Error: Invalid legacy provider address error you mention above. Despite the failure, it's still necessary.
  2. Run terraform state replace-provider

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest the acceptance test confirm this upgrade scenario by embedding the legacy v0.12.x tfstate as a test fixture and force push to remote in advance.

@minamijoyo Thanks for the suggestion. I gave this a shot (see 4ecad8d). I have mixed feelings about some aspects of the implementation, so please let me know if you have other concerns and feedback.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we need to fix OverrideBackendToLocal as well as setupWorkDir? The error occurs during rollback but does not throw the error. It works in unintended balance, but it is not so user-friendly.

6d3015a26631:/work/tmp/pr-145/dir11# cat test.hcl
migration "state" "test" {
  dir = "."
  actions = [
    "replace-provider registry.terraform.io/-/null registry.terraform.io/hashicorp/null",
  ]
}

6d3015a26631:/work/tmp/pr-145/dir11# tfmigrate plan test.hcl
2023/08/28 14:52:35 [INFO] [runner] load migration file: test.hcl
2023/08/28 14:52:35 [INFO] [migrator] start state migrator plan
2023/08/28 14:52:35 [INFO] [migrator@.] terraform version: 1.5.5
2023/08/28 14:52:35 [INFO] [migrator@.] initialize work dir
2023/08/28 14:52:35 [INFO] [migrator@.] ignoring error 'Error: Invalid legacy provider address' initilizing work dir; the error is expected when using Terraform >= 0.13 with a legacy Terraform state
2023/08/28 14:52:35 [INFO] [migrator@.] get the current remote state
2023/08/28 14:52:35 [INFO] [migrator@.] override backend to local
2023/08/28 14:52:35 [INFO] [executor@.] create an override file
2023/08/28 14:52:35 [INFO] [migrator@.] creating local workspace folder in: terraform.tfstate.d/default
2023/08/28 14:52:35 [INFO] [executor@.] switch backend to local
2023/08/28 14:52:35 [INFO] [migrator@.] compute a new state
2023/08/28 14:52:36 [INFO] [migrator@.] check diffs
2023/08/28 14:52:36 [INFO] [executor@.] remove the override file
2023/08/28 14:52:36 [INFO] [executor@.] remove the workspace state folder
2023/08/28 14:52:36 [INFO] [executor@.] switch back to remote
2023/08/28 14:52:36 [ERROR] [executor@.] failed to switch back to remote: failed to run command (exited 1): terraform init -input=false -no-color -reconfigure
stdout:

Initializing the backend...

Successfully configured the backend "s3"! Terraform will automatically
use this backend unless the backend configuration changes.

stderr:

Error: Invalid legacy provider address

This configuration or its associated state refers to the unqualified provider
"null".

You must complete the Terraform 0.13 upgrade process before upgrading to
later versions.


2023/08/28 14:52:36 [ERROR] [executor@.] please re-run terraform init -reconfigure
2023/08/28 14:52:36 [INFO] [migrator] state migrator plan success!

6d3015a26631:/work/tmp/pr-145/dir11# echo $?
0

Copy link
Owner

@minamijoyo minamijoyo Aug 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My intuition was that the error would also occur when switching the backend to local, but I wonder why it doesn't.
Switching to local with a direct aws s3 cp of a v0.12.31 tfstate doesn't cause an error, but with terraform v1.5.5, After state push, switching to local also gives an error.
Rollback to remote throws the error in both cases. I don't know why 🤷‍♂️

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway, it is better to fix the OverrideBackendToLocal function to suppress the legacy state error in both the switch and rollback phases, isn't it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Good call! I took a stab at implementing your suggestion in 37e2834. How's that look?

This addresses issue #144 by providing a `replace-provider` action.

Note that, unlike other `state` actions, the `replace-provider` action does not
support a `-state-out`:

```
terraform state replace-provider \
  -state=terraform.tfstate \
  -state-out=out.tfstate \
  registry.terraform.io/hashicorp/local registy.mdb.io/hashicorp/local

Error parsing command-line flags: flag provided but not defined:
-state-out
```

Signed-off-by: Mike Ball <[email protected]>
@mdb
Copy link
Contributor Author

mdb commented Aug 17, 2023

@minamijoyo Addressing issue #145 and adding replace-provider functionality turned out to be a bit more quirky than I anticipated, but this first pass implementation is ready for your review. Thanks!

@minamijoyo
Copy link
Owner

@mdb Thanks for working on this! As you said, it's more complicated than I expected, and there are some points where I need to check the Terraform CLI behavior, so please give me some time to review it. Instead of returning the review results to you in a summary, I will comment on them in order as I study the areas in my spare time.

This corrects a typo, ensuring that StateReplaceProviderAction
implements StateAction.
StateReplaceProviderAction#StateUpdate now correctly documents that the
method updates a provider address in TF state.
This seeks to add additional unit tests surrounding scenarios where...

* `terraform version` succeeds, but returns an unsupported version
* `terraform version` exits nonzero
@mdb
Copy link
Contributor Author

mdb commented Aug 24, 2023

@mdb Thanks for working on this! As you said, it's more complicated than I expected, and there are some points where I need to check the Terraform CLI behavior, so please give me some time to review it. Instead of returning the review results to you in a summary, I will comment on them in order as I study the areas in my spare time.

Thanks @minamijoyo - I believe I've addressed the concerns you've expressed so far. I'll stay tuned for any further feedback!

* use force-pushed TF 0.12.31 state file in `state replace-provider`
  acceptance tests, [as suggested in code review](#145 (comment))
* allow `terraform init` to fail -- as expected -- in instances where a
  non-legacy Terraform is used against a legacy TF state for the
  purposes of `state replace-provider`-ing
Makefile Show resolved Hide resolved
docker-compose.yml Show resolved Hide resolved
return nil, nil, err
acceptable := "Error: Invalid legacy provider address"
if supportsStateReplaceProvider && strings.Contains(err.Error(), acceptable) {
log.Printf("[INFO] [migrator@%s] ignoring error '%s' initilizing work dir; the error is expected when using Terraform %s with a legacy Terraform state\n", tf.Dir(), acceptable, constraints)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

☝️ Could/should this logic be improved? For example, perhaps the Error: Invalid legacy provider address should only be permitted during state replace-provider actions, but not other actions? Doing so would require passing the action(s) to setupWorkDir, though. Do you have thoughts, @minamijoyo ?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is safer to narrow down the special error-handling conditions.
Rather than passing actions directly to the setupWorkDir function, why not pass it as a bool argument, such as ignoreLegacyState?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@minamijoyo Yep; we can pass a bool flag if you prefer. I opted not to, as that arguably redundantly distributes the need to correctly set this bool value across all invocations of the function (rather than consolidated the logic to a single place, within the function). However, perhaps you're right that doing so is best in this case. Stay tuned :)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If logic is duplicated, we can define some helper functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@minamijoyo I believe I've implemented your suggestion in 67ed675. Does that seem acceptable?

allow `terraform init` to fail -- as expected -- in instances where a
non-legacy Terraform is used against a legacy TF state for the
purposes of `state replace-provider`-ing _also_ when invoking `terraform
init` from within `OverrideBackendToLocal`.

This addresses the _second_ `init` error seen in the logs:

```
2023/08/28 14:52:35 [INFO] [runner] load migration file: test.hcl
2023/08/28 14:52:35 [INFO] [migrator] start state migrator plan
2023/08/28 14:52:35 [INFO] [migrator@.] terraform version: 1.5.5
2023/08/28 14:52:35 [INFO] [migrator@.] initialize work dir
2023/08/28 14:52:35 [INFO] [migrator@.] ignoring error 'Error: Invalid legacy provider address' initilizing work dir; the error is expected when using Terraform >= 0.13 with a legacy Terraform state
2023/08/28 14:52:35 [INFO] [migrator@.] get the current remote state
2023/08/28 14:52:35 [INFO] [migrator@.] override backend to local
2023/08/28 14:52:35 [INFO] [executor@.] create an override file
2023/08/28 14:52:35 [INFO] [migrator@.] creating local workspace folder in: terraform.tfstate.d/default
2023/08/28 14:52:35 [INFO] [executor@.] switch backend to local
2023/08/28 14:52:35 [INFO] [migrator@.] compute a new state
2023/08/28 14:52:36 [INFO] [migrator@.] check diffs
2023/08/28 14:52:36 [INFO] [executor@.] remove the override file
2023/08/28 14:52:36 [INFO] [executor@.] remove the workspace state folder
2023/08/28 14:52:36 [INFO] [executor@.] switch back to remote
2023/08/28 14:52:36 [ERROR] [executor@.] failed to switch back to remote: failed to run command (exited 1): terraform init -input=false -no-color -reconfigure
stdout:

Initializing the backend...

Successfully configured the backend "s3"! Terraform will automatically
use this backend unless the backend configuration changes.

stderr:

Error: Invalid legacy provider address

This configuration or its associated state refers to the unqualified provider
"null".

You must complete the Terraform 0.13 upgrade process before upgrading to
later versions.

2023/08/28 14:52:36 [ERROR] [executor@.] please re-run terraform init -reconfigure
2023/08/28 14:52:36 [INFO] [migrator] state migrator plan success!
```

This seeks to address feedback from @minamijoyo:
#145 (comment)

Signed-off-by: Mike Ball <[email protected]>
This seeks to further reduce the acceptance of `Error: Invalid legacy
provider address` errors to scenarios where tfmigrate is performing `state
replace-provider`.

This is inspired by code review feedback from @minamijoyo here:
#145 (comment)
@mdb mdb requested a review from minamijoyo September 2, 2023 23:49
@@ -60,7 +70,7 @@ func setupWorkDir(ctx context.Context, tf tfexec.TerraformCLI, workspace string,
}
// override backend to local
log.Printf("[INFO] [migrator@%s] override backend to local\n", tf.Dir())
switchBackToRemoteFunc, err := tf.OverrideBackendToLocal(ctx, "_tfmigrate_override.tf", workspace, isBackendTerraformCloud, backendConfig)
switchBackToRemoteFunc, err := tf.OverrideBackendToLocal(ctx, "_tfmigrate_override.tf", workspace, isBackendTerraformCloud, backendConfig, supportsStateReplaceProvider)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mdb Thank you for working on this!
I think we should pass ignoreLegacyStateInitErr instead of supportsStateReplaceProvider to OverrideBackendToLocal. Because the supportStateReplaceProvider depends on the Terraform version; on the other hand, the ignoreLegacyStateInitErr depends on actions. The ignoreLegacyStateInitErr is more strict to check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@minamijoyo That sounds reasonable - I pushed a change doing so!

Per feedback from @minamijoyo...

> Because the supportStateReplaceProvider depends on the Terraform
version; on the other hand, the ignoreLegacyStateInitErr depends on
actions. The ignoreLegacyStateInitErr is more strict to check.

#145 (comment)
Copy link
Owner

@minamijoyo minamijoyo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your great work! LGTM 👍

@minamijoyo minamijoyo merged commit dfec615 into minamijoyo:master Sep 4, 2023
7 checks passed
@mdb mdb deleted the mdb/replace-provider branch September 4, 2023 13:52
@mdb
Copy link
Contributor Author

mdb commented Sep 4, 2023

@minamijoyo Thanks so much for the patient reviews and great feedback!

@minamijoyo
Copy link
Owner

@mdb Thank you for your contribution! I've shipped it in v0.3.15. 🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants