Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move tenant-deletion-mark to a global dir #5676

Merged

Conversation

harry671003
Copy link
Contributor

@harry671003 harry671003 commented Nov 24, 2023

What this PR does:

  • If the tenant_cleanup_delay is large (say 24h), a deleted tenant can be continued to be synced by the store-gateways.
  • This is because the s3://<bucket>/<tenantID>/markers/tenant-deletion-mark.json will be kept for 24h after the tenant has been deleted.
  • If there are a lots of users being created and deleted in the cluster (canary testing), there could be a lot of deleted users synced by the store-gateway.
  • This change moves the tenant-deletion-mark.json to a global markers directory.
    • s3://<bucket>/__markers__/<tenantID>/tenant-deletion-mark.json
  • The string __markers__ is treated as special an no tenants are allowed to have that name.
  • With this change, the tenants can immediately be cleaned up by the cleaner after deletion and no longer has to stay active until tenant_cleanup_delay.

Which issue(s) this PR fixes:
Fixes #5674 and #5675

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@harry671003
Copy link
Contributor Author

@yeya24 @alanprot Could you take a look?

@yeya24
Copy link
Contributor

yeya24 commented Nov 27, 2023

@harry671003 I wonder why having the marker in the global dir can help us clean up tenants faster. Can you help elaborate more on it?

IIUC, making it a global dir is easier to iterate through the specific folder and finding all tenants deletion marker. But Idk why it helps clean up faster.

@harry671003
Copy link
Contributor Author

harry671003 commented Nov 27, 2023

@harry671003 I wonder why having the marker in the global dir can help us clean up tenants faster. Can you help elaborate more on it?

IIUC, making it a global dir is easier to iterate through the specific folder and finding all tenants deletion marker. But Idk why it helps clean up faster.

Why having the marker in the global dir can help us clean up tenants faster

I meant to say, that we no longer have to keep the tenant active with only the markers folder.


Let's say we have 2 tenants in a s3 bucket:

cortex-bucket:
  - tenant1:
     - 01HG9BCQ3GC2QZN9HSKTTRRDC1
     - 01HG9BCQ3GW3HKSC3KA0CW8K9Q
     - 01HG9BCQ3GAP3TGGWTQ34EDJFT
     - markers:
        - 01HG9BCQ3HZKJQRF68YBJ154MM-deletion-mark.json

  - tenant2:
    - 01HG9BCQ3G8SA2QE8QGBY93GYT
    - 01HG9BCQ3H052Y35ZMBP6RYDMH
    - markers:
      - 01HG9BVX7VWNRZ0EZ7EH18FJQM-deletion-mark.json

After we delete tenant2, there will still be the tenant-deletion-mark.json in the tenant2/ prefix.

cortex-bucket:
  - tenant1:
     - 01HG9BCQ3GC2QZN9HSKTTRRDC1
     - 01HG9BCQ3GW3HKSC3KA0CW8K9Q
     - 01HG9BCQ3GAP3TGGWTQ34EDJFT
     - markers:
        - 01HG9BCQ3HZKJQRF68YBJ154MM-deletion-mark.json

  - tenant2:
    - markers:
      - tenant-deletion-mark.json

If we have a lot of deleted users, they will still be kept around for upto the tenant_cleanup_delay.
This is what this PR is trying to avoid. If we move the markers to a global dir, then the tenant2 will no longer be active.

cortex-bucket:
  - tenant1:
     - 01HG9BCQ3GC2QZN9HSKTTRRDC1
     - 01HG9BCQ3GW3HKSC3KA0CW8K9Q
     - 01HG9BCQ3GAP3TGGWTQ34EDJFT
     - markers:
        - 01HG9BCQ3HZKJQRF68YBJ154MM-deletion-mark.json

  - __markers__:
    - tenant2:
      - tenant-deletion-mark.json

StoreGateways will no longer have to discover the deleted tenants with only the markers and filter them out doing the compute intensive GetShuffleShardSubRing().

@yeya24
Copy link
Contributor

yeya24 commented Nov 28, 2023

Since we move the tenant deletion marker to another path, I think we need to add another configuration to caching bucket to cache the global tenant deletion marker path accordingly. https://github.com/cortexproject/cortex/blob/master/pkg/storage/tsdb/caching_bucket.go#L131

Signed-off-by: 🌲 Harry 🌊 John 🏔 <[email protected]>
Signed-off-by: 🌲 Harry 🌊 John 🏔 <[email protected]>
Signed-off-by: 🌲 Harry 🌊 John 🏔 <[email protected]>
@harry671003 harry671003 force-pushed the global_tenant_deletion_marker branch from c96836a to c3503fd Compare November 29, 2023 18:11
CHANGELOG.md Outdated Show resolved Hide resolved
Signed-off-by: 🌲 Harry 🌊 John 🏔 <[email protected]>
@harry671003
Copy link
Contributor Author

Since we move the tenant deletion marker to another path, I think we need to add another configuration to caching bucket to cache the global tenant deletion marker path accordingly. https://github.com/cortexproject/cortex/blob/master/pkg/storage/tsdb/caching_bucket.go#L131

Since I'm not changing the suffix, the new path should also be cached here: https://github.com/harry671003/cortex/blob/master/pkg/storage/tsdb/caching_bucket.go#L186

@alanprot
Copy link
Member

LGTM

@yeya24 yeya24 merged commit c85778c into cortexproject:master Nov 29, 2023
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants