Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: expired kvs are cleaned up early due to inconsistent time in meta #14047

Closed
zhyass opened this issue Dec 16, 2023 · 2 comments
Closed

bug: expired kvs are cleaned up early due to inconsistent time in meta #14047

zhyass opened this issue Dec 16, 2023 · 2 comments

Comments

@zhyass
Copy link
Member

zhyass commented Dec 16, 2023

Summary

> optimize table test3_01 compact;

OPTIMIZE TABLE test3_01 compact

error happens after fetched 0 rows: APIError: PageError with 4000: transaction aborted after 0 retries, which took 10 ms
7080000 rows written in 4.050 sec. Processed 7.08 million rows, 584.89 MiB (1.75 million rows/s, 144.43 MiB/s)

error log:

<Backtrace disabled by default. Please use RUST_BACKTRACE=1 to enable>
2023-12-16T02:26:50.904932Z ERROR common_storages_fuse::operations::common::processors::sink_commit: 
src/query/storages/fuse/src/operations/common/processors/[sink_commit.rs:310](http://sink_commit.rs:310/) commit mutation
failed cause get lock failed, error: TableLockExpired. Code: 2016, Text = the acquired table lock in '3181978' has been expired.

A look at the meta log shows that the table lock expired when it was created. But the default expiration time should be 10s.

Meta log:

{"timestamp":"2023-12-16T02:26:40.237885Z","level":"INFO","fields":{"message":"apply_result: cmd: upsert_kv:__fd_id_gen/table_lock_id(GE(0)) = Update(\"[binary]\") (None); res: AppliedState: KV: id: Noneprev: Some(SeqV { seq: 3173636, meta: None, data: \"[binary]\" })result: Some(SeqV { seq: 3182039, meta: None, data: \"[binary]\" })"}}
{"timestamp":"2023-12-16T02:26:40.237938Z","level":"INFO","fields":{"message":"raft.client_write res ok: log_id: 305-76-3299203, data: AppliedState: KV: id: Noneprev: Some(SeqV { seq: 3173636, meta: None, data: \"[binary]\" })result: Some(SeqV { seq: 3182039, meta: None, data: \"[binary]\" }), membership: None"}}
{"timestamp":"2023-12-16T02:26:40.240632Z","level":"INFO","fields":{"message":"write LogEntry: time: 1702693600240 ms cmd: txn:TxnRequest{ if:[__fd_table_by_id/3181978 == seq(3181986),__fd_table_lock/3181978/3182039 == seq(0)] then:[Put(Put key=__fd_table_lock/3181978/3182039, need prev_value: true expire at: 1702693600)] else:[] }"}}
{"timestamp":"2023-12-16T02:26:40.242116Z","level":"INFO","fields":{"message":"apply: normal: time: 1702693600240 ms cmd: txn:TxnRequest{ if:[__fd_table_by_id/3181978 == seq(3181986),__fd_table_lock/3181978/3182039 == seq(0)] then:[Put(Put key=__fd_table_lock/3181978/3182039, need prev_value: true expire at: 1702693600)] else:[] }"}}
{"timestamp":"2023-12-16T02:26:40.242129Z","level":"INFO","fields":{"message":"apply_cmd: txn:TxnRequest{ if:[__fd_table_by_id/3181978 == seq(3181986),__fd_table_lock/3181978/3182039 == seq(0)] then:[Put(Put key=__fd_table_lock/3181978/3182039, need prev_value: true expire at: 1702693600)] else:[] }"}}
{"timestamp":"2023-12-16T02:26:40.242166Z","level":"INFO","fields":{"message":"apply_result: cmd: txn:TxnRequest{ if:[__fd_table_by_id/3181978 == seq(3181986),__fd_table_lock/3181978/3182039 == seq(0)] then:[Put(Put key=__fd_table_lock/3181978/3182039, need prev_value: true expire at: 1702693600)] else:[] }; res: AppliedState: Txn: TxnReply{ success: true, responses: [TxnOpResponse: Put: Put-resp: key=__fd_table_lock/3181978/3182039, prev_seq=None], error: }"}}
{"timestamp":"2023-12-16T02:26:40.242210Z","level":"INFO","fields":{"message":"raft.client_write res ok: log_id: 305-76-3299204, data: AppliedState: Txn: TxnReply{ success: true, responses: [TxnOpResponse: Put: Put-resp: key=__fd_table_lock/3181978/3182039, prev_seq=None], error: }, membership: None"}}

By comparing the system time of the meta cluster, it was found to be inconsistent.

img_v3_0266_8a1033fb-9d07-4c9b-a26c-4f52c586373g

@zhyass
Copy link
Member Author

zhyass commented Dec 18, 2023

CC @wubx @drmingdrmer

@drmingdrmer
Copy link
Member

In this PR a relative expiration time ttl is added and is ready to use:

But using it at the databend-query side force an upgrade of the meta-service cluster, which may not be expected by customers.
This relative expiration time will be enabled when our customers finished upgrading.

@zhyass zhyass closed this as completed Dec 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants