Add CF `mark` for `Lock` and `Rollback` records #102

base: master

Conversation
Signed-off-by: Yilin Chen <[email protected]>
Here you go!

> In `CheckSecondaryKeys`, we need to check both CFs of all the given secondary keys to know whether some keys are already committed or rolled back.

> And when prewrite raises an error, or we are prewriting non-pessimistic keys in a retry, we also need the precise status of the key to guarantee idempotence. This also requires reading both the write and mark CFs.
The prewrite request with a deferred constraint check may also need the precise status of the key; maybe it could be considered a happy path for committing this kind of transaction.
I don't get it... Could you explain the reason in a bit more detail?

It seems to me that prewrite with a deferred constraint check needs (1) a write conflict check and (2) the latest effective record (`PUT`/`DELETE`). The write conflict check only needs the latest record in the write CF, and reading the latest effective record costs us nothing extra.

If there is a mark record for this key, prewrite must fail anyway because of a newer record in the write CF.
So the deferred prewrite request processing doesn't actually need to read the mark CF every time? It seems I misunderstood: I thought it needed to check the mark CF each time, so there would be more overhead than before for this kind of transaction.
> If a new `Lock` record is written to the write CF while the latest version is also a `Lock` or `Rollback`, instead of removing the previous version, just overwrite that record and add a `real_commit_ts` to the record. When checking write conflicts, we should parse the value and check the real commit TS, because the timestamp encoded into the key may not be accurate.

> It may help reduce tombstones, but it breaks too many earlier assumptions.
If there are no typical user scenarios in which this kind of tombstone issue has a significant impact, the priority of this optimization could be lowered, as it's a bit complex to prove its correctness.
> And when prewrite raises an error or we are prewriting non-pessimistic keys in a retry, we also need the precise status of the key to guarantee idempotence. This also requires reading both the write and mark CFs.

> Luckily, none of these happen frequently in production. The extra cost is not a big issue.
As the key format is `{user_key}{start_ts}`, which is different from the keys in the write CF, maybe we could describe the conflict check process in a bit more detail here.
Conflict checking itself only needs to read the write CF for the maximum version. When we need to read the mark CF, we are confirming whether the transaction we are prewriting has been rolled back. So it's always a point get on the mark CF using the `start_ts`.
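The two lookups described above could be sketched like this. The CF maps, key layout, and record types below are simplified stand-ins for illustration, not TiKV's actual storage code:

```rust
use std::collections::BTreeMap;

// Simplified record type; real TiKV write records carry more fields.
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum Record {
    Put,
    Delete,
    Rollback,
}

struct Store {
    // write CF, modeled as (user_key, commit_ts) -> record
    write_cf: BTreeMap<(Vec<u8>, u64), Record>,
    // mark CF, modeled as (user_key, start_ts) -> record
    mark_cf: BTreeMap<(Vec<u8>, u64), Record>,
}

impl Store {
    // Write conflict check: only the maximum commit_ts of this key
    // in the write CF matters.
    fn has_write_conflict(&self, key: &[u8], start_ts: u64) -> bool {
        self.write_cf
            .range((key.to_vec(), 0)..=(key.to_vec(), u64::MAX))
            .next_back()
            .map_or(false, |((_, commit_ts), _)| *commit_ts > start_ts)
    }

    // Rollback check: a point get on the mark CF keyed by (user_key, start_ts).
    fn is_rolled_back(&self, key: &[u8], start_ts: u64) -> bool {
        matches!(
            self.mark_cf.get(&(key.to_vec(), start_ts)),
            Some(Record::Rollback)
        )
    }
}

fn main() {
    let mut s = Store {
        write_cf: BTreeMap::new(),
        mark_cf: BTreeMap::new(),
    };
    s.write_cf.insert((b"k1".to_vec(), 15), Record::Put);
    s.mark_cf.insert((b"k1".to_vec(), 10), Record::Rollback);
    // A commit at ts 15 conflicts with a prewrite whose start_ts is 10,
    // and the transaction starting at ts 10 was rolled back on k1.
    println!("conflict: {}", s.has_write_conflict(b"k1", 10));
    println!("rolled back: {}", s.is_rolled_back(b"k1", 10));
}
```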
> - Support generating and ingesting snapshots for the mark CF
> - Consider the keys and size of the mark CF in the split checker
> - After removing the WAL of the KV DB, the memtable needs to be flushed if it blocks the GC of the Raft logs.
After the KV DB instances are separated, there may be some compatibility work.
> When BR takes a snapshot of TiKV, all the locks before the snapshot should be resolved. In this case, the records in the mark CF really don't matter.

> BR can just ignore the mark CF.
Another benefit that could be mentioned here is that pessimistic locking behavior issues like pingcap/tidb#36438 could be resolved completely, and the previous tricky solutions could be optimized.
> #### Format
>
> Key: `{user_key}{start_ts}`
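A rough sketch of building such a key; the helper names are invented, and the assumption here is that the timestamp is appended as 8 big-endian bytes with bits inverted, the same trick used for write CF keys so a larger (newer) ts sorts before a smaller one under plain bytewise comparison:

```rust
// Illustrative {user_key}{start_ts} key encoding; not TiKV's actual codec.
// Assumption: ts is stored as big-endian of !ts so newer versions of the
// same user key sort first.
fn encode_mark_key(user_key: &[u8], start_ts: u64) -> Vec<u8> {
    let mut key = user_key.to_vec();
    key.extend_from_slice(&(!start_ts).to_be_bytes());
    key
}

fn decode_mark_key(key: &[u8]) -> (&[u8], u64) {
    let (user_key, ts_bytes) = key.split_at(key.len() - 8);
    (user_key, !u64::from_be_bytes(ts_bytes.try_into().unwrap()))
}

fn main() {
    let k = encode_mark_key(b"row1", 42);
    let (user_key, ts) = decode_mark_key(&k);
    println!("{:?} @ {}", user_key, ts);
}
```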
We always read the mark CF by key with a specific timestamp, am I right?
Yes.
> The records in the mark CF don't need to exist after all keys in the transaction are fully resolved. The client resolves all the locks before a certain timestamp before updating this timestamp as the safe point.

> So, when TiKV is ready to do GC, all records in the mark CF whose `commit_ts` is less than the safe point can be deleted. This can be done in the compaction filter.
What about `Rollback` records? They do not have a `commit_ts`.
Conventionally, the `commit_ts` of a `Rollback` record is exactly its `start_ts`.
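Putting the safe-point rule and the `Rollback` convention together, the per-record GC decision could be sketched as follows; the record types and field names are invented for illustration, and this is only the decision a compaction filter would make for each record it sees:

```rust
// Hypothetical mark CF record types; real TiKV records carry more fields.
#[derive(Clone, Copy)]
enum MarkRecord {
    Lock { commit_ts: u64 },
    // A Rollback has no commit_ts of its own; by convention it equals start_ts.
    Rollback { start_ts: u64 },
}

fn effective_commit_ts(r: MarkRecord) -> u64 {
    match r {
        MarkRecord::Lock { commit_ts } => commit_ts,
        MarkRecord::Rollback { start_ts } => start_ts,
    }
}

// The per-record decision a compaction filter would make during GC.
fn should_delete(r: MarkRecord, safe_point: u64) -> bool {
    effective_commit_ts(r) < safe_point
}

fn main() {
    println!("{}", should_delete(MarkRecord::Lock { commit_ts: 5 }, 10));
    println!("{}", should_delete(MarkRecord::Rollback { start_ts: 12 }, 10));
}
```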
> We already have a collapsing mechanism to merge consecutive `Rollback` records, and it was invented back in the days when pessimistic transactions were not supported. Now it's unlikely to have many `Rollback` records that affect read performance.

> The `Lock` record is a bit more complicated. At the very beginning, when there were no pessimistic or async-commit transactions, it was only used to check read-write conflicts, mostly for write-skew prevention. In pessimistic transactions, if a key is locked but not changed in the end, the pessimistic lock is finally turned into a `Lock` record. In these cases, `Lock` records exist to cause write conflicts. If such a record happens to be on the primary key of the transaction, it also marks the committed status. So, if a `Lock` record exists only to cause write conflicts, it doesn't need to exist after any newer record is written. However, this is not true for primary keys.
The `Lock` record for optimistic transactions is also used to cause write conflicts, right? And how should we understand "write-skew prevention"?

I executed an optimistic transaction like `begin optimistic; select * from t1 where name = "xxxxx" for update; commit;`. It also generated a `Lock` record in the write CF. So the `Lock` record on the primary key of an optimistic transaction also marks the committed status, right?
> The `Lock` record for optimistic transactions is also used to cause write conflicts, right? And how should we understand "write-skew prevention"?

Yes. Write-skew prevention is exactly the reason why people use `SELECT FOR UPDATE` in optimistic transactions. For example, select the row where `id = 1` and use the result to update the row where `id = 2`. There can be a write skew if we don't check conflicts on `id = 1`.
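The `id = 1` / `id = 2` scenario can be simulated with a toy commit protocol (all names and the in-memory store below are invented for this illustration): with conflict checks on the write set only, both transactions commit and the invariant `x + y >= 1` silently breaks; adding the read set to the check, as `SELECT FOR UPDATE` effectively does, rejects the second commit.

```rust
use std::collections::HashMap;

// A toy transaction: snapshot ts, keys it read, and (key, value) writes.
struct Txn {
    start_ts: u64,
    reads: Vec<&'static str>,
    writes: Vec<(&'static str, i64)>,
}

// db maps key -> (value, commit_ts of the last write to that key).
// check_reads = false: only the write set is conflict-checked.
// check_reads = true: the read set is checked too, like SELECT FOR UPDATE.
fn try_commit(
    db: &mut HashMap<&'static str, (i64, u64)>,
    txn: &Txn,
    commit_ts: u64,
    check_reads: bool,
) -> bool {
    let mut keys: Vec<&'static str> = txn.writes.iter().map(|(k, _)| *k).collect();
    if check_reads {
        keys.extend(txn.reads.iter().copied());
    }
    // Conflict: some checked key was committed after our snapshot.
    if keys.iter().any(|k| db[k].1 > txn.start_ts) {
        return false;
    }
    for (k, v) in &txn.writes {
        db.insert(*k, (*v, commit_ts));
    }
    true
}

fn main() {
    let t1 = Txn { start_ts: 1, reads: vec!["y"], writes: vec![("x", 0)] };
    let t2 = Txn { start_ts: 1, reads: vec!["x"], writes: vec![("y", 0)] };

    // Write-set checks only: both commit, x + y drops from 2 to 0 (write skew).
    let mut db = HashMap::from([("x", (1i64, 0u64)), ("y", (1, 0))]);
    assert!(try_commit(&mut db, &t1, 2, false));
    assert!(try_commit(&mut db, &t2, 3, false));
    println!("x + y = {}", db["x"].0 + db["y"].0);

    // Checking read sets too: the second commit is rejected.
    let mut db2 = HashMap::from([("x", (1i64, 0u64)), ("y", (1, 0))]);
    assert!(try_commit(&mut db2, &t1, 2, true));
    assert!(!try_commit(&mut db2, &t2, 3, true));
}
```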
> I executed an optimistic transaction like `begin optimistic; select * from t1 where name = "xxxxx" for update; commit;`. It also generated a `Lock` record in the write CF. So the `Lock` record on the primary key of an optimistic transaction also marks the committed status, right?

Right. And that's why I said "However, it is not true for the primary keys." In this case, the `Lock` record cannot be collapsed.
> It brings a little extra work when committing the transaction, such as writing two column families, deleting old …

> To completely solve the issue, it is a must to overwrite the latest … However, reusing the key of the latest …

It's quite risky to overwrite an existing key; maybe other workarounds are needed to avoid the worst case of too many adjacent tombstones in the memtable.