
RFC: use atomic reference count to track external in-memory cache record usage #779

Closed
MrCroxx opened this issue Oct 22, 2024 · 14 comments · Fixed by #785

@MrCroxx
Collaborator

MrCroxx commented Oct 22, 2024

Atomic Reference Count Management

[RawCache] uses an atomic reference count to manage the release of an entry.

The atomic reference count represents the external references of a cache record.

When the reference count drops to zero, the related cache shard is locked to release the cache record.

It is important to guarantee that the atomic reference count is used correctly, especially when triggering the
release of a record. Without any other synchronization mechanism, dangling pointers or double frees could occur:

Thread 1: [ decrease ARC to 0 ] ============> [ release record ]
Thread 2:                         [ increase ARC to 1 ] =======> dangling!! ==> double free!!

Thankfully, we can prevent it from happening with the usage of the shard lock:

The only ops that will increase the atomic reference count are:

  1. Insert/get/fetch the [RawCache] and get external entries. (locked)
  2. Clone an external entry. (lock-free)

Op 1 is always guarded by a mutex/rwlock, so it cannot happen while a record is being released with the shard
locked.

Op 2 is lock-free, but it only happens when there is at least one external reference, so it cannot occur while a
record is being released: when a release is in progress, there must be no external reference.

So, this case will never happen:

Thread 1: [ decrease ARC to 0 ] ================> [ release record ]
Thread 2:                           [ (op2) increase ARC to 1 ] ==> dangling!! ==> double free!!

When starting to release a record, after locking the shard, it is still required to check the atomic reference count,
because this case can still occur:

Thread 1: [ decrease ARC to 0 ] ====================================> [ release record ]
Thread 2:                           [ (op1) increase ARC to 1 ] =======> dangling!! ==> double free!!

Although op 1 requires the lock, the release operation can be delayed until after it. So the release operation can
simply be skipped if the atomic reference count is found to be non-zero.

There is no need to worry about leaks: another release operation will always follow once the atomic reference
count drops to zero again.
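The protocol above can be sketched in a few lines of Rust. This is a minimal, hypothetical sketch, not foyer code: `Record` and `drop_entry` are illustrative names, and a single `Mutex` around the payload stands in for the real shard lock.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

// Hypothetical record: `refs` is the external reference count and `data`
// is the payload; the `Mutex` stands in for the real shard lock.
struct Record {
    refs: AtomicUsize,
    data: Mutex<Option<String>>,
}

// Dropping an external entry: decrement first, then lock and re-check
// before actually freeing anything.
fn drop_entry(record: &Record) {
    // `fetch_sub` returns the previous value, so `1` means we took it to 0.
    if record.refs.fetch_sub(1, Ordering::SeqCst) == 1 {
        let mut guard = record.data.lock().unwrap();
        // Re-check under the shard lock: a locked op (insert/get/fetch) may
        // have raised the count again in the gap, in which case we skip the
        // release and let a later drop handle it.
        if record.refs.load(Ordering::SeqCst) == 0 {
            *guard = None; // actually release the record
        }
    }
}
```

The key point is the second check under the lock: the decrement alone is not enough to decide ownership of the release.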

Related issues & PRs

#778

@MrCroxx MrCroxx added the RFC Request for Comments label Oct 22, 2024
@MrCroxx MrCroxx added this to the v0.13 milestone Oct 22, 2024
@MrCroxx MrCroxx self-assigned this Oct 22, 2024
@hzxa21
Collaborator

hzxa21 commented Oct 23, 2024

The RFC LGTM.

Let me put the background of this RFC here (@MrCroxx correct me if I am wrong):

  • Prior to this RFC, the reference counter of a cache entry is an int under the protection of the shard lock. That means whenever we need to do ref++ or ref--, the shard lock needs to be acquired.
  • After this RFC, the reference counter of a cache entry is an atomic. The reference counting manipulation and shard lock is decoupled. That means we don't have to acquire the shard lock when ref > 0. We only need to acquire the shard lock when ref drops to 0.

@wenym1
Collaborator

wenym1 commented Oct 23, 2024

A drawback of using atomic add and atomic sub is that we are not aware of the value of the variable when the operation is actually applied on it. To resolve this drawback, a feasible solution can be using CAS. With CAS, we can be aware of the value before the operation is applied on the atomic variable.

A general implementation can be:

  • Acquirers ensure that they will not increment the value when it is 0.
  • Releasers decrement the value anyway. The releaser that decrements the value from 1 to 0 is responsible for releasing the object.
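The two rules above could look like this in Rust. This is a sketch under my own naming (`try_acquire`/`release` are not foyer APIs): acquirers spin on a CAS that refuses the 0 → 1 transition, while releasers use a plain decrement and the one that hits 0 owns the release.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Acquirer: CAS loop that never increments from 0.
fn try_acquire(refs: &AtomicUsize) -> bool {
    let mut cur = refs.load(Ordering::SeqCst);
    loop {
        if cur == 0 {
            // Record is being (or has been) released; caller must re-fetch
            // from the cache instead of cloning this entry.
            return false;
        }
        match refs.compare_exchange_weak(cur, cur + 1, Ordering::SeqCst, Ordering::SeqCst) {
            Ok(_) => return true,
            Err(actual) => cur = actual, // lost the race; retry with the observed value
        }
    }
}

// Releaser: decrement unconditionally; returns true if this caller
// took the count from 1 to 0 and therefore owns the release.
fn release(refs: &AtomicUsize) -> bool {
    refs.fetch_sub(1, Ordering::SeqCst) == 1
}
```

With this protocol the count can never be resurrected from 0 by a lock-free clone, which is what rules out the 0 → 1 race in the diagrams above.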

@Li0k
Collaborator

Li0k commented Oct 23, 2024

After discussion, we found that there are several situations that may lead to an entry clone:

  1. Single flight: after the acquisition, each waiter will get a clone of the entry.

  2. Cache get: get a clone of the entry from the foyer cache.

  3. Clone when the entry is used by an upper-level caller.

Therefore, a clone is not always initiated by the caller (3); it can also come from 1 and 2, so I'm in favour of adopting the current RFC to address all the scenarios involved.

Since this RFC greatly reduces the frequency of locking, using either CAS or a lock is acceptable to me.

Thanks for the efforts

@MrCroxx
Collaborator Author

MrCroxx commented Oct 23, 2024

A drawback of using atomic add and atomic sub is that we are not aware of the value of the variable when the operation is actually applied on it. To resolve this drawback, a feasible solution can be using CAS. With CAS, we can be aware of the value before the operation is applied on the atomic variable.

A general implementation can be:

  • Acquirers ensure that they will not increment the value when it is 0.
  • Releasers decrement the value anyway. The releaser that decrements the value from 1 to 0 is responsible for releasing the object.

I think this optimization does not solve the problem. If I understand it incorrectly, please correct me.

When the releaser finds itself responsible for releasing the object, it still needs to acquire the mutex lock of the shard. During the gap before the lock guard is actually acquired, there is a chance that another thread calls get/fetch on the same key and increases the refs again.

@MrCroxx
Collaborator Author

MrCroxx commented Oct 25, 2024

ABA and Double Free found.

e.g.

Thread 1: [ decrease ARC to 0 ] ==================================================================================> [ release record ] ( dangling!!  double free!! )
Thread 2:                         [ (op2) increase ARC to 1 ] ===> [ decrease ARC to 0 ] ===> [ release record ]

@MrCroxx
Collaborator Author

MrCroxx commented Oct 25, 2024

ABA and Double Free found.

e.g.

Thread 1: [ decrease ARC to 0 ] ==================================================================================> [ release record ] ( dangling!!  double free!! )
Thread 2:                         [ (op2) increase ARC to 1 ] ===> [ decrease ARC to 0 ] ===> [ release record ]

Let me try to solve the case with versioned CAS.

@MrCroxx
Collaborator Author

MrCroxx commented Oct 25, 2024

@arkbriar proposed an exquisite solution. Let me organize and record it.

@wenym1
Collaborator

wenym1 commented Oct 25, 2024

ABA and Double Free found.

e.g.

Thread 1: [ decrease ARC to 0 ] ==================================================================================> [ release record ] ( dangling!!  double free!! )
Thread 2:                         [ (op2) increase ARC to 1 ] ===> [ decrease ARC to 0 ] ===> [ release record ]

If we use CAS to increment the ref count, following the previously mentioned protocol, this won't be a problem.

* Acquirers ensure that they will not increment the value when it is 0.
* Releasers decrement the value anyway. The releaser that decrements the value from 1 to 0 is responsible for releasing the object.

In the example above, when Thread 2 tries to increase the ref count from 0 to 1, it should not succeed, and it should do other work instead.

@wenym1
Collaborator

wenym1 commented Oct 25, 2024

After rethinking about this proposal, I doubt whether this proposal is necessary.

Like summarized in the issue description above, we have two ways to increment the ref count: insert/get/fetch and entry clone. IIUC, this proposal can only improve the performance of entry clone by reducing the time spent acquiring the lock, and for insert/get/fetch, we have to acquire the lock anyway. However, if this proposal is only for improving entry clone, an easier implementation is simply wrapping the returned entry with an Arc, and then the problem of acquiring the lock during entry clone can be easily resolved.
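The Arc-wrapping idea mentioned above could look like this. `CacheEntry` is a hypothetical type for illustration; the point is that cloning an `Arc` is just a lock-free atomic increment inside the standard library, so no shard lock is involved.

```rust
use std::sync::Arc;

// Hypothetical entry type returned to callers.
struct CacheEntry {
    key: u64,
    value: String,
}

// Cloning the handle only bumps Arc's internal atomic refcount;
// the cache shard lock is never touched here.
fn clone_handle(entry: &Arc<CacheEntry>) -> Arc<CacheEntry> {
    Arc::clone(entry)
}
```

The trade-off, as discussed below, is that this only addresses the clone path and not the other optimizations that depend on decoupling the refcount from the shard lock.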

@Li0k
Collaborator

Li0k commented Oct 25, 2024

After rethinking about this proposal, I doubt whether this proposal is necessary.

Like summarized in the issue description above, we have two ways to increment the ref count: insert/get/fetch and entry clone. IIUC, this proposal can only improve the performance of entry clone by reducing the time spent acquiring the lock, and for insert/get/fetch, we have to acquire the lock anyway. However, if this proposal is only for improving entry clone, an easier implementation is simply wrapping the returned entry with an Arc, and then the problem of acquiring the lock during entry clone can be easily resolved.

+1. Do we have a scenario that consistently reproduces this issue? And is the performance degradation caused by a single-key hotspot? I'd like to analyse it in a specific scenario.

@MrCroxx
Collaborator Author

MrCroxx commented Oct 25, 2024

After rethinking about this proposal, I doubt whether this proposal is necessary.

Like summarized in the issue description above, we have two ways to increment the ref count: insert/get/fetch and entry clone. IIUC, this proposal can only improve the performance of entry clone by reducing the time spent acquiring the lock, and for insert/get/fetch, we have to acquire the lock anyway. However, if this proposal is only for improving entry clone, an easier implementation is simply wrapping the returned entry with an Arc, and then the problem of acquiring the lock during entry clone can be easily resolved.

This RFC is only one part of the optimizations in #778. #778 also contains an optimization that allows the get operation to require only the read lock if the algorithm doesn't need to modify the entry. This RFC is a requirement of that.

@wenym1
Collaborator

wenym1 commented Oct 25, 2024

After rethinking about this proposal, I doubt whether this proposal is necessary.
Like summarized in the issue description above, we have two ways to increment the ref count: insert/get/fetch and entry clone. IIUC, this proposal can only improve the performance of entry clone by reducing the time spent acquiring the lock, and for insert/get/fetch, we have to acquire the lock anyway. However, if this proposal is only for improving entry clone, an easier implementation is simply wrapping the returned entry with an Arc, and then the problem of acquiring the lock during entry clone can be easily resolved.

This RFC is only one part of the optimizations in #778. #778 also contains an optimization that allows the get operation to require only the read lock if the algorithm doesn't need to modify the entry. This RFC is a requirement of that.

I see. Do we have an issue describing the general optimization in #778? The PR is quite large, and quite reviewer-unfriendly without sufficient context about the optimization.

@MrCroxx
Collaborator Author

MrCroxx commented Oct 25, 2024

After rethinking about this proposal, I doubt whether this proposal is necessary.
Like summarized in the issue description above, we have two ways to increment the ref count: insert/get/fetch and entry clone. IIUC, this proposal can only improve the performance of entry clone by reducing the time spent acquiring the lock, and for insert/get/fetch, we have to acquire the lock anyway. However, if this proposal is only for improving entry clone, an easier implementation is simply wrapping the returned entry with an Arc, and then the problem of acquiring the lock during entry clone can be easily resolved.

This RFC is only one part of the optimizations in #778. #778 also contains an optimization that allows the get operation to require only the read lock if the algorithm doesn't need to modify the entry. This RFC is a requirement of that.

I see. Do we have an issue describing the general optimization in #778? The PR is quite large, and quite reviewer-unfriendly without sufficient context about the optimization.

Sorry about that. I built the new framework from scratch, and it is really not easy to see the diffs. I'm working on docs and will replace the original with the new framework after trying the idea Ark suggested.

@arkbriar

arkbriar commented Oct 25, 2024

I'm working on docs and will replace the original with the new framework after trying the idea Ark suggested.

xD The idea is basically the same as what @wenym1 proposed in #779 (comment), except that its release point is in the negative range: the thread that performs the real free must have successfully done CAS(ref_count, 0, -1) after it decreases the counter to 0. That way, threads have a chance to pick up a pointer that has been released by all other threads (i.e., ref count = 0) again (i.e., ref count = 1). In normal cases the two ideas are identical, but when the reference count of a pointer is intensely contended, mine helps reduce reference failures.
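A minimal sketch of that scheme, with my own illustrative names (not foyer code): the count parks at 0 while the record is still resurrectable, and -1 marks it as claimed for freeing, so exactly one thread can win the release.

```rust
use std::sync::atomic::{AtomicIsize, Ordering};

// Acquirer: may pick a fully-released record back up (0 -> 1).
// Fails if the count is anything other than 0, including -1 (claimed).
fn try_resurrect(refs: &AtomicIsize) -> bool {
    refs.compare_exchange(0, 1, Ordering::SeqCst, Ordering::SeqCst)
        .is_ok()
}

// Releaser: decrement unconditionally; returns true only for the thread
// that both took the count to 0 and then won the 0 -> -1 CAS.
fn release(refs: &AtomicIsize) -> bool {
    if refs.fetch_sub(1, Ordering::SeqCst) == 1 {
        // We observed 1 -> 0. Losing this CAS means an acquirer
        // resurrected the record first, so we must not free it.
        return refs
            .compare_exchange(0, -1, Ordering::SeqCst, Ordering::SeqCst)
            .is_ok();
    }
    false
}
```

Compared with the earlier protocol, this one lets a count sitting at 0 be incremented again, which is what reduces reference failures under heavy contention on a single record.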
