Skip to content

Commit

Permalink
MB-54221: Make LinkedList field read for stats atomic
Browse files Browse the repository at this point in the history
A TSAN failure has been seen:

  WARNING: ThreadSanitizer: data race (pid=28207)
    Write of size 8 at 0x7b4400001a00 by thread T21:
      #0 cb::NonNegativeCounter<unsigned long, cb::DefaultUnderflowPolicy>::fetch_sub(long) ../platform/include/platform/non_negative_counter.h:142 (ep_testsuite+0x5bd248)
      #1 cb::NonNegativeCounter<unsigned long, cb::DefaultUnderflowPolicy>::operator--() ../platform/include/platform/non_negative_counter.h:175 (ep_testsuite+0x9a535d)
      #2 BasicLinkedList::purgeListElem(boost::intrusive::list_iterator<boost::intrusive::mhtraits<OrderedStoredValue, boost::intrusive::list_member_hook<>, &OrderedStoredValue::seqno_hook>, false>, bool, bool) /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/kv_engine/engines/ep/src/linked_list.cc:412 (ep_testsuite+0xa948c1)

    Previous read of size 8 at 0x7b4400001a00 by main thread (mutexes: write M3032, write M1005282691001636776):
      #0 cb::NonNegativeCounter<unsigned long, cb::DefaultUnderflowPolicy>::load() const ../platform/include/platform/non_negative_counter.h:89 (ep_testsuite+0x5bcd35)
      #1 cb::NonNegativeCounter<unsigned long, cb::DefaultUnderflowPolicy>::operator unsigned long() const ../platform/include/platform/non_negative_counter.h:85 (ep_testsuite+0x687165)
      #2 BasicLinkedList::getNumStaleItems() const /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/kv_engine/engines/ep/src/linked_list.cc:307 (ep_testsuite+0xa94a7c)

This is seen because we read the non-atomic numStaleItems from
EphemeralVBucket::CountVisitor without a lock, while the value could be
changed from another thread.

We don't want to have to lock the list just to read the stats. Make the
fields read for stats atomic to resolve the race condition.

They are only used for stats; it does not need to be read with any
particular consistency with other values. If it had, acquiring the
listWriteLock would be required.

Change-Id: Ie557b1363ffd987ef108c19e2bfc200481c6e5f1
Co-Authored-By: Vesko Karaganev <[email protected]>
Reviewed-on: https://review.couchbase.org/c/kv_engine/+/189597
Tested-by: Vesko Karaganev <[email protected]>
Reviewed-by: Dave Rigby <[email protected]>
  • Loading branch information
jameseh96 and veselink1 committed Sep 18, 2023
1 parent b14acd3 commit ef05dad
Showing 1 changed file with 10 additions and 4 deletions.
14 changes: 10 additions & 4 deletions engines/ep/src/linked_list.h
Original file line number Diff line number Diff line change
Expand Up @@ -244,8 +244,10 @@ class BasicLinkedList : public SequenceList {
*
* This should be non-decrementing, apart from a rollback where it will be
* reset.
*
* Atomic as read for stats without taking the listWriteLock.
*/
Monotonic<seqno_t> highestPurgedDeletedSeqno;
AtomicMonotonic<seqno_t> highestPurgedDeletedSeqno;

/**
* Seqno of the last visible item. Accounts only committed sync-writes (ie,
Expand All @@ -265,16 +267,20 @@ class BasicLinkedList : public SequenceList {
* Indicates the number of elements in the list that are stale (old,
* duplicate values). Stale items are owned by the list and hence must
* periodically clean them up.
*
* Atomic as read for stats without taking the listWriteLock.
*/
cb::NonNegativeCounter<uint64_t> numStaleItems;
cb::AtomicNonNegativeCounter<uint64_t> numStaleItems;

/**
* Indicates the number of logically deleted items in the list.
* Since we are append-only, distributed cache supporting incremental
* replication, we need to keep deleted items for while and periodically
* purge them
* purge them.
*
* Atomic as read for stats without taking the listWriteLock.
*/
cb::NonNegativeCounter<uint64_t> numDeletedItems;
cb::AtomicNonNegativeCounter<uint64_t> numDeletedItems;

/* Used only to log debug messages */
const Vbid vbid;
Expand Down

0 comments on commit ef05dad

Please sign in to comment.