Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce FORCE_DEFRAG compilation option to allow activedefrag run when allocator is not jemalloc #1303

Open
wants to merge 25 commits into
base: unstable
Choose a base branch
from

Conversation

ranshid
Copy link
Member

@ranshid ranshid commented Nov 14, 2024

Introduce compile time option to force activedefrag to run even when
jemalloc is not used as the allocator.
This is in order to be able to run tests with defrag enabled
while using memory instrumentation tools.

fixes: #1241

Introduce compile time option to force activedefrag to run even when
jemalloc is not used as the allocator.
This is in order to be able to run tests with defrag enabled
while using memory instrumantation tools.

Signed-off-by: ranshid <[email protected]>
Introduce compile time option to force activedefrag to run even when
jemalloc is not used as the allocator.
This is in order to be able to run tests with defrag enabled
while using memory instrumantation tools.

Signed-off-by: ranshid <[email protected]>
Signed-off-by: ranshid <[email protected]>
@ranshid ranshid requested a review from madolson November 14, 2024 11:45
Copy link

codecov bot commented Nov 14, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 70.87%. Comparing base (1acf7f7) to head (5ddb45e).

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1303      +/-   ##
============================================
+ Coverage     70.71%   70.87%   +0.15%     
============================================
  Files           119      119              
  Lines         64652    64616      -36     
============================================
+ Hits          45717    45794      +77     
+ Misses        18935    18822     -113     
Files with missing lines Coverage Δ
src/allocator_defrag.c 100.00% <100.00%> (ø)
src/config.c 77.59% <ø> (-0.06%) ⬇️
src/defrag.c 88.80% <ø> (-0.23%) ⬇️
src/server.h 100.00% <ø> (ø)

... and 45 files with indirect coverage changes

Signed-off-by: Ran Shidlansik <[email protected]>
@madolson
Copy link
Member

I think we'll want to merge this PR first btw, #1242, since it already exists. I think there will be some merge conflicts.

CMakeLists.txt Outdated Show resolved Hide resolved
Copy link
Member

@madolson madolson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly makes sense to me. Some ideas for clarity but nothing to really change the direction.

src/CMakeLists.txt Outdated Show resolved Hide resolved
src/defrag.c Outdated
@@ -755,6 +755,15 @@ void defragScanCallback(void *privdata, const dictEntry *de) {
* or not, a false detection can cause the defragmenter to waste a lot of CPU
* without the possibility of getting any results. */
float getAllocatorFragmentation(size_t *out_frag_bytes) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kind of feels like you should override computeDefragCycles instead, and have that should always set active_defrag_running to like 100%.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with it (makes sense) I do not, however want to break tests which are checking the active_defrag_running, so maybe I can just use the active-defrag-cycle-max as the return value?
Alternatively we can have a tag to skip tests in case of DEBUG_FORCE_DEFRAG

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively we can have a tag to skip tests in case of DEBUG_FORCE_DEFRAG

This intuitively makes a bit more sense to me, since defrag shouldn't really work correctly since we are completely breaking defrag here.

Copy link
Member

@madolson madolson Nov 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the tests already are checking for jemalloc memory to enable the defragmentation, so I think that the tests should all still be skipped anyways?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my last change keeps the same logic and the code is cleaner. I find having the defrag reporting the fragmentation based on the config limits is fine but take a look and let me know.

src/defrag.c Outdated Show resolved Hide resolved
@madolson
Copy link
Member

@ranshid Zvis change is now merged. Jim's is basically also ready to go so you should be able to resume updating this if you want.

src/defrag.c Outdated Show resolved Hide resolved
@ranshid
Copy link
Member Author

ranshid commented Dec 8, 2024

@ranshid Zvis change is now merged. Jim's is basically also ready to go so you should be able to resume updating this if you want.

Thank you. just got time to circle back on this one. merging and continue where I left off

Signed-off-by: Ran Shidlansik <[email protected]>
Signed-off-by: Ran Shidlansik <[email protected]>
Signed-off-by: Ran Shidlansik <[email protected]>
Signed-off-by: Ran Shidlansik <[email protected]>
Signed-off-by: Ran Shidlansik <[email protected]>
Signed-off-by: Ran Shidlansik <[email protected]>
@ranshid
Copy link
Member Author

ranshid commented Dec 9, 2024

we will probably hit issues from day 1. eg:

Logged sanitizer errors (pid 53103):
=================================================================
==53103==ERROR: AddressSanitizer: heap-use-after-free on address 0x6080000026b8 at pc 0x0001006130d0 bp 0x00016fa72ec0 sp 0x00016fa72eb8
READ of size 4 at 0x6080000026b8 thread T0
    #0 0x1006130cc in defragStageKvstoreHelper+0x650 (valkey-server:arm64+0x1002870cc)
    #1 0x100610d64 in activeDefragTimeProc+0x2e0 (valkey-server:arm64+0x100284d64)
    #2 0x1003c128c in aeProcessEvents+0x630 (valkey-server:arm64+0x10003528c)
    #3 0x10040334c in main+0x6d7c (valkey-server:arm64+0x10007734c)
    #4 0x19d1ff150 in start+0x9a8 (dyld:arm64e+0xfffffffffff4d150)

0x6080000026b8 is located 24 bytes inside of 96-byte region [0x6080000026a0,0x608000002700)
freed by thread T3 here:
    #0 0x101338d40 in free+0x98 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x54d40)
    #1 0x1005bed10 in lazyfreeFreeDatabase+0x54 (valkey-server:arm64+0x100232d10)
    #2 0x100578750 in bioProcessBackgroundJobs+0x4e8 (valkey-server:arm64+0x1001ec750)
    #3 0x101335858 in asan_thread_start(void*)+0x40 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x51858)
    #4 0x19d589f90 in _pthread_start+0x84 (libsystem_pthread.dylib:arm64e+0x6f90)
    #5 0x19d584d30 in thread_start+0x4 (libsystem_pthread.dylib:arm64e+0x1d30)

previously allocated by thread T0 here:
    #0 0x101338fd0 in calloc+0x9c (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x54fd0)
    #1 0x1004120a0 in valkey_calloc+0x3c (valkey-server:arm64+0x1000860a0)
    #2 0x1003cd844 in kvstoreCreate+0xa8 (valkey-server:arm64+0x100041844)
    #3 0x1005bf778 in emptyDbAsync+0xbc (valkey-server:arm64+0x100233778)
    #4 0x100458ce8 in emptyData+0x354 (valkey-server:arm64+0x1000ccce8)
    #5 0x100459cd4 in flushallCommand+0x1c0 (valkey-server:arm64+0x1000cdcd4)
    #6 0x1003e9500 in call+0x3cc (valkey-server:arm64+0x10005d500)
    #7 0x1003ec7a4 in processCommand+0x17b4 (valkey-server:arm64+0x1000607a4)
    #8 0x100424358 in processInputBuffer+0x498 (valkey-server:arm64+0x100098358)
    #9 0x100422204 in readQueryFromClient+0xd4 (valkey-server:arm64+0x100096204)
    #10 0x100668688 in connSocketEventHandler+0x160 (valkey-server:arm64+0x1002dc688)
    #11 0x1003c0fb8 in aeProcessEvents+0x35c (valkey-server:arm64+0x100034fb8)
    #12 0x10040334c in main+0x6d7c (valkey-server:arm64+0x10007734c)
    #13 0x19d1ff150 in start+0x9a8 (dyld:arm64e+0xfffffffffff4d150)

Thread T3 created by T0 here:
    #0 0x1013301c8 in pthread_create+0x5c (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x4c1c8)
    #1 0x100578104 in bioInit+0x28c (valkey-server:arm64+0x1001ec104)
    #2 0x1003ff6b4 in main+0x30e4 (valkey-server:arm64+0x1000736b4)
    #3 0x19d1ff150 in start+0x9a8 (dyld:arm64e+0xfffffffffff4d150)

I think it is after the changes introduced in #1242 since the db expires can be freed in the background but is passed as target in beginDefragCycle.

I wonder if to invest time to fix it now or address them later

Signed-off-by: Ran Shidlansik <[email protected]>
src/zmalloc.c Outdated Show resolved Hide resolved
Signed-off-by: Madelyn Olson <[email protected]>
Copy link
Member

@madolson madolson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Direction looks fine to me, but just a minor comment about removing one of the test permutations.

src/allocator_defrag.h Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[NEW]Allow activedefrag to run when jemalloc is not used
3 participants