Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for compiling with mimalloc #363

Open
wants to merge 29 commits into
base: unstable
Choose a base branch
from

Conversation

WM0323
Copy link
Contributor

@WM0323 WM0323 commented Apr 24, 2024

Related Issue: #346
Use mimalloc as an option when building as specified by make MALLOC=mimalloc or USE_MIMALLOC=yes

@WM0323 WM0323 force-pushed the mimalloc-support branch from 6ef673d to c05ecf5 Compare April 24, 2024 16:22
Copy link

codecov bot commented Apr 24, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 68.44%. Comparing base (443d80f) to head (e391645).
Report is 352 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable     #363      +/-   ##
============================================
- Coverage     68.44%   68.44%   -0.01%     
============================================
  Files           109      109              
  Lines         61671    61672       +1     
============================================
- Hits          42212    42209       -3     
- Misses        19459    19463       +4     
Files with missing lines Coverage Δ
src/server.c 88.13% <100.00%> (+<0.01%) ⬆️
src/zmalloc.c 83.46% <ø> (ø)

... and 12 files with indirect coverage changes

@PingXie
Copy link
Member

PingXie commented Apr 24, 2024

Thanks for the patch @WM0323!

Can you consider the alternative that doesn't require vendoring mimalloc? In general, we would like to de-vendor the dependencies going forward. You can find the devendoring discussion at #15.

@WM0323
Copy link
Contributor Author

WM0323 commented Apr 24, 2024

Thanks for the patch @WM0323!

Can you consider the alternative that doesn't require vendoring mimalloc? In general, we would like to de-vendor the dependencies going forward. You can find the devendoring discussion at #15.

Thanks for your feedback and directing me to the devendoring discussion. I will try to adjust the build scripts to link against a system-wide installation, managing mimalloc externally

@WM0323
Copy link
Contributor Author

WM0323 commented Apr 24, 2024

Add support to optionally use system libraries instead of the vendored mimalloc.

@hpatro
Copy link
Collaborator

hpatro commented Apr 25, 2024

I think we should also perform certain benchmark(s) to determine the memory usage/fragmentation/throughput while using mimalloc compared to jemalloc. Will help the community understand when to use mimalloc over jemalloc for Valkey.

@PingXie
Copy link
Member

PingXie commented Apr 25, 2024

Add support to optionally use system libraries instead of the vendored mimalloc.

I see that this PR allows the user to choose between vendoring and not. Any reason to NOT make the lib the only option?

@WM0323
Copy link
Contributor Author

WM0323 commented Apr 26, 2024

Add support to optionally use system libraries instead of the vendored mimalloc.

I see that this PR allows the user to choose between vendoring and not. Any reason to NOT make the lib the only option?

I agree with making the lib as the only option for the devendoring:). I will make changes to keep only one system lib opion and also run some benchmark to compare with jemalloc's memory usage/fragmentation/throughput

Shivshankar-Reddy and others added 12 commits April 26, 2024 17:38
Renamed below procedures and variables (missed in valkey-io#287) as follows.

redis_cluster             ->     valkey_cluster
redis1                    ->     valkey1
redis2                    ->     valkey2
get_redis_dir             ->     get_valkey_dir
test_redis_cli_rdb_dump   ->     test_valkey_cli_rdb_dump
test_redis_cli_repl       ->     test_valkey_cli_repl
redis-cli                 ->     valkey-cli
redis_reset_state         ->     valkey_reset_state
redisHandle               ->     valkeyHandle
redis_safe_read           ->     valkey_safe_read
redis_safe_gets           ->     valkey_safe_gets
redis_write               ->     valkey_write
redis_read_reply          ->     valkey_read_reply
redis_readable            ->     valkey_readable
redis_readnl              ->     valkey_readnl
redis_writenl             ->     valkey_writenl
redis_read_map            ->     valkey_read_map
redis_read_line           ->     valkey_read_line
redis_read_null           ->     valkey_read_null
redis_read_bool           ->     valkey_read_bool
redis_read_double         ->     valkey_read_double
redis_read_verbatim_str   ->     valkey_read_verbatim_str
redis_call_callback       ->     valkey_call_callback

---------

Signed-off-by: Shivshankar-Reddy <[email protected]>

Signed-off-by: Sher Sun <[email protected]>
fix
Signed-off-by: Sher Sun <[email protected]>
Includes some more changes e.g. the README under tests and some script under utils.

Signed-off-by: hwware <[email protected]>
Signed-off-by: Sher Sun <[email protected]>
Add License Copyright description for Valkey contributors in COPYING

---------

Signed-off-by: PatrickJS <[email protected]>
Signed-off-by: Madelyn Olson <[email protected]>
Co-authored-by: Madelyn Olson <[email protected]>
Co-authored-by: Madelyn Olson <[email protected]>

Signed-off-by: Sher Sun <[email protected]>
it's a supplementary modification from valkey-io#352.

Signed-off-by: artikell <[email protected]>
Signed-off-by: Sher Sun <[email protected]>
)

This PR covers below cases.
1. test suite's prints(i.e., puts statement logs).
2. Links refering to redis issues.
3. test names contains redis.

Signed-off-by: Shivshankar-Reddy <[email protected]>
Signed-off-by: Sher Sun <[email protected]>
…#294)

This PR has mainly done three things:

1. Enable `accept4()` on DragonFlyBSD
2. Fix the failures of determining the presence of `accept4()` due to
   the missing <sys/param.h> on two OSs: NetBSD, OpenBSD
3. Drop the support of FreeBSD < 10.0 for `valkey`

### References
- [param.h in
DragonFlyBSD](https://github.com/DragonFlyBSD/DragonFlyBSD/blob/7485684fa5c3fadb6c7a1da0d8bb6ea5da4e0f2f/sys/sys/param.h#L129-L257)
- [param.h in
FreeBSD](https://github.com/freebsd/freebsd-src/blob/main/sys/sys/param.h#L46-L76)
- [param.h in
NetBSD](https://github.com/NetBSD/src/blob/b5f8d2f930b7ef226d4dc1b4f7017e998c0e5cde/sys/sys/param.h#L53-L70)
- [param.h in
OpenBSD](https://github.com/openbsd/src/blob/d9c286e032a7cee9baaecdd54eb0d43f658ae696/sys/sys/param.h#L40-L45)

---------

Signed-off-by: Andy Pan <[email protected]>

Signed-off-by: Sher Sun <[email protected]>
Signed-off-by: Sher Sun <[email protected]>
@WM0323 WM0323 force-pushed the mimalloc-support branch from 486d12f to c4b890a Compare April 26, 2024 17:45
@WM0323
Copy link
Contributor Author

WM0323 commented Apr 26, 2024

Performance Test: mimalloc vs. jemalloc

Install the library and header files of mimalloc:

git clone https://github.com/microsoft/mimalloc.git
mkdir -p out/release
cd out/release
cmake ../.. -DMI_INSTALL_TOPLEVEL=ON 
sudo make install 

Build the Valkey

make MALLOC=mimalloc
or
make USE_MIMALLOC=yes

Start cluster

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib/
./create-cluster start   
./create-cluster create

valkey benchmark used:
./valkey-benchmark -t set -n 10000000 -c 20 -p 30003

Benchmark Results:

Metric Jemalloc mimalloc Change
Requests per second 70,063.34 78,664.59 +12.28%
Average Latency (ms) 0.185 0.176 -4.86%
Used Memory (Human Readable) 3.50M 3.52M +0.57%
Memory Fragmentation Ratio 2.82 2.33 -17.38%
Memory Allocator Version jemalloc-5.3.0 mimalloc-1.8.5

Jemalloc logs:

paas@dsde05:~/sher/valkey/src$ ./valkey-benchmark -t set -n 10000000 -c 20 -p 30003
====== SET ======
10000000 requests completed in 142.73 seconds
20 parallel clients
3 bytes payload
keep alive: 1
host configuration "save": 3600 1 300 100 60 10000
host configuration "appendonly": yes
multi-thread: no

Latency by percentile distribution:
0.000% <= 0.079 milliseconds (cumulative count 2)
50.000% <= 0.183 milliseconds (cumulative count 5405076)
75.000% <= 0.207 milliseconds (cumulative count 7644872)
87.500% <= 0.231 milliseconds (cumulative count 8934659)
93.750% <= 0.239 milliseconds (cumulative count 9442491)
96.875% <= 0.247 milliseconds (cumulative count 9734950)
98.438% <= 0.271 milliseconds (cumulative count 9855272)
99.219% <= 0.383 milliseconds (cumulative count 9922465)
99.609% <= 0.479 milliseconds (cumulative count 9963908)
99.805% <= 0.535 milliseconds (cumulative count 9981729)
99.902% <= 0.599 milliseconds (cumulative count 9990600)
99.951% <= 0.687 milliseconds (cumulative count 9995294)
99.976% <= 0.775 milliseconds (cumulative count 9997677)
99.988% <= 0.903 milliseconds (cumulative count 9998792)
99.994% <= 7.671 milliseconds (cumulative count 9999393)
99.997% <= 7.911 milliseconds (cumulative count 9999704)
99.998% <= 8.159 milliseconds (cumulative count 9999849)
99.999% <= 9.895 milliseconds (cumulative count 9999927)
100.000% <= 10.007 milliseconds (cumulative count 9999964)
100.000% <= 10.071 milliseconds (cumulative count 9999982)
100.000% <= 10.135 milliseconds (cumulative count 9999991)
100.000% <= 10.191 milliseconds (cumulative count 9999996)
100.000% <= 10.231 milliseconds (cumulative count 9999998)
100.000% <= 10.255 milliseconds (cumulative count 9999999)
100.000% <= 10.279 milliseconds (cumulative count 10000000)
100.000% <= 10.279 milliseconds (cumulative count 10000000)

Cumulative distribution of latencies:
0.059% <= 0.103 milliseconds (cumulative count 5931)
76.449% <= 0.207 milliseconds (cumulative count 7644872)
98.828% <= 0.303 milliseconds (cumulative count 9882762)
99.324% <= 0.407 milliseconds (cumulative count 9932391)
99.729% <= 0.503 milliseconds (cumulative count 9972912)
99.912% <= 0.607 milliseconds (cumulative count 9991180)
99.958% <= 0.703 milliseconds (cumulative count 9995837)
99.981% <= 0.807 milliseconds (cumulative count 9998127)
99.988% <= 0.903 milliseconds (cumulative count 9998792)
99.990% <= 1.007 milliseconds (cumulative count 9998985)
99.990% <= 1.103 milliseconds (cumulative count 9999047)
99.991% <= 1.207 milliseconds (cumulative count 9999090)
99.991% <= 1.303 milliseconds (cumulative count 9999114)
99.991% <= 1.407 milliseconds (cumulative count 9999133)
99.992% <= 1.503 milliseconds (cumulative count 9999151)
99.992% <= 1.607 milliseconds (cumulative count 9999172)
99.992% <= 1.703 milliseconds (cumulative count 9999186)
99.992% <= 1.807 milliseconds (cumulative count 9999203)
99.992% <= 1.903 milliseconds (cumulative count 9999215)
99.992% <= 2.007 milliseconds (cumulative count 9999229)
99.992% <= 2.103 milliseconds (cumulative count 9999242)
99.993% <= 3.103 milliseconds (cumulative count 9999320)
99.998% <= 8.103 milliseconds (cumulative count 9999837)
99.999% <= 9.103 milliseconds (cumulative count 9999880)
100.000% <= 10.103 milliseconds (cumulative count 9999989)
100.000% <= 11.103 milliseconds (cumulative count 10000000)

Summary:
throughput summary: 70063.34 requests per second
latency summary (msec):
avg min p50 p95 p99 max
0.185 0.072 0.183 0.247 0.335 10.279

paas@dsde05:~/sher/valkey/src$ ./valkey-cli -p 30003
127.0.0.1:30003> info memory
#Memory
used_memory:3669720
used_memory_human:3.50M
used_memory_rss:10362880
used_memory_rss_human:9.88M
used_memory_peak:4431264
used_memory_peak_human:4.23M
used_memory_peak_perc:82.81%
used_memory_overhead:3595952
used_memory_startup:2232208
used_memory_dataset:73768
used_memory_dataset_perc:5.13%
allocator_allocated:4016352
allocator_active:4325376
allocator_resident:15462400
allocator_muzzy:0
total_system_memory:405353803776
total_system_memory_human:377.52G
used_memory_lua:31744
used_memory_vm_eval:31744
used_memory_lua_human:31.00K
used_memory_scripts_eval:0
number_of_cached_scripts:0
number_of_functions:0
number_of_libraries:0
used_memory_vm_functions:33792
used_memory_vm_total:65536
used_memory_vm_total_human:64.00K
used_memory_functions:184
used_memory_scripts:184
used_memory_scripts_human:184B
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
allocator_frag_ratio:1.08
allocator_frag_bytes:309024
allocator_rss_ratio:3.57
allocator_rss_bytes:11137024
rss_overhead_ratio:0.67
rss_overhead_bytes:-5099520
mem_fragmentation_ratio:2.82
mem_fragmentation_bytes:6693360
mem_not_counted_for_evict:15008
mem_replication_backlog:1048592
mem_total_replication_buffers:1066208
mem_clients_slaves:17632
mem_clients_normal:22400
mem_cluster_links:10720
mem_aof_buffer:1536
mem_allocator: jemalloc-5.3.0
mem_overhead_db_hashtable_rehashing:0
active_defrag_running:0
lazyfree_pending_objects:0
lazyfreed_objects:0
127.0.0.1:30003>

mimalloc logs:

paas@dsde05:~/sher/valkey/src$ ./valkey-benchmark -t set -n 10000000 -c 20 -p 30003
====== SET ======
10000000 requests completed in 127.12 seconds
20 parallel clients
3 bytes payload
keep alive: 1
host configuration "save": 3600 1 300 100 60 10000
host configuration "appendonly": yes
multi-thread: no

Latency by percentile distribution:
0.000% <= 0.047 milliseconds (cumulative count 1)
50.000% <= 0.175 milliseconds (cumulative count 5194078)
75.000% <= 0.207 milliseconds (cumulative count 7538394)
87.500% <= 0.231 milliseconds (cumulative count 9072429)
93.750% <= 0.239 milliseconds (cumulative count 9411489)
96.875% <= 0.255 milliseconds (cumulative count 9712046)
98.438% <= 0.271 milliseconds (cumulative count 9856645)
99.219% <= 0.351 milliseconds (cumulative count 9922487)
99.609% <= 0.455 milliseconds (cumulative count 9961749)
99.805% <= 0.519 milliseconds (cumulative count 9981356)
99.902% <= 0.575 milliseconds (cumulative count 9990483)
99.951% <= 0.647 milliseconds (cumulative count 9995220)
99.976% <= 0.735 milliseconds (cumulative count 9997581)
99.988% <= 0.847 milliseconds (cumulative count 9998830)
99.994% <= 1.055 milliseconds (cumulative count 9999398)
99.997% <= 7.727 milliseconds (cumulative count 9999697)
99.998% <= 7.919 milliseconds (cumulative count 9999848)
99.999% <= 9.487 milliseconds (cumulative count 9999925)
100.000% <= 9.599 milliseconds (cumulative count 9999962)
100.000% <= 17.551 milliseconds (cumulative count 9999981)
100.000% <= 17.647 milliseconds (cumulative count 9999991)
100.000% <= 17.711 milliseconds (cumulative count 9999996)
100.000% <= 17.759 milliseconds (cumulative count 9999998)
100.000% <= 17.775 milliseconds (cumulative count 9999999)
100.000% <= 17.791 milliseconds (cumulative count 10000000)
100.000% <= 17.791 milliseconds (cumulative count 10000000)

Cumulative distribution of latencies:
2.994% <= 0.103 milliseconds (cumulative count 299447)
75.384% <= 0.207 milliseconds (cumulative count 7538394)
99.028% <= 0.303 milliseconds (cumulative count 9902814)
99.436% <= 0.407 milliseconds (cumulative count 9943610)
99.774% <= 0.503 milliseconds (cumulative count 9977426)
99.932% <= 0.607 milliseconds (cumulative count 9993172)
99.969% <= 0.703 milliseconds (cumulative count 9996941)
99.985% <= 0.807 milliseconds (cumulative count 9998491)
99.991% <= 0.903 milliseconds (cumulative count 9999072)
99.993% <= 1.007 milliseconds (cumulative count 9999311)
99.994% <= 1.103 milliseconds (cumulative count 9999441)
99.995% <= 1.207 milliseconds (cumulative count 9999478)
99.995% <= 1.303 milliseconds (cumulative count 9999512)
99.995% <= 1.407 milliseconds (cumulative count 9999546)
99.996% <= 1.503 milliseconds (cumulative count 9999577)
99.996% <= 1.607 milliseconds (cumulative count 9999595)
99.996% <= 1.703 milliseconds (cumulative count 9999616)
99.996% <= 1.807 milliseconds (cumulative count 9999631)
99.996% <= 1.903 milliseconds (cumulative count 9999639)
99.996% <= 2.007 milliseconds (cumulative count 9999640)
99.999% <= 8.103 milliseconds (cumulative count 9999880)
100.000% <= 10.103 milliseconds (cumulative count 9999980)
100.000% <= 18.111 milliseconds (cumulative count 10000000)

Summary:
throughput summary: 78664.59 requests per second
latency summary (msec):
avg min p50 p95 p99 max
0.176 0.040 0.175 0.247 0.303 17.791

paas@dsde05:~/sher/valkey/src$ ./valkey-cli -p 30003
127.0.0.1:30003> info memory
#Memory
used_memory:3686128
used_memory_human:3.52M
used_memory_rss:8605696
used_memory_rss_human:8.21M
used_memory_peak:4449360
used_memory_peak_human:4.24M
used_memory_peak_perc:82.85%
used_memory_overhead:3611528
used_memory_startup:2248664
used_memory_dataset:74600
used_memory_dataset_perc:5.19%
allocator_allocated:3685904
allocator_active:8573952
allocator_resident:8573952
allocator_muzzy:0
total_system_memory:405353803776
total_system_memory_human:377.52G
used_memory_lua:31744
used_memory_vm_eval:31744
used_memory_lua_human:31.00K
used_memory_scripts_eval:0
number_of_cached_scripts:0
number_of_functions:0
number_of_libraries:0
used_memory_vm_functions:33792
used_memory_vm_total:65536
used_memory_vm_total_human:64.00K
used_memory_functions:200
used_memory_scripts:200
used_memory_scripts_human:200B
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
allocator_frag_ratio:1.00
allocator_frag_bytes:0
allocator_rss_ratio:1.00
allocator_rss_bytes:0
rss_overhead_ratio:1.00
rss_overhead_bytes:31744
mem_fragmentation_ratio:2.33
mem_fragmentation_bytes:4919792
mem_not_counted_for_evict:14112
mem_replication_backlog:1048592
mem_total_replication_buffers:1066208
mem_clients_slaves:17632
mem_clients_normal:22400
mem_cluster_links:10720
mem_aof_buffer:640
mem_allocator:mimalloc-1.8.5
mem_overhead_db_hashtable_rehashing:0
active_defrag_running:0
lazyfree_pending_objects:0
lazyfreed_objects:0
127.0.0.1:30003>

Copy link
Member

@PingXie PingXie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we will also need to update the README.md file as well and include the install instructions for mimalloc.

.gitmodules Outdated Show resolved Hide resolved
src/zmalloc.h Outdated
#elif defined(USE_MIMALLOC)
#define ZMALLOC_LIB ("mimalloc-" __xstr(MI_VERSION_MAJOR) "." __xstr(MI_VERSION_MINOR) "." __xstr(MI_VERSION_PATCH))
#include <mimalloc.h>
#if (MI_VERSION_MAJOR == 1 && MI_VERSION_MINOR >= 8) || (MI_VERSION_MAJOR > 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic doesn't seem right to me. This is checking the hard-coded version numbers from L37-39. Shouldn't we check the version number defined in <mimalloc.h>?

#define MI_MALLOC_VERSION 185   // major + 2 digits minor

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we sould use the MI_MALLOC_VERSION to check the version rather than hardcode value. Updated in zmalloc.h:)

Copy link
Collaborator

@hpatro hpatro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • What happens with fragmentation on mimalloc allocator?
  • Should we introduce a defrag_supported field to INFO command? It would be handy for users to figure out if the server supports it or not. Currently I think setting CONFIG SET ACTIVEDEFRAG YES would have no effect on the server.

src/zmalloc.h Outdated Show resolved Hide resolved
@WM0323
Copy link
Contributor Author

WM0323 commented Apr 29, 2024

  • What happens with fragmentation on mimalloc allocator?

Mimalloc efficiently manages fragmentation through strategies like segment reuse and object migration. However, unlike jemalloc, it does not support manual defragmentation commands.

@WM0323
Copy link
Contributor Author

WM0323 commented Apr 29, 2024

  • Should we introduce a defrag_supported field to INFO command? It would be handy for users to figure out if the server supports it or not. Currently I think setting CONFIG SET ACTIVEDEFRAG YES would have no effect on the server.

Sure, It would clearly indicate whether active defragmentation is supported, preventing confusion. I will add this to INFO command

Sher Sun added 2 commits April 29, 2024 16:01
fix
Signed-off-by: Sher Sun <[email protected]>
@WM0323 WM0323 requested review from PingXie and hpatro April 29, 2024 17:08
@hpatro
Copy link
Collaborator

hpatro commented Apr 29, 2024

  • What happens with fragmentation on mimalloc allocator?

Mimalloc efficiently manages fragmentation through strategies like segment reuse and object migration. However, unlike jemalloc, it does not support manual defragmentation commands.

Came across this discussion on mimalloc microsoft/mimalloc#632. Looks like Dragonfly folks have made some changes by vendoring mimalloc and improving the utilization. @PingXie What are your thoughts?

src/zmalloc.h Outdated
@@ -47,6 +48,7 @@

#elif defined(USE_JEMALLOC)
#define ZMALLOC_LIB ("jemalloc-" __xstr(JEMALLOC_VERSION_MAJOR) "." __xstr(JEMALLOC_VERSION_MINOR) "." __xstr(JEMALLOC_VERSION_BUGFIX))
#define defrag_supported "yes"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather simply compare the allocator, if it's JEMALLOC print yes else no for now. Won't require us to add another macro.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, good suggestion:) I will update in server.c file: directly check which memory allocator is being used, avoids introducing new macros.

@@ -5757,7 +5757,8 @@ sds genRedisInfoString(dict *section_dict, int all_sections, int everything) {
"mem_overhead_db_hashtable_rehashing:%zu\r\n", mh->overhead_db_hashtable_rehashing,
"active_defrag_running:%d\r\n", server.active_defrag_running,
"lazyfree_pending_objects:%zu\r\n", lazyfreeGetPendingObjectsCount(),
"lazyfreed_objects:%zu\r\n", lazyfreeGetFreedObjectsCount()));
"lazyfreed_objects:%zu\r\n", lazyfreeGetFreedObjectsCount(),
"defrag_supported: %s\r\n", defrag_supported));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a tcl test to verify the behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in other.tcl file.

@PingXie
Copy link
Member

PingXie commented Apr 30, 2024

  • What happens with fragmentation on mimalloc allocator?

Mimalloc efficiently manages fragmentation through strategies like segment reuse and object migration. However, unlike jemalloc, it does not support manual defragmentation commands.

Came across this discussion on mimalloc microsoft/mimalloc#632. Looks like Dragonfly folks have made some changes by vendoring mimalloc and improving the utilization. @PingXie What are your thoughts?

This is essentially the same argument for vendoring jemalloc (#364) in the first place.

I think the key question is whether we believe this is going to be a common case such that the value of vendoring outweighs its overhead. I am sure fragmentation can be an issue for certain workloads but are we looking at 0.0.1%, 0.1%, 1%, or 10%? Also how bad a more generic defrag solution would be? Does there exist a generic solution that works for all allocators and gives us say 90% of "effectiveness", or 80%, 70%, etc?

Vendoring itself is a concern to me and vendoring another dependency to the already long list of vendored dependencies is another concern to me. Our default stance should be "no vendoring". Exceptions are inevitable but there needs to be a very compelling argument IMO.

Sher Sun added 5 commits April 30, 2024 17:18
fix
Signed-off-by: Sher Sun <[email protected]>
Signed-off-by: Sher Sun <[email protected]>
fix
Signed-off-by: Sher Sun <[email protected]>
Signed-off-by: Sher Sun <[email protected]>
@WM0323 WM0323 force-pushed the mimalloc-support branch from f135314 to 2466cb4 Compare April 30, 2024 20:42
@yairgott
Copy link

yairgott commented May 1, 2024

@WM0323 , I'm curious to get your thoughts on what maybe some the reasons that the rust community reached the opposite conclusion - jemalloc is faster and consumes less memory than mimalloc. See discussion

To clarify, not suggesting that this change shouldn't be accepted. Maybe there is something that we can learn around conducting such tests.

@yairgott
Copy link

yairgott commented May 1, 2024

Some thoughts around improving the benchmark signal:

  1. Consider incorporating SET commands with variant payload size.
  2. The benchmark should exercise free more often. Consider issuing a mix of DEL and SET commands.
  3. Consider looking at the real time spent in the allocators. Overall benchmark number results may provide less consistent result as those are subject to the host noise. This is also aligned with the way the Rust community did their evaluation.

@hpatro
Copy link
Collaborator

hpatro commented May 1, 2024

Thanks for your input @yairgott . This change just allows users to use mimalloc but doesn't add it as a default. I think we should be fine supporting it.
I agree with the benchmark request, the workload requested above would be more closer to real world scenario and how it would handle fragmentation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants