Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

code usege of sds (and maybe other data types) #739

Open
zvi-code opened this issue Jul 2, 2024 · 1 comment
Open

code usege of sds (and maybe other data types) #739

zvi-code opened this issue Jul 2, 2024 · 1 comment

Comments

@zvi-code
Copy link
Contributor

zvi-code commented Jul 2, 2024

Originally posted by @zvi-code in #541 (comment)

This change makes sense to me overall and tradeoffs are good.

My concern is with that conceptually it's not a good idea that we have 2 conflicting patterns:

  1. use the internal encoding information of sds [pointer is odd]

  2. embed sds into some other non-native allocation buffer, potentially at arbitrary offset

I feel this has very big potential for very hard to track bugs

Trying to arrange my thoughts on this. The above is just an example for a wider issue In the code design. IMO there are two types of 'sds' usages:

  1. A way to carry metadata about a string buffer through IO flow
  2. A compact way to encode string buffer metadata in memory with good locality and low memory overhead

It is clear to me why use 'sds' for 2. For 1 I feel it could be wrong choice and this choice also has performance costs as it forces memory access to unpack the info many times in IO flow. For example sdsfree access the memory on free, even though it could have been avoided. This has high costs when memory access is a factor.
I also think there is an alternative, to take an approach slightly similar to rust, have some struct 'runtimeSds' with sds as member and metadata, need to check if it can be passed to functions by value, to avoid the need to allocate it on the heap. Thoughts?

@zvi-code zvi-code changed the title > This change makes sense to me overall and tradeoffs are good. Should we rethink how we use sds (and maybe other data types) ? Jul 2, 2024
@zvi-code zvi-code changed the title Should we rethink how we use sds (and maybe other data types) ? code uses of sds (and maybe other data types) ? Jul 2, 2024
@zvi-code zvi-code changed the title code uses of sds (and maybe other data types) ? code uses of sds (and maybe other data types) Jul 2, 2024
@zvi-code zvi-code changed the title code uses of sds (and maybe other data types) code usege of sds (and maybe other data types) Jul 2, 2024
@hpatro
Copy link
Collaborator

hpatro commented Jul 9, 2024

@zvi-code Thanks for starting this issue. Right now I'm a bit unclear about your suggestion. I would like to understand more :).

Also not sure if you're referring to the same thing which I also had experimented a little while ago. Let me know if we are on the same page :).

Overall, my idea was not to create robj/sds during the IO flow for each argument. Hence, an additional copy isn't required and was instead maintaining a seqbuf struct to refer the data in client's querybuf present already.

typedef struct seqbuf {
   char *buf;
   uint32_t len;
} seqbuf;

Few of the helper methods to handle to-fro between data storage layer and network layer:

seqbuf *seqbuf_init_cstr(char *buf, long len);
seqbuf *seqbuf_init_sds(sds buf);
void seqbuf_clonedata(seqbuf *sb);
sds seqbuf_createsds(seqbuf *sb);
struct redisObject *seqbuf_createStringObject(seqbuf *sb);
void seqbuf_clear(seqbuf *sb);
void seqbuf_free(seqbuf *sb);

One of the challenges which I observed with introducing another type in to the Valkey codebase is the developers need to be aware when to use which type and the initial change would be quite messy as there a lot of touchpoints. Currently, it's pretty homogenous throughout the codebase so easier to navigate albeit paying some of the cost.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants