Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Analyse and implement Dense representation for PFADD #446

Open
lucifercr07 opened this issue Sep 3, 2024 · 15 comments · May be fixed by #1342
Open

Analyse and implement Dense representation for PFADD #446

lucifercr07 opened this issue Sep 3, 2024 · 15 comments · May be fixed by #1342

Comments

@lucifercr07
Copy link
Contributor

  • Benchmark testing for current Sparse representation as part of PFADD.
  • Analyse feasibility of switch between Sparse and Dense representations based on benchmark testing.
  • Implementation to switch to Dense representation for HLL once a threshold is reached i.e #define HLL_SPARSE_VAL_MAX_VALUE 32 for Redis we can evaluate based on benchmark numbers when to promote.
  • Reference from Redis code here
@lucifercr07 lucifercr07 changed the title Analyse and implement Dense representation for PFADD Analyse and implement Dense representation for PFADD Sep 3, 2024
@chettriyuvraj
Copy link

I have no clue so might require a bit of to-fro in terms of questions - but I'd love to take this up!

@lucifercr07
Copy link
Contributor Author

Assigned, @chettriyuvraj thanks for picking this up, please let me know if you've any queries around this.

@evoxtorm
Copy link

Hey @lucifercr07

So, I was looking into this issue so that I can also help @chettriyuvraj a bit, while reading the code I stumbled upon some things.

switch(hdr->encoding) {
    case HLL_DENSE: return hllDenseAdd(hdr->registers,ele,elesize);
    case HLL_SPARSE: return hllSparseAdd(o,ele,elesize);

So for the above code in Redis file the sparse and dense implementation can be chosen in the hllAdd function based on the encoding type and the other is inside the hllSparseAdd(o,ele,elesize); which is based on the threshold value, So here are we implementing both of these methods or we are going with the threshold based on the size.

One other thing is the way it calculates the string size, so in the Redis, they have sdslen() so do we also have to do something like that to get the size of strings in args?

Thanks

@chettriyuvraj
Copy link

Sorry for no updates on this one - was down with H1N1 the past week. Will pick up and put out an update @lucifercr07 .

@arpitbbhayani
Copy link
Contributor

Hello @chettriyuvraj,

There has been no activity on this issue for the past 5 days.
It would be awesome if you keep posting updates to this issue so that we know you are actively working on it.

We are really eager to close this issue at the earliest, hence if we continue to see the inactivity, we will have to reassign the issue to someone else. We are doing this to ensure that the project maintains its momentum and others are not blocked on this work.

Just drop a comment with the current status of the work or share any issues you are facing. We can always chip in to help you out.

Thanks again.

@chettriyuvraj
Copy link

Hi @arpitbbhayani!

Today was the first day I actively picked up the issue. I'll be posting an update daily from now on.

Status

I mentioned that I had no clue about what HyperLogLog was and my first step today was to pick up the paper.

I'll hopefully have a bit more concrete updates + queries to ask tomorrow.

Apologies about the unreasonable delays and for slowing things down on this issue - I know how important keeping the momentum in this project is. Please bear with me - I'll get this over the line!

@chettriyuvraj
Copy link

Hey @lucifercr07!

A little unsure if my understanding is correct here

My understanding

In Redis source, this seems to be handling the promotion from sparse to dense representations.

Promotion occurs in 2 cases:
Case 1. Using #define HLL_SPARSE_VAL_MAX_VALUE 32, where the value exceeds that which can be represented by a sparse register.
Case 2. A configurable threshold of bytes i.e. server.hll_sparse_max_bytes. If the storage consumed by sparse representation exceeds this - promotion occurs.

Queries

  • Promotion in case 1 -> shouldn't this case already be handled by the library we are using? I had a peek (just a glimpse) on the library's source and it is promoting sparse to dense inside its Insert(..) function, need to take a closer look

  • So the main objective of the issue is to determine the threshold for case 2?

  • Analyse feasibility of switch between Sparse and Dense representations based on benchmark testing - can you elaborate on how benchmarking will help us decide this?

@arpitbbhayani
Copy link
Contributor

Hello @chettriyuvraj,

There has been no activity on this issue for the past 5 days.
It would be awesome if you keep posting updates to this issue so that we know you are actively working on it.

We are really eager to close this issue at the earliest, hence if we continue to see the inactivity, we will have to reassign the issue to someone else. We are doing this to ensure that the project maintains its momentum and others are not blocked on this work.

Just drop a comment with the current status of the work or share any issues you are facing. We can always chip in to help you out.

Thanks again.

2 similar comments
@arpitbbhayani
Copy link
Contributor

Hello @chettriyuvraj,

There has been no activity on this issue for the past 5 days.
It would be awesome if you keep posting updates to this issue so that we know you are actively working on it.

We are really eager to close this issue at the earliest, hence if we continue to see the inactivity, we will have to reassign the issue to someone else. We are doing this to ensure that the project maintains its momentum and others are not blocked on this work.

Just drop a comment with the current status of the work or share any issues you are facing. We can always chip in to help you out.

Thanks again.

@arpitbbhayani
Copy link
Contributor

Hello @chettriyuvraj,

There has been no activity on this issue for the past 5 days.
It would be awesome if you keep posting updates to this issue so that we know you are actively working on it.

We are really eager to close this issue at the earliest, hence if we continue to see the inactivity, we will have to reassign the issue to someone else. We are doing this to ensure that the project maintains its momentum and others are not blocked on this work.

Just drop a comment with the current status of the work or share any issues you are facing. We can always chip in to help you out.

Thanks again.

@chettriyuvraj
Copy link

Hi @arpitbbhayani - have unassigned myself from this issue. Free to assign it to someone else.

@chettriyuvraj chettriyuvraj removed their assignment Nov 2, 2024
@the-code-innovator
Copy link
Contributor

Hi @arpitbbhayani / @JyotinderSingh,
Can you assign this to me?

@JyotinderSingh
Copy link
Collaborator

Hi @arpitbbhayani / @JyotinderSingh,

Can you assign this to me?

Assigned

@the-code-innovator
Copy link
Contributor

I made some progress by working on top of suggestions by @evoxtorm and I am working on understanding it and implementing the Dense algorithm.

@the-code-innovator
Copy link
Contributor

the-code-innovator commented Nov 27, 2024

Hi @arpitbbhayani, @lucifercr07,
The benchmarks present a significant jump in the time taken per op, if the number of items in the list is greater than 1000.

Benchmarks with the current Sparse Representation in PFADD

goos: darwin
goarch: arm64
pkg: github.com/dicedb/dice/internal/eval
cpu: Apple M1 Pro
BenchmarkEvalPFADD/PFAddSize_0-8         	 3737166	       317.1 ns/op	      80 B/op	       3 allocs/op
BenchmarkEvalPFADD/PFAddSize_10-8        	  628004	      1892 ns/op	     665 B/op	      22 allocs/op
BenchmarkEvalPFADD/PFAddSize_100-8       	   64923	     18592 ns/op	    4850 B/op	     125 allocs/op
BenchmarkEvalPFADD/PFAddSize_1000-8      	    3874	    303253 ns/op	   68278 B/op	    1177 allocs/op
BenchmarkEvalPFADD/PFAddSize_10000-8     	    1576	    757984 ns/op	  160080 B/op	   10003 allocs/op
BenchmarkEvalPFADD/PFAddSize_100000-8    	     306	   3911595 ns/op	 1600081 B/op	  100003 allocs/op
PASS
ok  	github.com/dicedb/dice/internal/eval	8.562s

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants