Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement "reserve" and improve "Extend" impl #28

Merged
merged 12 commits into from
Jan 26, 2020

Conversation

soruh
Copy link
Contributor

@soruh soruh commented Jan 24, 2020

This Draft somewhat naively follows the suggestions made at #10 to implement reserve and improve the Extend implementation.
I'll continue to work on this and update this PR when I have made more progress

I have so far:

  • naively ported the tryPresize function
  • implemented a reserve wrapper around it (which, like most of the public API, takes a Guard for now)
  • improved the Extend implementation to use reserve and the hashbrown::HashMap::extend heuristic on how much to reserve
  • written some tests, to confirm everything works. (I had to implement capacity to make sure a resize actually happened`
  • written more tests
  • fixed the resizeStamp function
  • refactored the load factor calculation

What still needs to be done:

  • review the code
    - possibly relax some Orderings (everything is SeqConst for now) (Should probably get it's own PR)

Implemented reserve and improved used it to improve extend
wrote a `capacity` function, to allow for testsing resizes
wrote some tests
@soruh soruh changed the title implement reserve and capacity and improve Extend impl implement "reserve" and "capacity" and improve "Extend" impl Jan 24, 2020
@jonhoo
Copy link
Owner

jonhoo commented Jan 24, 2020

Awesome! I'll take a look through and leave comments.

src/map.rs Outdated Show resolved Hide resolved
src/map.rs Outdated Show resolved Hide resolved
src/map.rs Outdated Show resolved Hide resolved
src/map.rs Outdated Show resolved Hide resolved
src/map.rs Outdated Show resolved Hide resolved
src/map.rs Outdated Show resolved Hide resolved
src/map.rs Outdated Show resolved Hide resolved
Copy link
Contributor Author

@soruh soruh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jonhoo Thank you for the review!
I'll work on this some more tomorrow, but I should probably call it quits for today.

PS: The pipeline seems to fail (partially) due to not being able to install rust.
It might make sense to take a look at that.

src/map.rs Outdated Show resolved Hide resolved
src/map.rs Outdated Show resolved Hide resolved
src/map.rs Outdated Show resolved Hide resolved
src/map.rs Outdated Show resolved Hide resolved
@soruh
Copy link
Contributor Author

soruh commented Jan 25, 2020

I am a bit confused about why the java version multiplies the requested capacity
by 1.5 and adds 1 before rounding it to the next power of two:

tableSizeFor(size + (size >>> 1) + 1)

and why it set's the size control to 3/4 of the current capacity if the table was not yet initialized:

sc = n - (n >>> 2);

It obviously is the right thing to do since the java version does it this way, but it seems really weird to me...

@jonhoo
Copy link
Owner

jonhoo commented Jan 25, 2020

I am a bit confused about why the java version multiplies the requested capacity
by 1.5 and adds 1 before rounding it to the next power of two

I'm not sure. It may have to do with the way that their tableSizeFor function is implemented, where you get rounded to the nearest power of 2 rather than the next power of two?

Why it set's the size control to 3/4 of the current capacity if the table was not yet initialized

size_ctl is set to 3/4 of the new capacity if we end up allocating. Think of this as "when the next resize should happen". ConcurrentHashMap specifically targets a 0.75 load factor, and resizes if it goes above that so that no single bin gets too crowded.

@soruh soruh closed this Jan 25, 2020
@soruh soruh reopened this Jan 25, 2020
@soruh
Copy link
Contributor Author

soruh commented Jan 25, 2020

Ah, I was confused about the exact meaning of size_ctl, guess I'll look at the rest of my code, to make sure I didn't make any wrong comments because of that.

@soruh
Copy link
Contributor Author

soruh commented Jan 25, 2020

since if size_ctl is 0 it means the inital capacity should be DEFAULT_CAPACITY shouldn't this line:

n = (sc > c) ? sc : c;

take that into account?

@soruh
Copy link
Contributor Author

soruh commented Jan 25, 2020

I'm thinking of something like this:

let initial_capacity = if size_ctl == 0 {
    DEFAULT_CAPACITY as isize
} else {
    size_ctl
};

// the new capacity is either the requested capacity or the initial capacity, if it is larger
// since all maps should be at least as big as their initial capacity specifies
let new_capacity = requested_capacity.max(initial_capacity) as usize;

@jonhoo
Copy link
Owner

jonhoo commented Jan 25, 2020

Well, the initial capacity for a table is determined here:

flurry/src/map.rs

Lines 313 to 317 in 051ca79

let n = if sc > 0 {
sc as usize
} else {
DEFAULT_CAPACITY
};

DEFAULT_CAPACITY shouldn't be used in reserve, because there the user already provides us with a target size.

Or maybe I'm missing something about your question?

@soruh
Copy link
Contributor Author

soruh commented Jan 25, 2020

We'll if I used a map and initialized it with an initial capacity I would assume that it has at least that capacity...

Nevermind if size_ctl == 0 the user didn't actutally specify an initial capacity...

soruh added 2 commits January 25, 2020 17:10
following code audit feedback
and documented `try_presize` better
@jonhoo
Copy link
Owner

jonhoo commented Jan 25, 2020

Ohhh, I see what you're getting at:

let map = flurry::HashMap::with_capacity(1024);
map.reserve(512);

The call to with_capacity just sets size_ctl, but it does not allocate anything. So when we get to reserve, there will be no bins, but size_ctl != 0. I think your approach of taking the max is probably the right one. It also makes me wonder if we should call .reserve from with_capacity automatically so that we actually allocate when the user uses with_capacity.

@jonhoo
Copy link
Owner

jonhoo commented Jan 25, 2020

b83cbe2 looks great

@soruh
Copy link
Contributor Author

soruh commented Jan 25, 2020

I think calling try_presize from with_capacity would be the way to go, but I realized, that the DEFAULT_CAPACITY isn't really needed.
size_ctl is only 0 if the user didn't specify a capacity and in that case, since the user doesn't care what the initial capacity was, we can just use the requested capacity...

@soruh
Copy link
Contributor Author

soruh commented Jan 25, 2020

The question is how we handle the guard for try_presize in with_capacity. I think we can just use an unprotected, since we are creating the map and no one else can access it, but I'm not completely certain.

@soruh
Copy link
Contributor Author

soruh commented Jan 25, 2020

Is there a reason you were dividing by the load factor:

let size = (1.0 + (n as f64) / LOAD_FACTOR) as usize;

instead of multiplying by it:

let size = (1.0 + (n as f64) * LOAD_FACTOR) as usize;

or was that just a typo?

soruh and others added 3 commits January 25, 2020 19:18
reworked how the load factor is applied
make some tests unit tests
make `capacity` private
@soruh
Copy link
Contributor Author

soruh commented Jan 25, 2020

I initially wanted to make a new branch for this but impulsively committed this here, should I revert the last commit and make a new branch, or can I just leave it?

@soruh soruh changed the title implement "reserve" and "capacity" and improve "Extend" impl implement "reserve" and improve "Extend" impl Jan 25, 2020
@soruh soruh marked this pull request as ready for review January 25, 2020 21:15
@jonhoo
Copy link
Owner

jonhoo commented Jan 26, 2020

Well, DEFAULT_CAPACITY is the value you should allocate if the user does not specify a capacity. That is, if neither with_capacity nor reserve is called.

It should be fine to use unprotected as the guard in with_capacity.

I divide because that's what the Java code does.

I'm fine with you folding the fix for the resize stamp in here :)

@jonhoo
Copy link
Owner

jonhoo commented Jan 26, 2020

This is looking really good. Are there any other changes you want to make before merging?

@soruh
Copy link
Contributor Author

soruh commented Jan 26, 2020

Well, DEFAULT_CAPACITY is the value you should allocate if the user does not specify a capacity. That is, if neither with_capacity nor reserve is called.

But we don't need it in try_presize, since we only call it when the user has called with_capacity or reserve

I divide because that's what the Java code does.

That's really weird, especially since it adds a global constant just for that...
As we call try_presize in with_capacity now, we don't have that line anymore, since try_presize manages the load factor; I don't think this will be a problem though.

I'm fine with you folding the fix for the resize stamp in here :)

Great, thank you

@soruh
Copy link
Contributor Author

soruh commented Jan 26, 2020

This is looking really good. Are there any other changes you want to make before merging?

Not really, I initially wanted to look into maybe relaxing some Orderings, but I think it might be better to do that in a new PR, since we also need to do that in other parts of the code...
( And thank you :) )

@jonhoo
Copy link
Owner

jonhoo commented Jan 26, 2020

But we don't need it in try_presize, since we only call it when the user has called with_capacity or reserve

Right, hence my comment above:

DEFAULT_CAPACITY shouldn't be used in reserve, because there the user already provides us with a target size.


I initially wanted to look into maybe relaxing some Orderings, but I think it might be better to do that in a new PR, since we also need to do that in other parts of the code...

Yes, I think that belongs in a new PR!

Okay, let's merge this 🎉

@jonhoo jonhoo merged commit 068b291 into jonhoo:master Jan 26, 2020
@soruh
Copy link
Contributor Author

soruh commented Jan 26, 2020

But we don't need it in try_presize, since we only call it when the user has called with_capacity or reserve

Right, hence my comment above:

DEFAULT_CAPACITY shouldn't be used in reserve, because there the user already provides us with a target size.

Whoops, guess I missed that, sorry.

Okay, let's merge this 🎉

Great, Thank you!

@jonhoo
Copy link
Owner

jonhoo commented Jan 26, 2020

No, thank you!

@soruh soruh deleted the implement-reserve branch January 29, 2020 12:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants