[TB Optimization] Skip subtrees based on the subtree's root node's permissions #4008
That is at least potentially confusing. :/ But maybe the fix here should be on the diagnostic side, not the core algorithm. Does this depend on when the GC runs, or is it deterministic?
It seems like most benchmarks are unchanged by this PR (compared to just #4006), only a few of them benefit. big-allocs gets slightly worse. Do you have evidence that this is beneficial on (a non-trivial fraction of) real-world code?
Indeed. Intuitively, if you use lots of shared references, you benefit.
True, but I'd say that this is within measurement imprecision. That test just allocates a lot, without ever touching the memory.
It depends. Arguably, the fact that it's because there's a frozen parent could be clearer than the fact that it's because you are reserved conflicted protected. But note that there's nothing the diagnostics can do differently here, because with this change the child node would never become conflicted.
It is deterministic.
// of `ReservedIM`, `Disabled`, or a not-yet-accessed "lazy" permission thing.
// The two former are already invariant under all foreign accesses, and for
// the latter it does not really matter, since they can not be used/initialized
// due to having a protected parent. So this only affects diagnostics, but the
should be "disabled parent", right?
@@ -185,6 +185,30 @@ impl LocationState {
// need to be applied to this subtree.
_ => false,
};
if self.permission.is_disabled() {
// A foreign access to a `Disabled` tag will have almost no observable effect.
// It's a theorem that `Disabled` node have no protected initialized children,
That's not an obvious theorem -- can you give a brief argument?
It's proven in Coq 😛.
The reason it holds is that to become disabled, you need to have a foreign write access happen. But that would have triggered any protected initialized nodes that are children of the node being disabled. And you can't have a new child of Disabled become initialized, because that would mean the to-be-initialized node has a child access, which is however blocked by the Disabled parent.
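To make the skipping idea concrete, here is a minimal toy sketch. It is not Miri's actual implementation: the `Perm` enum, the `Node` struct, `foreign_write`, and `sample_tree` are all hypothetical names made up for illustration, and the model deliberately ignores protectors, `ReservedIM`, and lazily initialized permissions. It only shows the shape of the optimization: once a subtree's root is `Disabled`, a later foreign access can stop at the root instead of walking the whole subtree.

```rust
// Toy model only: these types are assumptions for illustration and are NOT
// the data structures used by Miri's Tree Borrows implementation.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Perm {
    Reserved,
    Active,
    Frozen,
    Disabled,
}

struct Node {
    perm: Perm,
    children: Vec<Node>,
}

/// Applies a foreign write to `node` and its subtree in the toy model and
/// returns how many nodes were visited. In this simplified model a foreign
/// write disables every (unprotected) node it reaches.
fn foreign_write(node: &mut Node) -> usize {
    if node.perm == Perm::Disabled {
        // Skip the subtree: nothing below a `Disabled` node can be accessed
        // again, so visiting the children would change nothing observable.
        return 1;
    }
    node.perm = Perm::Disabled;
    1 + node.children.iter_mut().map(foreign_write).sum::<usize>()
}

/// Builds a small four-node tree for the demonstration.
fn sample_tree() -> Node {
    let leaf = |perm| Node { perm, children: Vec::new() };
    Node {
        perm: Perm::Active,
        children: vec![
            Node { perm: Perm::Reserved, children: vec![leaf(Perm::Frozen)] },
            leaf(Perm::Frozen),
        ],
    }
}

fn main() {
    let mut tree = sample_tree();
    // The first foreign write has to walk all four nodes.
    assert_eq!(foreign_write(&mut tree), 4);
    // The second one stops at the already-`Disabled` root.
    assert_eq!(foreign_write(&mut tree), 1);
    println!("second foreign write skipped the subtree");
}
```

In the real model the children of a `Disabled` node are not necessarily `Disabled` themselves, which is why the argument above about protected initialized children is needed; the toy sidesteps that by disabling everything on the first write.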
// It's a theorem that `Disabled` node have no protected initialized children,
// and so this foreign access will never trigger any protector.
// Further, the children will never be able to read or write again, since they
// have a `Disabled` parents. Even further, all children of `Disabled` are one
The argument could end here, right? The permissions below don't matter since anyway no access is possible.
73383fd to ca6a1aa
Please rebase over master again so that GitHub updates the diff for this PR.
On it.
ca6a1aa to d92de1a
@rustbot ready
I'll have to carefully look at the test diff still to make sure I understand what happens here.
// A foreign access to a `Disabled` tag will have almost no observable effect.
// It's a theorem that `Disabled` node have no protected initialized children,
// and so this foreign access will never trigger any protector.
// (Intuition: You're either protected initialized, and thus can't become
"can't become" what? Missing a word.
can't become Disabled. The word is there now.
Yeah okay makes sense, thanks for the detailed comments!
After fixing the typos, please squash this to 2 commits (leave the first that adds the test separate).
@JoJoDeveloping please also update the PR description; at least the last paragraph is not up-to-date any more. @rustbot author
…ss would mostly be a NOP
0608150 to c387247
@rustbot ready
In #4006, we re-added the functionality for skipping subtrees. It turns out that just skipping subtrees based on their last recorded access is imprecise. In certain cases, we know we can skip subtrees purely based on the root's current permission, without having to track the last access. Specifically:
Note that this PR loosens the notion of "invariant" a bit. For example, it is possible that there is a `Reserved` protected node that is a child of a `Frozen` node. When that undergoes a foreign read, it becomes conflicted. If we skip accessing that subtree, it no longer becomes conflicted.

The reason this is still OK is that the only effect of this conflictedness is blocking child write accesses. But such accesses are already blocked by the `Frozen` node further up the tree. So no UB is missed; all that happens is that diagnostics are triggered at a different node.

For more detailed analysis of why this is correct, see the in-code comments.
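The `Frozen`-parent argument can be illustrated with a small toy model. This is only a sketch under simplifying assumptions, not Miri's code: `Perm`, `allows_child_write`, and `write_allowed` are hypothetical names, the protector is folded into the `conflicted` flag, and a write through a tag is modeled as a child access for that tag and each of its ancestors.

```rust
// Toy model only: these names are assumptions for illustration and do NOT
// mirror Miri's Tree Borrows implementation.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Perm {
    /// A protected `Reserved` node; `conflicted` is set by a foreign read.
    Reserved { conflicted: bool },
    Active,
    Frozen,
    Disabled,
}

/// Whether a child write passing through a node with this permission is
/// accepted in the toy model.
fn allows_child_write(p: Perm) -> bool {
    match p {
        Perm::Reserved { conflicted } => !conflicted,
        Perm::Active => true,
        Perm::Frozen | Perm::Disabled => false,
    }
}

/// `path` lists the permissions from the topmost ancestor down to the tag
/// used for the write. The write is accepted only if every node accepts it.
fn write_allowed(path: &[Perm]) -> bool {
    path.iter().all(|&p| allows_child_write(p))
}

fn main() {
    // Without skipping: the foreign read reached the child and marked it conflicted.
    let visited = [Perm::Frozen, Perm::Reserved { conflicted: true }];
    // With skipping: the subtree under `Frozen` was not visited, so the
    // child stays unconflicted.
    let skipped = [Perm::Frozen, Perm::Reserved { conflicted: false }];

    // Either way the write is rejected; only the rejecting node differs.
    assert!(!write_allowed(&visited));
    assert!(!write_allowed(&skipped));

    // Sanity checks for the other permissions in the toy model.
    assert!(write_allowed(&[Perm::Active, Perm::Reserved { conflicted: false }]));
    assert!(!write_allowed(&[Perm::Disabled, Perm::Active]));
    println!("write through the child is rejected in both cases");
}
```

In both cases the outcome is the same error; what changes is which node reports it (the `Frozen` ancestor rather than the conflicted child), matching the point above that only diagnostics are affected.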
Here is a performance analysis, comparing this PR's improvements with that of #4006:
As in #4006, this is a log graph. The blue line shows performance without #4006, red is with the re-added optimization of #4006, yellow is this PR (which is stacked on top of #4006), and green is just the changes proposed here, but with the "latest foreign access tracking" machinery of #4006 removed. As can be seen, having both combined gives the greatest performance.