-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes for dask-awkward one-pass optimize #1225
base: main
Are you sure you want to change the base?
Changes for dask-awkward one-pass optimize #1225
Conversation
src/uproot/_dask.py
Outdated
@@ -954,6 +954,7 @@ def __call__(self, form: Form) -> tuple[Form, TrivialFormMappingInfo]: | |||
class UprootReadMixin: | |||
base_form: Form | |||
expected_form: Form | |||
behavior = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how behaviours should be handled, so this probably needs updating
@@ -889,7 +888,9 @@ def keys_for_buffer_keys(self, buffer_keys: frozenset[str]) -> frozenset[str]: | |||
keys: set[str] = set() | |||
for buffer_key in buffer_keys: | |||
# Identify form key | |||
form_key, attribute = buffer_key.rsplit("-", maxsplit=1) | |||
form_key, attribute = buffer_key.replace("@.", "<root>.").rsplit( | |||
"-", maxsplit=1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this fail for any of the styles of buffer name? We could use [0]
or *attribute
to guard against this (we don't actually use the attribute).
In fact, this whole loop could be a single comprehension.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's unclear what characters are not allowed in a TBranch name, which gets passed over to the form_keys. It would be hard to use a :
in a TBranch name, since that gets interpreted by ROOT (in some constructors, not all). On the other hand, .
is very common and can't be a delimiter.
I tried to see what would break it:
root [0] TTree* t = new TTree("name", "title")
(TTree *) 0x6360d13c1720
root [1] (new TBranch(t, "something", nullptr, "leaf"))->GetName()
(const char *) "something"
root [2] (new TBranch(t, "some.thing", nullptr, "leaf"))->GetName()
(const char *) "some.thing"
root [3] (new TBranch(t, "some:thing", nullptr, "leaf"))->GetName()
(const char *) "some:thing"
root [4] (new TBranch(t, "some/thing", nullptr, "leaf"))->GetName()
(const char *) "some/thing"
root [5] (new TBranch(t, "some\"thing", nullptr, "leaf"))->GetName()
(const char *) "some"thing"
root [6] (new TBranch(t, "some`thing", nullptr, "leaf"))->GetName()
(const char *) "some`thing"
root [7] (new TBranch(t, "some!thing", nullptr, "leaf"))->GetName()
(const char *) "some!thing"
root [8] (new TBranch(t, "some@thing", nullptr, "leaf"))->GetName()
(const char *) "some@thing"
root [9] (new TBranch(t, "some#thing", nullptr, "leaf"))->GetName()
(const char *) "some#thing"
root [10] (new TBranch(t, "some+thing", nullptr, "leaf"))->GetName()
(const char *) "some+thing"
root [11] (new TBranch(t, "some(thing", nullptr, "leaf"))->GetName()
(const char *) "some(thing"
root [12] (new TBranch(t, "some{thing", nullptr, "leaf"))->GetName()
(const char *) "some{thing"
root [13] (new TBranch(t, "some[thing", nullptr, "leaf"))->GetName()
(const char *) "some[thing"
root [14] (new TBranch(t, "some\\thing", nullptr, "leaf"))->GetName()
(const char *) "some\thing"
root [15] (new TBranch(t, "some|thing", nullptr, "leaf"))->GetName()
(const char *) "some|thing"
root [16] (new TBranch(t, "some;thing", nullptr, "leaf"))->GetName()
(const char *) "some;thing"
root [17] (new TBranch(t, "some\0thing", nullptr, "leaf"))->GetName()
(const char *) "some"
Here's something we can count on: a ROOT TBranch name is not going to have a null byte (Python "\x00"
) in the middle of the string, whereas this is legal and would be properly carried around in Python. It seems pretty drastic, though.
…ant/uproot5 into one-pass-dask-optimize
…ant/uproot5 into one-pass-dask-optimize
|
||
def project(self: T, *, report: TypeTracerReport, state: dict) -> T: | ||
keys = self.necessary_columns(report=report, state=state) | ||
keys = [_buf_to_col(c).replace(".", "_") for c in columns] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lgray : this is the simplistic way to unmap the column names. This line for the simple case, the if-block for nanoAOD, adding in implicit columns like nJet. Then, this doesn't actually make use of the schema and keys_for_buffer_keys
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Reminder to @lgray to take a look at this.)
@@ -889,7 +888,9 @@ def keys_for_buffer_keys(self, buffer_keys: frozenset[str]) -> frozenset[str]: | |||
keys: set[str] = set() | |||
for buffer_key in buffer_keys: | |||
# Identify form key | |||
form_key, attribute = buffer_key.rsplit("-", maxsplit=1) | |||
form_key, attribute = buffer_key.replace("@.", "<root>.").rsplit( | |||
"-", maxsplit=1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's unclear what characters are not allowed in a TBranch name, which gets passed over to the form_keys. It would be hard to use a :
in a TBranch name, since that gets interpreted by ROOT (in some constructors, not all). On the other hand, .
is very common and can't be a delimiter.
I tried to see what would break it:
root [0] TTree* t = new TTree("name", "title")
(TTree *) 0x6360d13c1720
root [1] (new TBranch(t, "something", nullptr, "leaf"))->GetName()
(const char *) "something"
root [2] (new TBranch(t, "some.thing", nullptr, "leaf"))->GetName()
(const char *) "some.thing"
root [3] (new TBranch(t, "some:thing", nullptr, "leaf"))->GetName()
(const char *) "some:thing"
root [4] (new TBranch(t, "some/thing", nullptr, "leaf"))->GetName()
(const char *) "some/thing"
root [5] (new TBranch(t, "some\"thing", nullptr, "leaf"))->GetName()
(const char *) "some"thing"
root [6] (new TBranch(t, "some`thing", nullptr, "leaf"))->GetName()
(const char *) "some`thing"
root [7] (new TBranch(t, "some!thing", nullptr, "leaf"))->GetName()
(const char *) "some!thing"
root [8] (new TBranch(t, "some@thing", nullptr, "leaf"))->GetName()
(const char *) "some@thing"
root [9] (new TBranch(t, "some#thing", nullptr, "leaf"))->GetName()
(const char *) "some#thing"
root [10] (new TBranch(t, "some+thing", nullptr, "leaf"))->GetName()
(const char *) "some+thing"
root [11] (new TBranch(t, "some(thing", nullptr, "leaf"))->GetName()
(const char *) "some(thing"
root [12] (new TBranch(t, "some{thing", nullptr, "leaf"))->GetName()
(const char *) "some{thing"
root [13] (new TBranch(t, "some[thing", nullptr, "leaf"))->GetName()
(const char *) "some[thing"
root [14] (new TBranch(t, "some\\thing", nullptr, "leaf"))->GetName()
(const char *) "some\thing"
root [15] (new TBranch(t, "some|thing", nullptr, "leaf"))->GetName()
(const char *) "some|thing"
root [16] (new TBranch(t, "some;thing", nullptr, "leaf"))->GetName()
(const char *) "some;thing"
root [17] (new TBranch(t, "some\0thing", nullptr, "leaf"))->GetName()
(const char *) "some"
Here's something we can count on: a ROOT TBranch name is not going to have a null byte (Python "\x00"
) in the middle of the string, whereas this is legal and would be properly carried around in Python. It seems pretty drastic, though.
|
||
def project(self: T, *, report: TypeTracerReport, state: dict) -> T: | ||
keys = self.necessary_columns(report=report, state=state) | ||
keys = [_buf_to_col(c).replace(".", "_") for c in columns] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Reminder to @lgray to take a look at this.)
This makes the basic uproot tests in dask-contrib/dask-awkward#491 pass
cc @lgray