ak.nones_like #3221

Open
lgray opened this issue Aug 19, 2024 · 2 comments
Labels
feature New feature or request

Comments

@lgray
Contributor

lgray commented Aug 19, 2024

Description of new feature

It would be very useful to have an additional *_like function, ak.nones_like, that creates a column of null (None) data.

This comes up in a physics use case at the LHC (it does not hold in cleaner experimental environments), where a Jet is an object that has a charge, but QCD screening and reconstruction effects make that charge difficult or impossible to determine accurately. So while Jets are what physicists typically call "candidates" (four-momenta with an assigned charge), their charge is best described as None, since any number assigned to it would be confusing or misleading.

So, to reconcile types with physics concepts, it would be really useful to be able to generate None columns with the same shape as pt or other Jet quantities, so that we can assign the None column as the candidate's charge.

@jpivarski created a few-line mockup of the expected behavior using awkward:

>>> import awkward as ak
>>> import numpy as np
>>> some_data = ak.Array([[1, 2, 3], [], [4, 5]])

>>> with_none = ak.concatenate([ak.Array([None]), some_data])   # [None] has type ?unknown, merges well
>>> with_none
<Array [None, [1, 2, 3], [], [4, 5]] type='4 * option[var * int64]'>

>>> with_none[np.zeros(1000, np.int64)]
<Array [None, None, None, ..., None, None] type='1000 * option[var * int64]'>
@lgray lgray added the feature New feature or request label Aug 19, 2024
@agoose77
Collaborator

@lgray ak.enforce_type can do this. The only quirk is that we can't easily build the type object from a string with unknown. I think that's an oversight @jpivarski? :)

@jpivarski
Member

I think you can:

>>> ak.types.from_datashape("unknown", highlevel=False)
UnknownType()
>>> ak.types.from_datashape("5 * unknown")
ArrayType(UnknownType(), 5, None)

Oh, that was the wrong direction. You mean like this:

>>> ak.Array([None, None, None])
<Array [None, None, None] type='3 * ?unknown'>
>>> ak.enforce_type(ak.Array([None, None, None]), "?int32")
<Array [None, None, None] type='3 * ?int32'>
>>> ak.enforce_type(ak.Array([None, None, None]), "option[var * int32]")
<Array [None, None, None] type='3 * option[var * int32]'>

This use of enforce_type is a good idea! Given some_array,

>>> some_array = ak.Array([[1, 2, 3], [], [4, 5]])
>>> some_type = ak.types.OptionType(some_array.type.content)
>>> ak.enforce_type(ak.Array([None]), some_type)[np.zeros(100000, int)]
<Array [None, None, None, ..., None, None] type='100000 * option[var * int64]'>

Even though the above fits on one line, it would be useful to wrap it in a function, because ak.nones_like(some_array) makes the intent much more obvious.
