Arrays #471
Depends on dask/dask#10676. Also, no rush on this PR. I think this is likely to just be a working branch for a while. I would like to get #470 in, though, if we can, before it goes stale (which I suspect is likely to happen quickly).
OK, I went to add blockwise, but needed unify_chunks, which needed rechunk. I've added rechunk (including our first optimization!)
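For readers wondering why blockwise pulls in unify_chunks and rechunk: before two arrays can be combined block by block, their chunk boundaries along each shared dimension have to line up. The following is a minimal sketch of that idea for a single dimension, in plain Python. The helper name `common_chunks` is hypothetical and this is not the code in this PR or in dask; it only illustrates merging block boundaries.

```python
from itertools import accumulate


def common_chunks(*chunkings):
    """Return the coarsest chunking that refines every input chunking.

    Each chunking is a tuple of block lengths along one shared dimension.
    (Hypothetical helper for illustration, not part of dask.)
    """
    totals = {sum(c) for c in chunkings}
    if len(totals) != 1:
        raise ValueError(f"chunkings cover different lengths: {sorted(totals)}")

    # Collect every block boundary used by any of the inputs.
    boundaries = sorted({b for c in chunkings for b in accumulate(c)})

    # Turn the merged boundaries back into block lengths.
    return tuple(hi - lo for lo, hi in zip([0] + boundaries[:-1], boundaries))


print(common_chunks((500, 500), (200, 300, 500)))  # (200, 300, 500)
print(common_chunks((400, 600), (500, 500)))       # (400, 100, 500)
```

Once a common chunking is known, each input that does not already match it has to be rechunked, which is why rechunk ends up as a prerequisite.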
OK, this does trivial things now:
In [1]: import numpy as np, dask_expr.array as da
In [2]: x = da.from_array(np.random.random((1000, 1000)))
In [3]: y = da.from_array(np.random.random((1000)))
In [4]: z = (x + y).rechunk((500, 200))
In [5]: z.pprint()
Rechunk: _chunks=(500, 200) balance=False
  Elemwise: func=<built-in function add> out_ind=(1, 0) token='add' dtype=dtype('float64') new_axes={} kwargs={}(1, 0)(0,)
    FromArray: array='<array>' chunks='auto'
    FromArray: array='<array>' chunks='auto'
In [6]: z.optimize().pprint()
Elemwise: func=<built-in function add> out_ind=(1, 0) token='add' dtype=dtype('float64') new_axes={} kwargs={}(1, 0)(0,)
  FromArray: array='<array>' chunks=(500, 200)
  FromArray: array='<array>' chunks=(200,)
In [7]: x.T.T.pprint()
Transpose: axes=(1, 0)
  Transpose: axes=(1, 0)
    FromArray: array='<array>' chunks='auto'
In [8]: x.T.T.optimize().pprint()
FromArray: array='<array>' chunks='auto'
cc @dcherian @TomNicholas @jhamman Still very broken, but hopefully enough of a prop for conversation at AGU
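The two rewrites visible in the session (a rechunk absorbed into from_array, and a double transpose cancelling out) can be sketched with a toy expression tree. This is only an illustration under simplifying assumptions: the classes and the `optimize` function below are hypothetical stand-ins, not dask_expr's actual classes or optimizer, and broadcasting is reduced to "align trailing axes".

```python
from dataclasses import dataclass, replace
from typing import Tuple, Union


@dataclass(frozen=True)
class FromArray:
    name: str
    ndim: int
    chunks: Union[str, Tuple[int, ...]] = "auto"


@dataclass(frozen=True)
class Transpose:
    child: object
    axes: Tuple[int, ...]


@dataclass(frozen=True)
class Elemwise:
    op: str
    left: object
    right: object


@dataclass(frozen=True)
class Rechunk:
    child: object
    chunks: Tuple[int, ...]


def ndim(expr):
    """Number of dimensions of a toy expression."""
    if isinstance(expr, FromArray):
        return expr.ndim
    if isinstance(expr, (Transpose, Rechunk)):
        return ndim(expr.child)
    return max(ndim(expr.left), ndim(expr.right))


def optimize(expr):
    """Apply the two toy rewrite rules bottom-up."""
    if isinstance(expr, FromArray):
        return expr
    if isinstance(expr, Elemwise):
        return Elemwise(expr.op, optimize(expr.left), optimize(expr.right))
    if isinstance(expr, Transpose):
        child = optimize(expr.child)
        if isinstance(child, Transpose):
            # Compose the two permutations; drop both if they cancel.
            axes = tuple(child.axes[i] for i in expr.axes)
            if axes == tuple(range(len(axes))):
                return child.child
            return Transpose(child.child, axes)
        return Transpose(child, expr.axes)
    if isinstance(expr, Rechunk):
        child = optimize(expr.child)
        if isinstance(child, FromArray):
            # Read the data with the requested chunks in the first place.
            return replace(child, chunks=expr.chunks)
        if isinstance(child, Elemwise):
            # Push the rechunk into both operands; an operand with fewer
            # dimensions receives the trailing part of the chunk tuple.
            n = len(expr.chunks)
            left = Rechunk(child.left, expr.chunks[n - ndim(child.left):])
            right = Rechunk(child.right, expr.chunks[n - ndim(child.right):])
            return Elemwise(child.op, optimize(left), optimize(right))
        return Rechunk(child, expr.chunks)
    raise TypeError(f"unknown expression: {expr!r}")


x = FromArray("x", ndim=2)
y = FromArray("y", ndim=1)
print(optimize(Rechunk(Elemwise("add", x, y), (500, 200))))
print(optimize(Transpose(Transpose(x, (1, 0)), (1, 0))))
```

The first print shows the chunks landing on the two leaves as (500, 200) and (200,), mirroring In [6]; the second returns the bare leaf, mirroring In [8].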
x = (xr.DataArray(b, dims=["x", "y"]) + 1).chunk(x=2)
assert x.data.optimize()._name == (da.from_array(a, chunks={0: 2}) + 1)._name
FYI @dcherian the xarray thing we were playing with works now
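The assertion above compares `_name` values rather than computed results. That style of check only works if names are deterministic, so that two expressions built in different ways but describing the same computation end up with the same name. Below is a minimal sketch of that idea, assuming names are derived from a hash of the operation and its inputs. The `Expr` class and `tokenize` helper here are hypothetical and written for illustration; they are not dask_expr's implementation.

```python
import hashlib


def tokenize(*parts):
    """Hash the repr of the parts into a short, deterministic token."""
    payload = repr(parts).encode("utf-8")
    return hashlib.sha1(payload).hexdigest()[:12]


class Expr:
    """Toy expression: an operation name plus its operands."""

    def __init__(self, op, *operands):
        self.op = op
        self.operands = operands

    @property
    def _name(self):
        # Child expressions contribute their own names, so the token
        # reflects the whole tree below this node.
        parts = [o._name if isinstance(o, Expr) else o for o in self.operands]
        return f"{self.op}-{tokenize(self.op, *parts)}"


a = Expr("add", Expr("from_array", "data-id", (2, 3)), 1)
b = Expr("add", Expr("from_array", "data-id", (2, 3)), 1)
c = Expr("add", Expr("from_array", "data-id", (3, 3)), 1)

print(a._name == b._name)  # True: same operation, same inputs
print(a._name == c._name)  # False: the chunks differ
```

Under this assumption, checking that an optimized expression has the same name as a hand-written one is a cheap way to confirm the optimizer produced the expected graph without computing anything.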
Woot!
OK, I've implemented rudimentary versions of blockwise, slicing, rechunking, reductions, random, and from_array. The opening comment has been updated. This should suffice for POCs. I hope to be able to nerd snipe someone to flesh out this skeleton.
@fjetter @phofl we'll need to figure out how/if we should review this and eventually merge. Priority is certainly dataframes, and I don't want to upset the momentum there. I'd also like to avoid this sitting stale for a long time. Or maybe that's best. Maybe it's better for things to go stale in a PR than to go stale in main.
Mind adding an automatic |
I've added support for general numpy ufuncs (not gufuncs) and filled out the reductions and operators a little |
we merged the rebase
Woot. Thanks all
…On Tue, Jun 25, 2024 at 3:48 PM Patrick Hoefler ***@***.***> wrote:
we merged the rebase
Exciting!
@dcherian just to be clear, this implementation is broken and would produce wrong results in many cases. Please do not use it today. This is still very much a work in progress.
👍🏾 excited to see movement, that's all
This implements a skeleton version of dask.array. It currently includes the following:
But there are many gaps. Some notable omissions:
add are built and sub is not. sum
In general though I think that the majority (60%?) of the very hard work is done.
Examples