Removed dask pinning #570
Edit. Additional comments:
Let's wait to see if there is an upstream solution for the two open points above; I think they would fix most or all of the tests.
Thanks, @LucaMarconato! I wasn't aware of dask-expr removing the
In
I am working on a PR to restore
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #570      +/-   ##
==========================================
- Coverage   91.94%   91.91%   -0.04%
==========================================
  Files          44       44
  Lines        6608     6641      +33
==========================================
+ Hits         6076     6104      +28
- Misses        532      537       +5
```
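As a quick check of the report, each coverage figure is hits divided by lines. Assuming Codecov reports two decimals rounded toward negative infinity (our assumption; the report does not state its rounding rule), the sketch below reproduces the numbers shown. `coverage_pct` and `coverage_delta` are hypothetical helper names.

```python
from decimal import ROUND_FLOOR, Decimal
from typing import Tuple

TWO_PLACES = Decimal("0.01")


def coverage_pct(hits: int, lines: int) -> Decimal:
    """Return hits / lines as a percentage, floored to two decimals."""
    return (Decimal(hits) / Decimal(lines) * 100).quantize(TWO_PLACES, rounding=ROUND_FLOOR)


def coverage_delta(old: Tuple[int, int], new: Tuple[int, int]) -> Decimal:
    """Return the change in percentage points, computed from the unrounded ratios."""
    old_pct = Decimal(old[0]) / Decimal(old[1]) * 100
    new_pct = Decimal(new[0]) / Decimal(new[1]) * 100
    return (new_pct - old_pct).quantize(TWO_PLACES, rounding=ROUND_FLOOR)


main = (6076, 6608)  # (hits, lines) on main
pr = (6104, 6641)  # (hits, lines) with #570
print(coverage_pct(*main), coverage_pct(*pr), coverage_delta(main, pr))  # prints: 91.94 91.91 -0.04
```

The misses column is simply lines minus hits (6608 - 6076 = 532 and 6641 - 6104 = 537).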
Finally, this PR is ready for review! Could any of you please have a look, @giovp @melonora @kevinyamauchi? This PR removes the pin for the
A few important comments:
Further comments:
My posts below are not needed for the review of this PR.
Now some details regarding the behavior with
I still thought that one could have easily stored the
The class if I initialize a Dask dataframe calling
The implication of this is that when
I tried a bunch of experiments; here are some of them: changes in the
The tests that I created pass (they should live in
Here is some code that shows the problem (requires the
In a few weeks I will try to make a shorter code example, independent from

```python
import pandas as pd
from dask.dataframe import from_pandas
from dask.array import from_array
from spatialdata.transformations import Identity, get_transformation
from spatialdata import transform, SpatialData
from xarray import DataArray
import dask.array as da
from dask.dataframe import DataFrame as DaskDataFrame
from spatialdata.models import PointsModel

df = pd.DataFrame({"x": [1, 2, 3, 4, 1, 2], "y": [1, 2, 3, 4, 1, 2]})
ddf = from_pandas(df, npartitions=1)
ddf.attrs['transform'] = {'transformed': Identity()}
axes = ['x', 'y']
sdata = SpatialData.init_from_elements({'ddf': ddf})
sdata_transformed = sdata.transform_to_coordinate_system('transformed')

# stack the coordinate columns into an (n_points, n_axes) array
arrays = []
for ax in axes:
    arrays.append(ddf[ax].to_dask_array(lengths=True).reshape(-1, 1))
xdata = DataArray(da.concatenate(arrays, axis=1), coords={"points": range(len(ddf)), "dim": list(axes)})
transformed = ddf.drop(columns=list(axes)).copy()

# the weird part starts here
transformed.attrs = {'transform': {'global': Identity()}}
for ax in axes:
    indices = xdata["dim"] == ax
    new_ax = xdata[:, indices]
    # assigning a new column: the attrs set just above may not survive this
    transformed[ax] = new_ax.data.flatten()
    if 'global' not in transformed.attrs['transform']:
        print('bug')
```
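One way to probe the problem above is to snapshot `attrs` before each column assignment and re-apply it afterwards. The helper below is a hypothetical sketch (the name `setitem_preserving_attrs` is ours, not a spatialdata or dask API); it assumes only that the frame exposes a writable, dict-like `attrs` attribute, as pandas and dask dataframes do. Whether re-assigned `attrs` persist through later dask-expr operations depends on the installed dask version, so treat this as a workaround to test rather than a fix.

```python
import copy
from typing import Any


def setitem_preserving_attrs(frame: Any, column: str, values: Any) -> Any:
    """Run ``frame[column] = values`` and then restore ``frame.attrs``.

    Hypothetical helper: assumes ``frame.attrs`` is a writable dict-like
    attribute that the column assignment may drop or replace.
    """
    # Deep-copy so nested dicts such as {"transform": {...}} are not shared
    # with whatever object the assignment might mutate or replace.
    saved = copy.deepcopy(dict(frame.attrs))
    frame[column] = values
    # Unconditionally put the snapshot back, whatever the assignment did to it.
    frame.attrs = saved
    return frame
```

In the reproduction above, the assignment in the loop would become `setitem_preserving_attrs(transformed, ax, new_ax.data.flatten())`.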
Finally, here is a sketch of what I think could be a solution:
I think that if point 1 is addressed, the rest should be manageable. Curious to hear your comments on this.
Looks great, @LucaMarconato, thanks! Just a question on setting the default, and on whether it's OK to raise the warning every time the library is imported.
@@ -1,5 +1,17 @@

```python
from __future__ import annotations

import dask

dask.config.set({"dataframe.query-planning": False})
```
I don't get it: it's set to False here, but then once dask.dataframe is imported, is it set to True again?
It's not obvious; I'll add a comment to make it clear. What can happen is that the user imports `dask.dataframe` before importing this file, i.e. before `dataframe.query-planning` is set to `False`. In that case `DASK_EXPR_ENABLED` would be `True`, but we don't want that, so we add this extra check.
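The import-order problem described above can be detected with the standard library alone: if `dask.dataframe` is already in `sys.modules`, it was imported before `dataframe.query-planning` could be set. This is a minimal sketch, assuming (as the comment explains) that the dataframe backend is chosen when `dask.dataframe` is first imported; `dask_dataframe_already_imported` and `disable_query_planning` are hypothetical names, not spatialdata's actual implementation.

```python
import sys
import warnings
from collections.abc import Mapping
from typing import Any, Optional


def dask_dataframe_already_imported(modules: Optional[Mapping[str, Any]] = None) -> bool:
    """Return True if ``dask.dataframe`` is already in the module table."""
    modules = sys.modules if modules is None else modules
    return "dask.dataframe" in modules


def disable_query_planning() -> None:
    """Request the legacy (non dask-expr) dataframe implementation.

    Only effective if called before the first ``import dask.dataframe``.
    """
    import dask  # requires dask; imported lazily so the helper above stays stdlib-only

    if dask_dataframe_already_imported():
        warnings.warn(
            "dask.dataframe was imported before 'dataframe.query-planning' was set "
            "to False; the query-planning backend may already be active.",
            UserWarning,
            stacklevel=2,
        )
    dask.config.set({"dataframe.query-planning": False})
```

Passing an explicit mapping to `dask_dataframe_already_imported` makes the check easy to unit-test without touching the real import state.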
```python
from spatialdata.models._utils import TRANSFORM_KEY

if TRANSFORM_KEY in e.attrs:
    raise ValueError(
```
This is a great check, by the way; I think it can close issue #576.
A comment on my latest commits. I don't particularly like the fact that in
I don't like the fact that it's difficult to test all these cases, since they depend on the specific versions installed, nor that these patterns may change over time as new versions of dask are released. I have tried adding some tests for
I therefore propose the following:
Nevertheless, the currently implemented behavior is safe and ready to merge.
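Because the behavior depends on which versions are installed, one option is to make the version-specific tests conditional. Below is a minimal stdlib-only sketch, assuming dask-expr is installed as the separate `dask-expr` distribution (true for the dask releases discussed in this PR, but an assumption for other versions); `numeric_prefix` and `installed_version` are hypothetical helpers.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def numeric_prefix(raw: str) -> Tuple[int, ...]:
    """Turn '2024.4.1' into (2024, 4, 1); stop at suffixes such as 'rc1' or '.dev0'."""
    match = re.match(r"\d+(?:\.\d+)*", raw)
    return tuple(int(part) for part in match.group(0).split(".")) if match else ()


def installed_version(dist: str) -> Optional[Tuple[int, ...]]:
    """Return the numeric version of an installed distribution, or None if absent."""
    try:
        return numeric_prefix(version(dist))
    except PackageNotFoundError:
        return None


# In a pytest module one could then write, for example:
#   import pytest
#   DASK_EXPR = installed_version("dask-expr")
#   @pytest.mark.skipif(DASK_EXPR is None, reason="requires dask-expr")
#   def test_attrs_with_query_planning(): ...
```

Skipping rather than failing keeps the suite green on environments without dask-expr, at the cost of those tests silently not running there.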
Removing the dask pinning from `pyproject.toml`. This PR also tests against Python 3.12.