Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split out graph Expr code from Dataframe Expr code #470

Merged
merged 4 commits into from
Dec 8, 2023

Conversation

mrocklin
Copy link
Member

@mrocklin mrocklin commented Dec 5, 2023

Closes #142
Supercedes #158

This is the smallest step I could think of to start this change. I may want to play around a little with arrays this week in anticipation of hanging out with @dcherian next week. I figured it would be good to split this off now if possible.

(sorry for the poor timing, I know it's an active week for this repository)

cc @phofl

@mrocklin
Copy link
Member Author

mrocklin commented Dec 5, 2023

from dask_expr import _core as core, _expr as expr
everything = set(dir(object))
c = set(dir(core.Expr)) - everything
d = set(dir(expr.Expr)) - everything - c

What's in core.Expr

{'__dask_graph__',
 '__dask_keys__',
 '__dict__',
 '__module__',
 '__weakref__',
 '_defaults',
 '_depth',
 '_find_similar_operations',
 '_layer',
 '_lower',
 '_meta',
 '_name',
 '_node_label_args',
 '_parameters',
 '_remove_operations',
 '_required_attribute',
 '_simplify_down',
 '_simplify_up',
 '_task',
 '_to_graphviz',
 '_tree_repr_lines',
 'dependencies',
 'find_operations',
 'lower_completely',
 'lower_once',
 'operand',
 'pprint',
 'rewrite',
 'simplify',
 'substitute',
 'substitute_parameters',
 'tree_repr',
 'visualize',
 'walk'}

What's in expr.Expr

{'__add__',
 '__and__',
 '__bool__',
 '__getattr__',
 '__getitem__',
 '__invert__',
 '__mul__',
 '__neg__',
 '__or__',
 '__pos__',
 '__radd__',
 '__rand__',
 '__rge__',
 '__rgt__',
 '__rle__',
 '__rlt__',
 '__rmul__',
 '__ror__',
 '__rsub__',
 '__rtruediv__',
 '__rxor__',
 '__sub__',
 '__truediv__',
 '__xor__',
 '_combine_similar',
 '_combine_similar_branches',
 '_divisions',
 '_is_length_preserving',
 'abs',
 'align',
 'all',
 'any',
 'apply',
 'astype',
 'clip',
 'columns',
 'combine_first',
 'combine_similar',
 'count',
 'divisions',
 'dtypes',
 'fillna',
 'idxmax',
 'idxmin',
 'index',
 'isna',
 'isnull',
 'known_divisions',
 'mask',
 'max',
 'mean',
 'memory_usage_per_partition',
 'min',
 'mode',
 'nbytes',
 'ndim',
 'npartitions',
 'nunique_approx',
 'optimize',
 'prod',
 'rename_axis',
 'replace',
 'round',
 'size',
 'std',
 'sum',
 'to_timestamp',
 'var',
 'where'}

@mrocklin
Copy link
Member Author

mrocklin commented Dec 6, 2023

There are other things we could think about doing, like renaming _expr.Expr to Frame or something, or moving stuff within a dataframe directory, but I think that all of that could also come later and doesn't necessarily need to block this first step. My hope is that this first step is pretty innocuous.

@mrocklin mrocklin mentioned this pull request Dec 6, 2023
13 tasks
@phofl
Copy link
Collaborator

phofl commented Dec 6, 2023

Do you have an idea how we can solve the issue that defining things like addition and multiplication on the expression gets more complex if we split this up? This is why we postponed the initial pr IIRC

@mrocklin
Copy link
Member Author

mrocklin commented Dec 6, 2023

I think that we have array.Add and dataframe.Add. I intentionally didn't move the __add__ and similar methods over. If we ever mix types then we must add a ConvertToArray or ConvertToDataframe operation beforehand.

@phofl
Copy link
Collaborator

phofl commented Dec 8, 2023

Beforehand meaning before calling this on the expression level or on the expression level? I think this is better suited on the collection level generally speaking.

@mrocklin
Copy link
Member Author

mrocklin commented Dec 8, 2023

All I mean is that the answer with Add and Mul and such is not to share, at least short term. Regardless, I think that this change doesn't get into that question. I think that this is a safe change to make. Maybe we should chat live sometime today?

@phofl phofl merged commit 7009da7 into dask:main Dec 8, 2023
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove DataFrame assumptions from Expr
2 participants