feat: add Series|Expr.rolling_var and Series|Expr.rolling_std #1451
base: main
@@ -446,3 +446,39 @@ def _parse_time_format(arr: pa.Array) -> str:
        if pc.all(matches.is_valid()).as_py():
            return time_fmt
    return ""


def pad_series(
A better name could possibly be found.
Two main comments:
s = pd.Series(values)
window_size = random.randint(2, len(s))  # noqa: S311
min_periods = random.randint(2, window_size)  # noqa: S311
ddof = random.randint(0, min_periods - 1)  # noqa: S311
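The sampling bounds above can be checked exhaustively. This is a minimal sketch, assuming (as the discussion in this thread suggests) that the rolling variance divides by the number of non-null values in the window minus `ddof`, and that a window only produces a value once it holds at least `min_periods` non-null values. `denominator_is_positive` and `max_len` are hypothetical names, not part of the test suite.

```python
def denominator_is_positive(max_len: int) -> bool:
    """Check every parameter combination the randomized test can draw.

    For a series of length n the test draws window_size in [2, n],
    min_periods in [2, window_size] and ddof in [0, min_periods - 1]
    (random.randint bounds are inclusive).
    """
    for n in range(2, max_len + 1):
        for window_size in range(2, n + 1):
            for min_periods in range(2, window_size + 1):
                for ddof in range(0, min_periods):
                    # A window yields a value only with at least min_periods
                    # non-null elements, and it holds at most window_size.
                    for valid_count in range(min_periods, window_size + 1):
                        if valid_count - ddof <= 0:
                            return False
    return True


print(denominator_is_positive(10))
```

Because `ddof` never reaches `min_periods`, every window that yields a value has a denominator of at least 1.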
OK, it took me a while, but I figured out that the behavior differs when ddof >= min_periods, because the denominator (the number of non-null values in the window minus ddof) can then be zero or negative.
pandas seems to return NaN both for windows that do not have enough non-null elements and for these cases, while Polars returns inf (and we can do the same in PyArrow, since we control that computation there).
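A pure-Python sketch of the denominator issue. `rolling_var` is a hypothetical reference helper, not the narwhals implementation. It assumes trailing windows, skips nulls (`None`), emits `None` when fewer than `min_periods` non-null values are present, and, as one possible policy, returns NaN (the pandas-like choice described above) when `ddof` is at least the number of non-null values.

```python
import math
from typing import Optional, Sequence


def rolling_var(
    values: Sequence[Optional[float]],
    window_size: int,
    min_periods: int,
    ddof: int = 1,
) -> list[Optional[float]]:
    """Hypothetical trailing rolling variance that skips nulls (None)."""
    result: list[Optional[float]] = []
    for end in range(len(values)):
        window = values[max(0, end - window_size + 1) : end + 1]
        valid = [v for v in window if v is not None]
        n = len(valid)
        if n < min_periods:
            # Too few non-null values in this window: emit a null.
            result.append(None)
            continue
        denominator = n - ddof
        if denominator <= 0:
            # ddof >= n makes the estimator undefined; this sketch picks NaN,
            # whereas Polars was observed to return inf in this case.
            result.append(math.nan)
            continue
        mean = sum(valid) / n
        result.append(sum((v - mean) ** 2 for v in valid) / denominator)
    return result


print(rolling_var([1.0, 2.0, None, 4.0], window_size=3, min_periods=1, ddof=1))
```

Here the result is `[nan, 0.5, 0.5, 2.0]`: the first window holds a single value, so the denominator `1 - 1` is zero.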
>>> agnostic_rolling_std(df_pl)
shape: (4, 2)
┌──────┬──────────┐
│ a    ┆ b        │
│ ---  ┆ ---      │
│ f64  ┆ f64      │
╞══════╪══════════╡
│ 1.0  ┆ 0.0      │
│ 2.0  ┆ 0.707107 │
│ null ┆ 0.707107 │
│ 4.0  ┆ 1.414214 │
└──────┴──────────┘
Is this a bug in Polars?
In [37]: pl.Series([1,2,None,4]).rolling_std(3, min_periods=1)
Out[37]:
shape: (4,)
Series: '' [f64]
[
0.0
0.707107
0.707107
1.414214
]
In [38]: pl.Series([1,2,1,4]).rolling_std(3, min_periods=1)
Out[38]:
shape: (4,)
Series: '' [f64]
[
null
0.707107
0.57735
1.527525
]
I think the first element of the first result should be null, as it is in the second one, right?
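To double-check that expectation, here is a minimal reference using only the standard library (`statistics.stdev`, the sample standard deviation, i.e. ddof=1). `rolling_std` is a hypothetical helper that assumes trailing windows and treats `None` as null. It emits `None` whenever a window has fewer than `min_periods` non-null values or only a single value, since the ddof=1 denominator is then zero.

```python
import statistics
from typing import Optional, Sequence


def rolling_std(
    values: Sequence[Optional[float]],
    window_size: int,
    min_periods: int = 1,
) -> list[Optional[float]]:
    """Hypothetical trailing rolling sample std (ddof=1) that skips nulls."""
    result: list[Optional[float]] = []
    for end in range(len(values)):
        window = values[max(0, end - window_size + 1) : end + 1]
        valid = [v for v in window if v is not None]
        if len(valid) < max(min_periods, 2):
            # Fewer than min_periods non-null values, or a single value whose
            # sample std is undefined (zero denominator with ddof=1): null.
            result.append(None)
        else:
            result.append(statistics.stdev(valid))
    return result


print(rolling_std([1, 2, None, 4], 3, min_periods=1))
print(rolling_std([1, 2, 1, 4], 3, min_periods=1))
```

For both inputs the first element comes out as `None`, while the remaining values agree (to six decimals) with Out[37] and Out[38].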
Oh, true! It definitely doesn't look right! Worth reporting upstream, I imagine.
on it pola-rs/polars#20077
Opening as a draft: I still need to figure out how to deal with nulls/NaNs to match Polars' behavior.
Edit: the diff size looks scary, but 25%+ of it is just methods copied to the stable API, and another good 10% is docstring examples.