BUG: Support win_type parameter with groupby.rolling #30559
Comments
Thanks for the report. This does look buggy. Further investigation into why this is happening, or a PR to fix it, would be most welcome!
Hi @TomAugspurger, thanks for the response! I did some debugging to find out what's happening, and I think I've found the cause. I haven't been able to make a fix myself yet, because it requires refactoring some classes and that needs discussion first, but I may be able to do it with some pointers from you or anyone else more experienced with the code base. It looks like there are three classes of interest, all in
Here is the weighted rolling mean method, which is implemented in the … (see pandas/pandas/core/window/rolling.py, lines 1137 to 1143 at commit c1b8573).
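The embedded snippet did not survive in this copy. As a rough stand-in, here is a minimal NumPy sketch of what a weighted rolling mean computes: each trailing window is reduced to sum(w * x) / sum(w). The function name `weighted_rolling_mean` is hypothetical, this is not pandas' own implementation, and it assumes full windows with no missing values.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def weighted_rolling_mean(values, weights):
    """Hypothetical helper: weighted mean of each trailing window.

    Positions without a full window are NaN, mirroring the default
    min_periods == window for fixed-size windows.
    """
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    n = len(weights)
    out = np.full(len(values), np.nan)
    if len(values) >= n:
        # Each row of `windows` is one trailing window of length n
        windows = sliding_window_view(values, n)
        out[n - 1:] = windows @ weights / weights.sum()
    return out


x = np.array([1.0, 4.0, 9.0, 16.0, 25.0])
print(weighted_rolling_mean(x, np.ones(3)))      # uniform weights
print(weighted_rolling_mean(x, np.blackman(3)))  # Blackman weights
```

With only three points, `np.blackman` puts (numerically) zero weight on both ends, so the Blackman result is just the middle value of each window and differs from the uniform mean on non-linear data like `x`.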
I tried copying this method to the RollingGroupby class, but encountered an issue: it looks like the … (see pandas/pandas/core/window/common.py, lines 65 to 91 at commit bd15c59).
Any advice or ideas on the philosophy behind these classes and where everything should go would be greatly appreciated!
It looks like @mroeschke and @jbrockmendel are the authors of the WindowGroupbyMixin class; perhaps they could advise on how to fix this issue? Any tips would be greatly appreciated, and I'm eager to help if I can.
Thanks for investigating, @Connossor. As you noticed, the … Soon I'm going to experiment with unifying these two (which would fix this issue), but unfortunately it is unlikely to be fixed in the next release.
Any ETA on a solution for this or, in the meantime, a workaround?
The issue still persists in version 2.0.3. As a workaround, GPT-3.5 wrote me this code:
```python
import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({
    'category': ['A', 'A', 'A', 'B', 'B', 'B'],
    'cost': [1, 2, 3, 4, 5, 6]
})

# Define the rolling window size and win_type ('hamming' requires SciPy).
# Note: each category here has only 3 rows, so a window of 10 yields all NaN;
# use a window no larger than the group size to see numbers.
window_size = 10
win_type = 'hamming'

# Iterate over each unique category
for category in df['category'].unique():
    # Filter the DataFrame for the current category
    category_df = df[df['category'] == category]
    # Calculate the rolling mean for the current category
    rolling_mean = category_df['cost'].rolling(window_size, win_type=win_type).mean()
    # Update the DataFrame with the rolling mean values (aligned on the index)
    df.loc[df['category'] == category, 'cost_roll'] = rolling_mean

print(df)
```
instead of this:
```python
df['cost_roll'] = df.groupby('category', as_index=False).cost.rolling(10, win_type='hamming').mean().cost
```
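For comparison, the explicit loop can usually be replaced by `SeriesGroupBy.transform`, which runs a function on each group and aligns the result back to the original rows, and which works the same way when grouping by two or more columns. The sketch below avoids SciPy by building the weights with `np.hamming` and applying them through `Rolling.apply`; it assumes `np.hamming(n)` matches the symmetric Hamming window SciPy generates by default for `win_type='hamming'`. With SciPy installed, the lambda could instead call `s.rolling(3, win_type='hamming').mean()`, since `win_type` works on a plain Series. The helper `hamming_rolling_mean` and the column names are illustrative.

```python
import numpy as np
import pandas as pd


def hamming_rolling_mean(s: pd.Series, window: int) -> pd.Series:
    """Hypothetical helper: Hamming-weighted trailing mean of one group."""
    weights = np.hamming(window)
    # raw=True hands each full window to the lambda as a NumPy array
    return s.rolling(window).apply(lambda x: np.dot(x, weights) / weights.sum(), raw=True)


# Rows of the two groups are interleaved on purpose to show that transform realigns them
df = pd.DataFrame({
    "region": ["N", "S", "N", "S", "N", "S", "N", "S"],
    "category": ["A", "B", "A", "B", "A", "B", "A", "B"],
    "cost": [1.0, 10.0, 2.0, 20.0, 3.0, 30.0, 4.0, 40.0],
})

# Group by two columns; transform returns a Series indexed like df
df["cost_roll"] = df.groupby(["region", "category"])["cost"].transform(
    lambda s: hamming_rolling_mean(s, 3)
)
print(df)
```

Because the group keys are just a list of column names, adding a third grouping column does not change the shape of the code.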
Explicit for loops are fine for smaller or less complex use cases, but performance could really suffer when win_type is needed and groupby would group by two or more columns. Shame this has not been fixed yet.
You're welcome to submit a PR. Comments like this are counter-productive.
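A workaround is:
```python
# df is assumed to have "GROUP" and "VALUE" columns; the settings below are
# placeholders, and win_type='hamming' requires SciPy
window, min_periods, center = 3, 1, False
result = df.groupby("GROUP").apply(
    lambda group: group["VALUE"]
    .rolling(window=window, min_periods=min_periods, center=center, win_type='hamming')
    .mean()
)
```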
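A variation that stays within `groupby().rolling()` and does not need SciPy is to supply the window weights yourself through `Rolling.apply`. The grouped result is indexed by (GROUP, original row label), so dropping the group level lines it up with the original rows again. This sketch assumes `np.hamming` produces the same symmetric weights SciPy would generate for `win_type='hamming'`; the sample data and the name `weighted_mean` are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "GROUP": ["x", "y", "x", "y", "x", "y"],
    "VALUE": [1.0, 5.0, 2.0, 7.0, 6.0, 9.0],
})
window = 3
# Symmetric Hamming weights (assumed to match SciPy's default for win_type='hamming')
weights = np.hamming(window)


def weighted_mean(x: np.ndarray) -> float:
    # x is one full window as a raw NumPy array (raw=True below)
    return float(np.dot(x, weights) / weights.sum())


rolled = df.groupby("GROUP")["VALUE"].rolling(window).apply(weighted_mean, raw=True)
# rolled is indexed by (GROUP, original row label); dropping the GROUP level
# lets the assignment align each value with its original row
df["VALUE_roll"] = rolled.droplevel("GROUP")
print(df)
```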
It appears that rolling aggregations on groupby objects do not behave as expected: it looks like the win_type parameter is ignored. Here's a minimal example:
Code Sample
In this example, a rolling mean is calculated with "uniform" weights and also with "blackman" weights. In each case, the mean is calculated on the ungrouped data and the grouped data, which should give the same result as there is only one group.
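The original code sample is missing from this copy. A hedged sketch of the comparison described above might look like the helper below (its name and the column names `g` and `v` are hypothetical): with a single group, `groupby().rolling().mean()` should reproduce the plain rolling mean of the same column. It is exercised here without `win_type` so it runs without SciPy; passing `win_type="blackman"` (SciPy required) repeats the check that the report says fails.

```python
import numpy as np
import pandas as pd


def grouped_matches_ungrouped(df: pd.DataFrame, window: int, **rolling_kwargs) -> bool:
    """Hypothetical check: single-group groupby().rolling().mean() vs. ungrouped."""
    if df["g"].nunique() != 1:
        raise ValueError("the comparison is only meaningful for exactly one group")
    plain = df["v"].rolling(window, **rolling_kwargs).mean()
    grouped = df.groupby("g")["v"].rolling(window, **rolling_kwargs).mean()
    # Drop the group level so the grouped result is indexed like df again
    grouped = grouped.droplevel(0).reindex(plain.index)
    return bool(np.allclose(plain.to_numpy(), grouped.to_numpy(), equal_nan=True))


df = pd.DataFrame({"g": ["only"] * 6, "v": [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]})
print(grouped_matches_ungrouped(df, 3))
# With SciPy installed: grouped_matches_ungrouped(df, 3, win_type="blackman")
```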
Problem description
The last assert statement fails: the rolling aggregation on the groupby object does not work as expected.
In that last calculation, it looks like the win_type='blackman' parameter was simply ignored, and the calculation was done with uniform weights.
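The report attributes this to RollingGroupby inheriting its mean() method from Rolling. The toy classes below are a hypothetical, heavily simplified illustration of how that produces exactly this symptom; they are not pandas' real classes. A subclass that inherits an unweighted mean() happily accepts and stores win_type, then computes a plain uniform average anyway.

```python
from typing import List, Optional


class ToyRolling:
    """Hypothetical stand-in for an unweighted rolling object."""

    def __init__(self, values: List[float], window: int, win_type: Optional[str] = None):
        self.values = list(values)
        self.window = window
        self.win_type = win_type  # accepted and stored...

    def mean(self) -> List[Optional[float]]:
        # ...but never consulted: every full window gets a uniform average
        w = self.window
        return [
            None if i + 1 < w else sum(self.values[i + 1 - w : i + 1]) / w
            for i in range(len(self.values))
        ]


class ToyRollingGroupby(ToyRolling):
    """Hypothetical grouped variant that inherits mean() unchanged."""


data = [1.0, 4.0, 9.0, 16.0]
uniform = ToyRolling(data, 3).mean()
blackman = ToyRollingGroupby(data, 3, win_type="blackman").mean()
print(uniform == blackman)  # True: the win_type argument made no difference
```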
Expected Output
Here is the resulting dataframe from the calculations above. The last two columns ought to be identical (but are not):
This is possibly related to similar issues #26597 and #26462.
Here's my best effort at diagnosing what is going on under the hood: it looks like the pandas.core.window.RollingGroupby class inherits the mean() method from the Rolling class, and hence completely ignores the win_type parameter.
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.7.3.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None
pandas : 0.25.1
numpy : 1.16.4
pytz : 2018.9
dateutil : 2.8.0
pip : 19.0.3
setuptools : 40.8.0
Cython : 0.29.6
pytest : 4.3.1
hypothesis : None
sphinx : 1.8.5
blosc : None
feather : None
xlsxwriter : 1.1.5
lxml.etree : 4.3.2
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10
IPython : 7.4.0
pandas_datareader: None
bs4 : 4.7.1
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.3.2
matplotlib : 3.0.3
numexpr : 2.6.9
odfpy : None
openpyxl : 2.6.1
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.2.1
sqlalchemy : 1.3.1
tables : 3.5.1
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.1.5