BUG: Support win_type parameter with groupby.rolling #30559

ghost · 2019-12-30T14:53:57Z

It appears that rolling aggregations on groupby objects do not behave as expected. It looks like the win_type parameter is ignored. Here's a minimal example:

Code Sample

In this example, a rolling mean is calculated with "uniform" weights and also with "blackman" weights. In each case, the mean is calculated on the ungrouped data and the grouped data, which should give the same result as there is only one group.

import numpy as np
import pandas as pd

df = pd.DataFrame({'groups': ['g']*10,
                   'data': np.sin(np.arange(10))})

groups = df[['data', 'groups']].groupby('groups')

# Rolling mean with uniform weights
df['uniform'] = df['data'].rolling(4, win_type=None).mean()
df['grp_uniform'] = groups.rolling(4, win_type=None).mean().values

# Rolling mean with a e.g. blackman weights
df['blackman'] = df['data'].rolling(4, win_type='blackman').mean()
df['grp_blackman'] = groups.rolling(4, win_type='blackman').mean().values

# The results should be equal, whether on grouped or ungrouped data
assert df['uniform'].equals(df['grp_uniform'])     # This passes
assert df['blackman'].equals(df['grp_blackman'])   # This fails
print(df)

Problem description

The last assert statement fails: the rolling aggregation on the groupby object does not work as expected.

In that last calculation, it looks like the win_type='blackman' parameter was simply ignored, and the calculation was done with uniform weights.

Expected Output

Here is the resulting dataframe from the calculations above. The last two columns ought to be identical (but are not):

  groups      data   uniform  grp_uniform  blackman  grp_blackman
0      g  0.000000       NaN          NaN       NaN           NaN
1      g  0.841471       NaN          NaN       NaN           NaN
2      g  0.909297       NaN          NaN       NaN           NaN
3      g  0.141120  0.472972     0.472972  0.875384      0.472972
4      g -0.756802  0.283771     0.283771  0.525209      0.283771
5      g -0.958924 -0.166327    -0.166327 -0.307841     -0.166327
6      g -0.279415 -0.463506    -0.463506 -0.857863     -0.463506
7      g  0.656987 -0.334539    -0.334539 -0.619170     -0.334539
8      g  0.989358  0.102001     0.102001  0.188786      0.102001
9      g  0.412118  0.444762     0.444762  0.823172      0.444762

This is possibly related to similar issues #26597 and #26462.

Here's my best effort at diagnosing what is going on under the hood... it looks like the pandas.core.window.RollingGroupby class inherits the mean() method from the Rolling class, and hence completely ignores the win_type parameter.

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : None
python : 3.7.3.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 0.25.1
numpy : 1.16.4
pytz : 2018.9
dateutil : 2.8.0
pip : 19.0.3
setuptools : 40.8.0
Cython : 0.29.6
pytest : 4.3.1
hypothesis : None
sphinx : 1.8.5
blosc : None
feather : None
xlsxwriter : 1.1.5
lxml.etree : 4.3.2
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10
IPython : 7.4.0
pandas_datareader: None
bs4 : 4.7.1
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.3.2
matplotlib : 3.0.3
numexpr : 2.6.9
odfpy : None
openpyxl : 2.6.1
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.2.1
sqlalchemy : 1.3.1
tables : 3.5.1
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.1.5

The text was updated successfully, but these errors were encountered:

TomAugspurger · 2020-01-03T21:48:53Z

Thanks for the report. This does look buggy. Further investigation into why this is happening / a PR to fix it would be most welcome!

ghost · 2020-01-04T01:31:54Z

Hi @TomAugspurger ,

Thanks for the response! I've tried to find out what's happening with some debugging, and I think I've found the cause. I haven't been able to make a fix myself yet as it requires some refactoring of classes that requires some discussion, but perhaps I'll be able to do it with some pointers from yourself or any others more experienced with the code base than myself.

It looks like there are three classes of interest, all in core/window/rolling.py:

_Window has most window-related functionality, but not including weights
Window inherits from _Window and adds in functionality for weighted rolling means
RollingGroupby inherits from _Window and WindowGroupbyMixin but does not inherit from Window. So, it misses out the required weighted rolling mean functionality.

Here is the weighted rolling mean method which is implemented in the Window class:

pandas/pandas/core/window/rolling.py

Lines 1137 to 1143 in c1b8573

    
           def mean(self, *args, **kwargs): 
        
               nv.validate_window_func("mean", args, kwargs) 
        
               window_func = self._get_roll_func("roll_weighted_mean") 
        
               window_func = self._get_weighted_roll_func(window_func, _use_window) 
        
               return self._apply( 
        
                   window_func, center=self.center, is_weighted=True, name="mean", **kwargs 
        
               )

I tried copying this method to the RollingGroupby class, but encountered an issue: it looks like the _apply method is overridden by WindowGroupbyMixin, and so the snippet above does not work. Here is WindowGroupbyMixing._apply():

pandas/pandas/core/window/common.py

Lines 65 to 91 in bd15c59

    
               def _apply( 
        
                   self, 
        
                   func: Callable, 
        
                   center: bool, 
        
                   require_min_periods: int = 0, 
        
                   floor: int = 1, 
        
                   is_weighted: bool = False, 
        
                   name: Optional[str] = None, 
        
                   use_numba_cache: bool = False, 
        
                   **kwargs, 
        
               ): 
        
                   """ 
        
                   Dispatch to apply; we are stripping all of the _apply kwargs and 
        
                   performing the original function call on the grouped object. 
        
                   """ 
        
                   kwargs.pop("floor", None) 
        
                   # TODO: can we de-duplicate with _dispatch? 
        
                   def f(x, name=name, *args): 
        
                       x = self._shallow_copy(x) 
        
                       if isinstance(name, str): 
        
                           return getattr(x, name)(*args, **kwargs) 
        
                       return x.apply(name, *args, **kwargs) 
        
                   return self._groupby.apply(f)

Any advice or ideas on the philosophy behind these classes and where everything should go would be greatly appreciated!

ghost · 2020-01-04T01:39:45Z

It looks like @mroeschke and @jbrockmendel are the authors of the WindowGroupbyMixin class, perhaps they could advise on how we can fix this issue? Any tips greatly appreciated, eager to help if I can.

mroeschke · 2020-01-04T03:30:29Z

Thanks for investigating @Connossor. win_type is essentially not supported with groupby rolling right now.

As you noticed the Window class essentially exclusively contains the weighted aggregation code while the Rolling class contains the non weighted aggregation code (and the mixin only using the Rolling class).

Soon I'm going to be experimenting on unifying these two (which would fix this issue), but it will unlikely be fixed in the next release unfortunately.

johndmccann93 · 2023-06-28T02:28:23Z

Any ETA on a solution for this or, in the meantime, a workaround?

lisrael1 · 2023-07-11T15:41:14Z

issue still persists at version 2.0.3.
the rectangle window is very noisy, the output of the window is not smooth.
the win_type support is needed...
thanks

as a workaround, gpt 3.5 wrote me this code:

import pandas as pd
import numpy as np

# Create a sample DataFrame
df = pd.DataFrame({
    'category': ['A', 'A', 'A', 'B', 'B', 'B'],
    'cost': [1, 2, 3, 4, 5, 6]
})

# Define the rolling window size and win_type
window_size = 10
win_type = 'hamming'

# Iterate over each unique category
for category in df['category'].unique():
    # Filter the DataFrame for the current category
    category_df = df[df['category'] == category]
    
    # Calculate the rolling mean for the current category
    rolling_mean = category_df['cost'].rolling(window_size, win_type=win_type).mean()
    
    # Update the DataFrame with the rolling mean values
    df.loc[df['category'] == category, 'cost_roll'] = rolling_mean

print(df)

instead of this:

df['cost_roll'] = df.groupby('category', as_index=False).cost.rolling(10, win_type='hamming').mean().cost

SaundersR1999 · 2023-08-14T04:40:13Z

For loops are usable for smaller/less complex use cases but performance could really suffer if win_type is needed in cases where groupby would be used to group by two or more columns. Shame this has not been fixed yet.

jbrockmendel · 2023-08-15T14:51:37Z

Shame this has not been fixed yet.

You're welcome to submit a PR. Comments like this are counter-productive.

francoispierrepaty · 2023-10-31T14:21:07Z

A workaround is:

df.groupby("GROUP").apply(
        lambda group: group["VALUE"]
        .rolling(window=window, min_periods=min_periods, center=center, win_type='hamming')
        .mean()
    )

TomAugspurger added the Window rolling, ewma, expanding label Jan 3, 2020

TomAugspurger added this to the Contributions Welcome milestone Jan 3, 2020

mroeschke added the Enhancement label May 8, 2020

mroeschke changed the title ~~BUG: Rolling aggregations on grouped data ignore win_type parameter~~ BUG: Support win_type parameter with groupby.rolling Oct 5, 2020

mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: Support win_type parameter with groupby.rolling #30559

BUG: Support win_type parameter with groupby.rolling #30559

ghost commented Dec 30, 2019

INSTALLED VERSIONS

TomAugspurger commented Jan 3, 2020

ghost commented Jan 4, 2020 •

edited by ghost

Loading

ghost commented Jan 4, 2020

mroeschke commented Jan 4, 2020

johndmccann93 commented Jun 28, 2023

lisrael1 commented Jul 11, 2023

SaundersR1999 commented Aug 14, 2023

jbrockmendel commented Aug 15, 2023

francoispierrepaty commented Oct 31, 2023

BUG: Support win_type parameter with groupby.rolling #30559

BUG: Support win_type parameter with groupby.rolling #30559

Comments

ghost commented Dec 30, 2019

Code Sample

Problem description

Expected Output

Output of pd.show_versions()

INSTALLED VERSIONS

TomAugspurger commented Jan 3, 2020

ghost commented Jan 4, 2020 • edited by ghost Loading

ghost commented Jan 4, 2020

mroeschke commented Jan 4, 2020

johndmccann93 commented Jun 28, 2023

lisrael1 commented Jul 11, 2023

SaundersR1999 commented Aug 14, 2023

jbrockmendel commented Aug 15, 2023

francoispierrepaty commented Oct 31, 2023

Output of `pd.show_versions()`

ghost commented Jan 4, 2020 •

edited by ghost

Loading