Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

replace statsmodels' AutoReg with OLS? #483

Open
mathause opened this issue Jul 25, 2024 · 3 comments
Open

replace statsmodels' AutoReg with OLS? #483

mathause opened this issue Jul 25, 2024 · 3 comments

Comments

@mathause
Copy link
Member

mathause commented Jul 25, 2024

Estimating auto regression params is just an ordinary least regression in disguise (AFAIK). At least as long we don't do any fancy stuff and restrict ourselves to the standard AR(p) processes (no seasonal terms, estimating all the terms, ...). Switching from AutoReg to a linear regression solver can speed up the estimation (or more likely the data preparation) considerably (more than 10x). Of course we have to double and triple check the results. (Also I am a bit afraid that, AutoReg checks for some edge chases we don't know about and that we will need to bells and whistles as soon as we rewrite it, but the speed gain could be worth it).

See similarly #290 and #472 (comment)

import numpy as np
import scipy as sp
from sklearn.linear_model import LinearRegression
from statsmodels.regression.linear_model import OLS
from statsmodels.tsa.ar_model import AutoReg

arr = np.random.randn(100)

for i in range(1, arr.size):
    arr[i] = arr[i] + 0.7 * arr[i -1]

def lin_func(x, a, b):
    return a * x + b


%timeit AutoReg(arr, lags=1).fit()
%timeit LinearRegression(fit_intercept=True).fit(X=arr[:-1].reshape(-1, 1), y=arr[1:])
%timeit sp.optimize.curve_fit(lin_func, arr[:-1], arr[1:])[0]
%timeit OLS(arr[1:], arr[:-1]).fit()
%timeit sp.stats.linregress(arr[:-1], arr[1:])
1.27 ms ± 66.1 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
391 μs ± 14.1 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
212 μs ± 4.86 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
172 μs ± 850 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
79 μs ± 3.6 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

Also, for lag > 1:

X = np.lib.stride_tricks.sliding_window_view(arr, 2)[:, ::-1][:, 1:]

edit: added OLS

@veni-vidi-vici-dormivi
Copy link
Collaborator

Note also statsmodels OLS is in between curve_fitand linregress:

from statsmodels.regression.linear_model import OLS
%timeit OLS(arr[1:], arr[:-1]).fit()
51.3 µs ± 266 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

@veni-vidi-vici-dormivi
Copy link
Collaborator

note to self: can also be done for cycle-stationary AR process.

@mathause
Copy link
Member Author

Thanks - I added OLS above (interesting how your laptop is about 3-4 times faster; but it's also newer).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants