Add support for Neuralforecast (georgia-tech-db#1115)
Adding support for `neuralforecast`. Fixes georgia-tech-db#1112.

```sql
DROP TABLE IF EXISTS AirData;

CREATE TABLE AirData (
    unique_id TEXT(30),
    ds TEXT(30),
    y INTEGER);

LOAD CSV 'data/forecasting/air-passengers.csv' INTO AirData;

DROP FUNCTION IF EXISTS Forecast;

CREATE FUNCTION Forecast FROM
(SELECT unique_id, ds, y FROM AirData)
TYPE Forecasting
PREDICT 'y'
HORIZON 12
LIBRARY 'neuralforecast';

SELECT Forecast(12);
```
One quick issue here is that `neuralforecast` needs `horizon` as a
parameter while training, unlike `statsforecast`. Thus, a better way to
call the UDF would be simply `SELECT Forecast();`, which is currently
unsupported. @xzdandy Please let me know your thoughts.
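
The difference between the two libraries can be sketched with stand-in classes. This is only an illustration under stated assumptions: `StatsStyleModel`, `NeuralStyleModel`, and `forecast` are hypothetical names, not the `statsforecast` or `neuralforecast` APIs. The dispatch on the library name mirrors the `forward` change in this commit.

```python
class StatsStyleModel:
    """Hypothetical stand-in: the horizon is chosen at prediction time."""

    def fit(self, series):
        self.last = series[-1]
        return self

    def predict(self, h):
        # Naive forecast: repeat the last observed value h times.
        return [self.last] * h


class NeuralStyleModel:
    """Hypothetical stand-in: the horizon is fixed when the model is built."""

    def __init__(self, h):
        self.h = h

    def fit(self, series):
        self.last = series[-1]
        return self

    def predict(self):
        # The horizon was already given at construction, so none is passed here.
        return [self.last] * self.h


def forecast(model, library, horizon):
    """Call predict the way each library style expects."""
    if library == "statsforecast":
        return model.predict(h=horizon)
    return model.predict()


stats_result = forecast(StatsStyleModel().fit([1, 2, 3]), "statsforecast", 12)
neural_result = forecast(NeuralStyleModel(h=12).fit([1, 2, 3]), "neuralforecast", 12)
```

Because the second style bakes the horizon into the trained model, passing it again at query time is redundant, which is the motivation for `SELECT Forecast();`.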

List of stuff yet to be done:

- [x] Incorporate `neuralforecast`
- [x] Fix `HORIZON` redundancy (UPDATE: Being fixed in georgia-tech-db#1121)
- [x] Reuse a trained model when a lower horizon is requested
- [x] Add support for ~multivariate forecasting~ exogenous variables
- [x] Add tests
- [x] Add docs
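
The model-reuse item can be sketched as a filter over saved model files, following the `horizon`/`.pkl` parsing visible in the `create_function_executor.py` hunk below. The function name `find_reusable_models` and the example file names are assumptions made for illustration, not EvaDB's actual naming scheme.

```python
def find_reusable_models(model_files, horizon):
    """Return saved models whose trained horizon covers the requested one.

    Assumes each file name embeds its horizon as '...horizon<N>.pkl'.
    """
    reusable = []
    for name in model_files:
        # Extract the integer between 'horizon' and '.pkl'.
        trained_horizon = int(name.split("horizon")[1].split(".pkl")[0])
        if trained_horizon >= horizon:
            reusable.append(name)
    return reusable


# Hypothetical file names, used only for this example.
saved = ["air_horizon6.pkl", "air_horizon12.pkl", "air_horizon24.pkl"]
candidates = find_reusable_models(saved, 12)
```

If the returned list is empty, a new model has to be trained; otherwise one of the candidates can serve the request and its forecast can be cut down to the requested length.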

---------

Co-authored-by: xzdandy <[email protected]>
2 people authored and a0x8o committed Oct 30, 2023
1 parent efdfee9 commit 8eeef95
Showing 6 changed files with 120 additions and 0 deletions.
26 changes: 26 additions & 0 deletions docs/source/reference/ai/model-forecasting.rst
@@ -120,11 +120,19 @@ EvaDB's default forecast framework is `statsforecast <https://nixtla.github.io/s
* - LIBRARY (str, default: 'statsforecast')
- We can select one of `statsforecast` (default) or `neuralforecast`. `statsforecast` provides access to statistical forecasting methods, while `neuralforecast` gives access to deep-learning based forecasting methods.
* - MODEL (str, default: 'ARIMA')
- If LIBRARY is `statsforecast`, we can select one of ARIMA, CES, ETS, Theta. The default is ARIMA. Check `Automatic Forecasting <https://nixtla.github.io/statsforecast/src/core/models_intro.html#automatic-forecasting>`_ to learn details about these models. If LIBRARY is `neuralforecast`, we can select one of NHITS or NBEATS. The default is NBEATS. Check `NBEATS docs <https://nixtla.github.io/neuralforecast/models.nbeats.html>`_ for details.
* - AUTO (str, default: 'T')
- If set to 'T', it enables automatic hyperparameter optimization. Must be set to 'T' for the `statsforecast` library. One may set this parameter to `false` if LIBRARY is `neuralforecast` for faster (but less reliable) results.
* - Frequency (str, default: 'auto')
- A string indicating the frequency of the data. The commonly used ones are D, W, M, Y, which respectively represent day-, week-, month- and year-end frequency. The default value is M. Check `pandas available frequencies <https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases>`_ for all available frequencies. If it is not provided, EvaDB attempts to determine the frequency automatically.

Note: If columns other than the ones required as mentioned above are passed while creating the function, they will be treated as exogenous variables if LIBRARY is `neuralforecast`. Otherwise, they would be ignored.

@@ -141,6 +149,7 @@ Below is an example query specifying the above parameters:
ID 'type'
Frequency 'W';

Below is an example query with `neuralforecast` with `trend` column as exogenous and without automatic hyperparameter optimization:

@@ -156,6 +165,8 @@ Below is an example query with `neuralforecast` with `trend` column as exogenous
FREQUENCY 'M';
=======
SELECT Forecast(12) FROM AirData;

@@ -210,6 +221,8 @@ Below is an example query specifying the above parameters:
TIME 'saledate'
ID 'type'
Frequency 'W';
=======
>>>>>>> e8a181c5 (Add support for Neuralforecast (#1115))

Below is an example query with `neuralforecast` with `trend` column as exogenous and without automatic hyperparameter optimization:

@@ -222,10 +235,23 @@ Below is an example query with `neuralforecast` with `trend` column as exogenous
PREDICT 'y'
LIBRARY 'neuralforecast'
AUTO 'f'
FREQUENCY 'M';
21 changes: 21 additions & 0 deletions evadb/executor/create_function_executor.py
@@ -78,9 +78,12 @@
try_to_import_ludwig,
try_to_import_neuralforecast,
try_to_import_sklearn,
<<<<<<< HEAD
<<<<<<< HEAD
try_to_import_statsforecast,
=======
=======
>>>>>>> ca239aea (Add support for Neuralforecast (#1115))
=======
try_to_import_ludwig,
>>>>>>> 2dacff69 (feat: sync master staging (#1050))
@@ -94,7 +97,14 @@
try_to_import_sklearn,
try_to_import_statsforecast,
>>>>>>> 40a10ce1 (Bump v0.3.4+ dev)
<<<<<<< HEAD
>>>>>>> 6d6a14c8 (Bump v0.3.4+ dev)
=======
>>>>>>> eva-master
=======
try_to_import_statsforecast,
>>>>>>> e8a181c5 (Add support for Neuralforecast (#1115))
>>>>>>> ca239aea (Add support for Neuralforecast (#1115))
try_to_import_torch,
try_to_import_ultralytics,
try_to_import_xgboost,
@@ -409,6 +419,7 @@ def handle_ultralytics_function(self):
def handle_forecasting_function(self):
"""Handle forecasting functions"""
os.environ["CUDA_VISIBLE_DEVICES"] = ""
<<<<<<< HEAD
=======
def handle_forecasting_function(self):
"""Handle forecasting functions"""
@@ -417,7 +428,13 @@ def handle_forecasting_function(self):
"""Handle forecasting functions"""
os.environ["CUDA_VISIBLE_DEVICES"] = ""
>>>>>>> 40a10ce1 (Bump v0.3.4+ dev)
<<<<<<< HEAD
>>>>>>> 6d6a14c8 (Bump v0.3.4+ dev)
=======
>>>>>>> eva-master
=======
>>>>>>> e8a181c5 (Add support for Neuralforecast (#1115))
>>>>>>> ca239aea (Add support for Neuralforecast (#1115))
aggregated_batch_list = []
child = self.children[0]
for batch in child.exec():
@@ -627,7 +644,11 @@ def handle_forecasting_function(self):
if int(x.split("horizon")[1].split(".pkl")[0]) >= horizon
]
if len(existing_model_files) == 0:
<<<<<<< HEAD
logger.info("Training, please wait...")
=======
print("Training, please wait...")
>>>>>>> ca239aea (Add support for Neuralforecast (#1115))
if library == "neuralforecast":
model.fit(df=data, val_size=horizon)
else:
28 changes: 28 additions & 0 deletions evadb/functions/forecast.py
@@ -81,6 +81,9 @@ def setup(
time_column_rename: str,
id_column_rename: str,
<<<<<<< HEAD
<<<<<<< HEAD
=======
>>>>>>> e8a181c5 (Add support for Neuralforecast (#1115))
horizon: int,
library: str,
=======
@@ -129,6 +132,9 @@ def setup(self, model_name: str, model_path: str):
self.horizon = int(horizon)
self.library = library
<<<<<<< HEAD
<<<<<<< HEAD
=======
>>>>>>> ca239aea (Add support for Neuralforecast (#1115))

=======

@@ -146,12 +152,15 @@ def setup(self, model_name: str, model_path: str):
)
>>>>>>> 2dacff69 (feat: sync master staging (#1050))
def forward(self, data) -> pd.DataFrame:
<<<<<<< HEAD
<<<<<<< HEAD
if self.library == "statsforecast":
forecast_df = self.model.predict(h=self.horizon)
else:
forecast_df = self.model.predict()
=======
=======
>>>>>>> ca239aea (Add support for Neuralforecast (#1115))
horizon = list(data.iloc[:, -1])[0]
assert (
type(horizon) is int
@@ -169,25 +178,44 @@ def forward(self, data) -> pd.DataFrame:
self.library = library

def forward(self, data) -> pd.DataFrame:
=======
>>>>>>> e8a181c5 (Add support for Neuralforecast (#1115))
if self.library == "statsforecast":
forecast_df = self.model.predict(h=self.horizon)
else:
forecast_df = self.model.predict()
<<<<<<< HEAD
>>>>>>> 40a10ce1 (Bump v0.3.4+ dev)
>>>>>>> 6d6a14c8 (Bump v0.3.4+ dev)
=======
<<<<<<< HEAD
<<<<<<< HEAD
=======
>>>>>>> 40a10ce1 (Bump v0.3.4+ dev)
>>>>>>> eva-master
=======
>>>>>>> e8a181c5 (Add support for Neuralforecast (#1115))
>>>>>>> ca239aea (Add support for Neuralforecast (#1115))
forecast_df.reset_index(inplace=True)
forecast_df = forecast_df.rename(
columns={
"unique_id": self.id_column_rename,
"ds": self.time_column_rename,
self.model_name: self.predict_column_rename,
}
<<<<<<< HEAD
<<<<<<< HEAD
)[: self.horizon * forecast_df["unique_id"].nunique()]
=======
<<<<<<< HEAD
)
<<<<<<< HEAD
>>>>>>> 6d6a14c8 (Bump v0.3.4+ dev)
=======
=======
)[: self.horizon * forecast_df["unique_id"].nunique()]
>>>>>>> e8a181c5 (Add support for Neuralforecast (#1115))
>>>>>>> ca239aea (Add support for Neuralforecast (#1115))
return forecast_df
=======
<<<<<<< HEAD
3 changes: 3 additions & 0 deletions setup.py
@@ -143,6 +143,7 @@ def read(path, encoding="utf-8"):
forecasting_libs = [
"statsforecast", # MODEL TRAIN AND FINE TUNING
"neuralforecast" # MODEL TRAIN AND FINE TUNING
<<<<<<< HEAD
]

imagegen_libs = [
@@ -158,6 +159,8 @@ def read(path, encoding="utf-8"):
forecasting_libs = [
"statsforecast", # MODEL TRAIN AND FINE TUNING
"neuralforecast" # MODEL TRAIN AND FINE TUNING
=======
>>>>>>> ca239aea (Add support for Neuralforecast (#1115))
]
=======
>>>>>>> 8c5b63dc (release: merge staging into master (#1032))
35 changes: 35 additions & 0 deletions test/integration_tests/long/test_model_forecasting.py
@@ -61,6 +61,15 @@ def setUpClass(cls):
=======
>>>>>>> 40a10ce1 (Bump v0.3.4+ dev)
>>>>>>> 6d6a14c8 (Bump v0.3.4+ dev)
create_table_query = """
CREATE TABLE AirDataPanel (\
unique_id TEXT(30),\
ds TEXT(30),\
y INTEGER,\
trend INTEGER,\
ylagged INTEGER);"""
execute_query_fetch_all(cls.evadb, create_table_query)

create_table_query = """
CREATE TABLE HomeData (\
saledate TEXT(30),\
@@ -93,21 +102,38 @@ def setUpClass(cls):
<<<<<<< HEAD
<<<<<<< HEAD
<<<<<<< HEAD
<<<<<<< HEAD
=======
>>>>>>> 40a10ce1 (Bump v0.3.4+ dev)
=======
=======
>>>>>>> 6d6a14c8 (Bump v0.3.4+ dev)
=======
=======
<<<<<<< HEAD
<<<<<<< HEAD
=======
>>>>>>> eva-master
=======
>>>>>>> e8a181c5 (Add support for Neuralforecast (#1115))
>>>>>>> ca239aea (Add support for Neuralforecast (#1115))
path = f"{EvaDB_ROOT_DIR}/data/forecasting/AirPassengersPanel.csv"
load_query = f"LOAD CSV '{path}' INTO AirDataPanel;"
execute_query_fetch_all(cls.evadb, load_query)

<<<<<<< HEAD
<<<<<<< HEAD
=======
>>>>>>> 53dafecf (feat: sync master staging (#1050))
=======
>>>>>>> 40a10ce1 (Bump v0.3.4+ dev)
<<<<<<< HEAD
>>>>>>> 6d6a14c8 (Bump v0.3.4+ dev)
=======
>>>>>>> eva-master
=======
>>>>>>> e8a181c5 (Add support for Neuralforecast (#1115))
>>>>>>> ca239aea (Add support for Neuralforecast (#1115))
path = f"{EvaDB_ROOT_DIR}/data/forecasting/home_sales.csv"
load_query = f"LOAD CSV '{path}' INTO HomeData;"
execute_query_fetch_all(cls.evadb, load_query)
@@ -212,13 +238,22 @@ def test_forecast(self):
<<<<<<< HEAD
<<<<<<< HEAD
<<<<<<< HEAD
<<<<<<< HEAD
<<<<<<< HEAD
SELECT AirForecast() order by y;
=======
>>>>>>> ca239aea (Add support for Neuralforecast (#1115))
=======
=======
>>>>>>> 6d6a14c8 (Bump v0.3.4+ dev)
SELECT AirForecast(12) order by y;
<<<<<<< HEAD
>>>>>>> 53dafecf (feat: sync master staging (#1050))
=======
=======
SELECT AirForecast() order by y;
>>>>>>> e8a181c5 (Add support for Neuralforecast (#1115))
>>>>>>> ca239aea (Add support for Neuralforecast (#1115))
"""
result = execute_query_fetch_all(self.evadb, predict_query)
self.assertEqual(len(result), 12)
7 changes: 7 additions & 0 deletions test/unit_tests/binder/test_statement_binder.py
@@ -688,6 +688,7 @@ def test_bind_create_function_should_bind_forecast_with_renaming_columns(self):
self.assertEqual(create_function_statement.inputs, expected_inputs)
self.assertEqual(create_function_statement.outputs, expected_outputs)

<<<<<<< HEAD
<<<<<<< HEAD
=======
<<<<<<< HEAD
@@ -738,7 +739,13 @@ def test_bind_create_function_should_raise_forecast_with_unexpected_columns(self

=======
>>>>>>> 40a10ce1 (Bump v0.3.4+ dev)
<<<<<<< HEAD
>>>>>>> 6d6a14c8 (Bump v0.3.4+ dev)
=======
>>>>>>> eva-master
=======
>>>>>>> e8a181c5 (Add support for Neuralforecast (#1115))
>>>>>>> ca239aea (Add support for Neuralforecast (#1115))
def test_bind_create_function_should_raise_forecast_missing_required_columns(self):
with patch.object(StatementBinder, "bind"):
create_function_statement = MagicMock()