Apologies for bringing what might seem like an exhaustive list of small inconveniences, but for the last week and a half I have been trying to get PyTorch Forecasting's Temporal Fusion Transformer (TFT) to work seamlessly on a very large amount of data that needs to be pulled from a database; unfortunately I keep running into errors whose messages are quite cryptic.
PyTorch-Forecasting version: 0.10.3
PyTorch version: 1.13.1
Python version: 3.10.9
Operating System: Arch Linux 6.1.12
KeyError when predicting on new data
Expected behavior
After training the TFT I tried to create my prediction set using the TimeSeriesDataSet.from_dataset() method, passing the arguments as explained in the tutorial: my existing training TimeSeriesDataSet, the Pandas DataFrame I want to create my prediction set from, and predict = True. Naturally, my prediction DataFrame doesn't include the target variable.
The expected behaviour is a TimeSeriesDataSet constructed on the prediction DataFrame in a fashion very similar to how the tutorial explains we create a validation dataset.
Actual behavior
However, the result was a KeyError pointing at my missing target feature. As a quick guess, I tried appending a zero-filled column with the target's name to the prediction DataFrame before attempting to recreate a TimeSeriesDataSet from it, but that ran me into another issue: torch.finfo() requires a floating point input type. Use torch.iinfo to handle 'torch.finfo'.
Code to reproduce the problem
This is my utility function that creates and returns a train and validation TimeSeriesDataSet. I use this when training the TFT, and I keep the returned train_tsd around for recreating a prediction set later on once training completes:
And this is how I'm trying to recreate my prediction TimeSeriesDataSet, where pred_df has exactly the same features as the training dataset I'm passing into the function above (get_tsdsets()) except for my target column, "yield":
Without "yield" being present, I get a KeyError informing me that my target variable is not present. If I try to add one with pred_df["yield"] = 0.0, I get a TypeError: torch.finfo() requires a floating point input type. Use torch.iinfo to handle 'torch.finfo'. Given this, I don't think creating a separate TimeSeriesDataSet purely for the prediction data will work: although I might get a dataset and loader, the TFT model's .predict() method itself will probably complain about the missing target. The crux of this issue is thus:
How exactly do I go about training the TFT and then predicting on unseen test data?
NumPy "Float" Type
This is a minor issue, but I wanted to mention my workaround for it. It's already well known (pull 1257) that PyTorch Forecasting is affected by NumPy deprecating its np.float data type in favour of the built-in float; the suggestion by yairmassury (issue 1236) worked for me as-is. Following the discussion at pull 1257, I didn't change anything in any other file.
AttributeError from base_model.py
This issue echoes what was discussed in issue 1255. I was able to work around it by going into pytorch_forecasting/models/base_model.py and making a small change at line 260: adding an underscore and a comma before init_args, so that both items returned from Lightning's get_init_args() function are unpacked and the one we actually want can be iterated over. The line now looks like _, init_args = get_init_args(frame). Should I make a pull request for this?
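To illustrate why the one-character change matters, here is a minimal sketch that does not use Lightning at all. The function fake_get_init_args() is a hypothetical stand-in that only mimics the "returns two items" behaviour described above, and the hyperparameter names are made up; the AttributeError is presumably what the unpatched line runs into when it treats the returned tuple as a dict.

```python
from typing import Any, Dict, Optional, Tuple


def fake_get_init_args() -> Tuple[Optional[Any], Dict[str, Any]]:
    """Hypothetical stand-in for a helper that returns two items.

    This is NOT Lightning's function; it only mimics the shape of the
    return value (something, dict_of_init_args) described in the issue.
    """
    return None, {"hidden_size": 16, "dropout": 0.1}


# Binding the whole return value to one name gives a tuple,
# so dict-style access on it fails.
init_args = fake_get_init_args()
try:
    init_args.items()
    error_name = None
except AttributeError as exc:
    error_name = type(exc).__name__

# Unpacking both items discards the first and keeps the dict.
_, init_args = fake_get_init_args()
names = sorted(init_args)
```

The leading underscore is just the conventional name for a value that is deliberately ignored.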
I have also uploaded a Google Colab notebook that should help in understanding where I'm going wrong. Any assistance on this issue would be greatly appreciated because it's quite frustrating to have a trained TFT ready to go but be unable to make predictions on unseen data with it. I'm sure it's something simple I just can't quite put my finger on.
It turns out the problem, as described above, came from my assumption that casting Pandas data types with .astype() is an in-place operation; .astype() actually returns a new object and leaves the original untouched. Because I never assigned the result back, my appended target feature was never actually cast to float, which produced the error.
Since the crux of this issue has been solved I'll be closing it. However, I'm still curious to understand why the prediction set requires the target column to exist; I'm certain there has to be a better way to initialise and use the prediction set. My current process for predicting with PyTorch Forecasting's TFT is thus:
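For anyone hitting the same thing, here is a small sketch of the pitfall using only pandas. The names pred_df and "yield" come from the issue; the time_idx column and the three-row frame are made up for illustration, and the dummy target is created as integer zeroes so that the cast is visible.

```python
import pandas as pd

# Toy prediction frame with an integer dummy target (illustrative data only).
pred_df = pd.DataFrame({"time_idx": [0, 1, 2], "yield": [0, 0, 0]})

# astype() returns a new Series; the result is discarded here,
# so the column in pred_df keeps its integer dtype.
pred_df["yield"].astype("float32")
dtype_without_assignment = pred_df["yield"].dtype

# Assigning the result back is what actually changes the column.
pred_df["yield"] = pred_df["yield"].astype("float32")
dtype_after_assignment = pred_df["yield"].dtype
```

Checking pred_df.dtypes right before building the TimeSeriesDataSet is a cheap way to confirm that the cast took effect.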
1. train the model, save the best one
2. load the best model for predicting
3. pull data from the database into a Pandas DataFrame
4. append a dummy target feature onto the DataFrame from step (3) and fill it with 0.0 (be sure to check that the target feature's data type is float)
5. slice the prediction DataFrame into the relevant training sizes (in my case I trained on 9 months of data and predicted for 3, so I'll slice my prediction set into 4 partitions - note that this operation can also be done while pulling data directly from the database rather than in-memory)
6. predict!
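As a rough sketch of the slicing step above, the helper below computes consecutive calendar-month windows that could be used either to filter a DataFrame in memory or to build date bounds for the database query. It assumes that "4 partitions" means four back-to-back three-month windows aligned to the first of the month; month_windows() is a hypothetical helper and the dates are made up.

```python
from datetime import date
from typing import List, Tuple


def month_windows(
    start: date, n_windows: int, months_per_window: int
) -> List[Tuple[date, date]]:
    """Return (start, end_exclusive) pairs for consecutive month windows.

    Windows are aligned to the first day of the month containing `start`.
    """

    def add_months(d: date, months: int) -> date:
        # Work in "months since year 0" to roll over year boundaries cleanly.
        total = d.year * 12 + (d.month - 1) + months
        return date(total // 12, total % 12 + 1, 1)

    first = date(start.year, start.month, 1)
    return [
        (
            add_months(first, i * months_per_window),
            add_months(first, (i + 1) * months_per_window),
        )
        for i in range(n_windows)
    ]


# Four three-month windows starting from the month of an example date.
windows = month_windows(date(2022, 1, 15), n_windows=4, months_per_window=3)
```

Each pair is half-open (start inclusive, end exclusive), so adjacent partitions never share a row.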
Edit
Totally forgot to mention: I've updated the link to the Colab notebook to better reflect the solution and the steps I took to predict with the Temporal Fusion Transformer.