In many DeepSensor modelling scenarios, the user will have the same dataset (xarray or pandas) on both the context and target side of the `TaskLoader`. In these cases the `TaskLoader` should clearly be using the same object in memory. However, part of the processing in the `TaskLoader` returns a copy of each data object, so the context and target lists end up holding different pointers, and the data is duplicated in memory. See the code example below.
import xarray as xr

from deepsensor.data import DataProcessor, TaskLoader
# Load raw data
ds_raw = xr.tutorial.open_dataset("air_temperature")
# Normalise data
data_processor = DataProcessor(x1_name="lat", x2_name="lon")
ds = data_processor(ds_raw)
task_loader = TaskLoader(context=ds, target=ds)
>>> task_loader.context[0] is task_loader.target[0]
False
One solution is to use a hashmap/dict shared between the context and target data. Some thought is needed about what the keys should be and how the context and target lists should reference those entries.
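As a rough illustration of the shared-dict idea, here is a minimal sketch. The `SharedDataCache` class and its `get` method are hypothetical names, not part of the actual `TaskLoader` API, and keying on `id()` is just one candidate choice (valid only while the original objects stay alive):

```python
class SharedDataCache:
    """Hypothetical cache so context and target share one processed copy.

    Keys are the identity of the original object; values are the processed
    copies. Keying on id() is only safe while the originals are referenced,
    since ids can be reused after garbage collection.
    """

    def __init__(self):
        self._cache = {}  # id(original) -> processed object

    def get(self, obj, process):
        key = id(obj)
        if key not in self._cache:
            # Process (e.g. copy) the object only on first sight.
            self._cache[key] = process(obj)
        return self._cache[key]


cache = SharedDataCache()
ds = {"air": [1.0, 2.0]}  # stand-in for an xarray/pandas object

context_entry = cache.get(ds, lambda d: dict(d))  # copy made once
target_entry = cache.get(ds, lambda d: dict(d))   # cached copy reused

print(context_entry is target_entry)  # True
```

For file-path entries the path string itself could serve as the key instead of `id()`, which would also cover the case where context and target are given the same fpath.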
We will need to test this for both the xarray and pandas cases, and also for the case where the context/target entries are fpaths rather than in-memory xarray/pandas objects.