In many DeepSensor modelling scenarios, the user will have the same dataset (xarray or pandas) on both the context and target side of the `TaskLoader`. In these cases the `TaskLoader` should clearly be using the same object in memory. However, part of the processing in the `TaskLoader` returns a copy of each data object, so the context and target lists end up holding different pointers, and the data is duplicated in memory. See the code example below.
import xarray as xr

from deepsensor.data import DataProcessor, TaskLoader
# Load raw data
ds_raw = xr.tutorial.open_dataset("air_temperature")
# Normalise data
data_processor = DataProcessor(x1_name="lat", x2_name="lon")
ds = data_processor(ds_raw)
task_loader = TaskLoader(context=ds, target=ds)
>>> task_loader.context[0] is task_loader.target[0]
False
One solution is to use a hashmap/dict shared between the context and target data. Some thought is needed about what the keys should be and how the context and target lists should reference those entries.
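As a rough illustration of the shared-dict idea, here is a minimal sketch. The `SharedDataCache` class and its `get` method are hypothetical names, not part of the actual `TaskLoader` API, and keying on `id()` is just one candidate choice (valid only while the original objects stay alive):

```python
class SharedDataCache:
    """Hypothetical cache so context and target share one processed copy.

    Keys are the identity of the original object; values are the processed
    copies. Keying on id() is only safe while the originals are referenced,
    since ids can be reused after garbage collection.
    """

    def __init__(self):
        self._cache = {}  # id(original) -> processed object

    def get(self, obj, process):
        key = id(obj)
        if key not in self._cache:
            # Process (e.g. copy) the object only on first sight.
            self._cache[key] = process(obj)
        return self._cache[key]


cache = SharedDataCache()
ds = {"air": [1.0, 2.0]}  # stand-in for an xarray/pandas object

context_entry = cache.get(ds, lambda d: dict(d))  # copy made once
target_entry = cache.get(ds, lambda d: dict(d))   # cached copy reused

print(context_entry is target_entry)  # True
```

For file-path entries the path string itself could serve as the key instead of `id()`, which would also cover the case where context and target are given the same fpath.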
We will need to test this for both the xarray and pandas cases, and also for the case where the context/target entries are fpaths rather than in-memory xarray/pandas objects.