[ENH] enable large data use cases - decouple data input from pandas, allow polars, dask, and/or spark #1685
Labels: enhancement (New feature or request)
A key limitation of the current architecture seems to be its reliance on pandas for the input, which limits usability in large data cases. While torch with appropriate backends should be able to handle large data, pandas as a container choice will prove to be the bottleneck. This applies in particular to the current instantiation, which seems to rely on holding all data in memory.

We should therefore consider and implement support for data backends that scale better, such as polars, dask, or spark. We should also see how easy it is to get the pandas pyarrow integration to work.

Architecturally, I think we should:
- make pandas one of multiple potential data soft dependencies

The key entry point for this extension or refactor is TimeSeriesDataSet, which requires pandas objects to be passed.
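The sketch below shows one way "pandas as one of multiple soft dependencies" could look. It is a minimal illustration under stated assumptions, not a proposed API.

- **Hypothetical names.** `register_adapter`, `backend_of`, `to_columns` and `is_soft_dependency_present` are illustrative only. They are not part of pytorch-forecasting.
- **How dispatch works.** Input containers are dispatched on the top-level package of their type. Neither pandas nor polars is ever imported unless the user actually passes such an object.
- **Why a dict of lists.** The common target is a plain dict of column lists only to keep the sketch self-contained. A real large-data design would need the adapters to return a lazy or chunked representation instead of materializing everything in memory.

```python
import importlib.util
from typing import Any, Callable, Dict, List

# Column-oriented target format used in this sketch only (an assumption).
Columns = Dict[str, List[Any]]
Adapter = Callable[[Any], Columns]

# Hypothetical registry: top-level package of the input's type -> adapter.
_ADAPTERS: Dict[str, Adapter] = {}


def is_soft_dependency_present(package: str) -> bool:
    """Return True if top-level `package` is installed, without importing it."""
    return importlib.util.find_spec(package) is not None


def register_adapter(package: str) -> Callable[[Adapter], Adapter]:
    """Register an adapter for containers whose type lives in `package`."""
    def decorator(func: Adapter) -> Adapter:
        _ADAPTERS[package] = func
        return func
    return decorator


def backend_of(data: Any) -> str:
    """Top-level package of the container's type, e.g. 'pandas' or 'polars'."""
    return type(data).__module__.split(".")[0]


@register_adapter("builtins")
def _from_mapping(data: Any) -> Columns:
    # Plain dict of column name -> iterable of values; no third-party package.
    if not isinstance(data, dict):
        raise TypeError(f"unsupported builtin container {type(data).__name__}")
    return {name: list(values) for name, values in data.items()}


@register_adapter("pandas")
def _from_pandas(data: Any) -> Columns:
    # Only reached when the caller passed a pandas object, so pandas is never
    # imported here; DataFrame.to_dict(orient="list") gives {column: [values]}.
    return data.to_dict(orient="list")


@register_adapter("polars")
def _from_polars(data: Any) -> Columns:
    # polars DataFrame.to_dict(as_series=False) gives {column: [values]}.
    return data.to_dict(as_series=False)


def to_columns(data: Any) -> Columns:
    """Convert a supported container to columns, dispatching on its backend."""
    backend = backend_of(data)
    adapter = _ADAPTERS.get(backend)
    if adapter is None:
        raise TypeError(f"unsupported data container from backend {backend!r}")
    return adapter(data)


# Usage with the dependency-free dict backend:
columns = to_columns({"time_idx": range(3), "value": [1.0, 2.5, 3.0]})
# columns == {"time_idx": [0, 1, 2], "value": [1.0, 2.5, 3.0]}
```

A TimeSeriesDataSet-style constructor could call something like the hypothetical `to_columns` first. That would make pandas one backend among several rather than a hard requirement of the input API. Adapters for dask or spark could be registered the same way. To actually help with large data, they should avoid collecting the full dataset into memory.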