Accounting for data objects that only iterate #15

Closed
ablaom opened this issue Jan 30, 2023 · 1 comment

Comments

@ablaom
Member

ablaom commented Jan 30, 2023

In the old MLLearn terminology, we have data containers (observations can be randomly accessed) and mere iterators. Dataloaders (as currently implemented in DataUtils.jl), for example, are only iterators, and some models will want to support both them and regular data containers. In the latter case, a higher-level interface will want to control observation resampling (e.g., cross-validation), but in the former case we're happy to forgo that functionality. How does the current "data interface" accommodate this complication? That is, how does an implementation articulate the fact that some allowed data objects cannot be subsampled? (At present the implicit assumption is that all data objects are data containers.)

(Originally there had been some discussion that Dataloaders would support (slow) random access, but that idea appears to have been abandoned in the DataLoaders -> MLUtils refactoring. Perhaps @lorenzoh would care to comment.)

One idea is for a model accepting an iterator Xiter to set getobs(model, I, Xiter) = nothing, and to define getobs(model, I, X) as normal for a data container X. Is it safe to assume that we will generally be able to distinguish iterators from containers by type alone, and so avoid possible type instabilities here?
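A minimal sketch of how the type-based distinction might look in Julia, using the "Holy trait" pattern rather than adding a `nothing`-returning method for each iterator type: a trait function maps each data type to `RandomAccess()` or `IterateOnly()`, and resampling dispatches on that trait. All names here (`data_access`, `RandomAccess`, `IterateOnly`, `BatchStream`, `subsample`) are hypothetical and are not part of DataUtils.jl, MLUtils.jl, or any existing interface; `BatchStream` is only a stand-in for an iterate-only loader.

```julia
# Hypothetical trait types: which kind of data access a data object supports.
abstract type DataAccess end
struct RandomAccess <: DataAccess end
struct IterateOnly <: DataAccess end

# Conservative fallback: unknown types are treated as iterate-only.
data_access(data) = IterateOnly()
# Vectors support random access (indexing by a vector of observation indices).
data_access(::AbstractVector) = RandomAccess()

# Hypothetical stand-in for a loader that can only be iterated over.
struct BatchStream
    batches::Vector{Vector{Float64}}
end
Base.iterate(s::BatchStream, state=1) =
    state > length(s.batches) ? nothing : (s.batches[state], state + 1)
Base.IteratorSize(::Type{BatchStream}) = Base.SizeUnknown()
Base.eltype(::Type{BatchStream}) = Vector{Float64}

# Resampling dispatches on the trait; the branch is resolved from the type alone.
subsample(data, I) = subsample(data_access(data), data, I)
subsample(::RandomAccess, data, I) = data[I]
subsample(::IterateOnly, data, I) = nothing  # resampling (e.g. CV) not supported

X = [1.0, 2.0, 3.0, 4.0]
subsample(X, [1, 3])           # a container: returns the selected observations
stream = BatchStream([[1.0, 2.0], [3.0]])
subsample(stream, [1])         # an iterator: returns nothing
```

Because `data_access` returns a singleton whose type depends only on the type of `data`, the compiler can resolve the branch statically, so `subsample` has an inferable return type for each concrete data type. A new iterator type gets the "cannot be subsampled" behavior from the fallback, and a new container type opts in with a single `data_access` method.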

Any other ideas?

@ablaom
Member Author

ablaom commented Nov 2, 2024

On dev, a learner can now specify the kind of data access its data supports (more precisely, the access that the output of obs(learner, data) must support). The options are described here.

ablaom closed this as completed Nov 2, 2024