I'm looking for a way to use a DataLoader with guaranteed in-order (non-shuffled) iteration in a multi-threaded Julia session.
I noticed that the `parallel` keyword argument has been deprecated, though I'm not sure whether that option was honored in a multi-threaded session.
The issue is that when running inference on a large dataset, it is desirable to guarantee the iteration order so that the predictions come out in the same order as the original data. However, this apparently cannot be achieved when Julia runs with more than one thread (`Threads.nthreads() > 1`).
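If the loader's batch order cannot be controlled, one workaround is to carry each batch's index alongside its prediction and sort afterwards. The sketch below is plain Julia and assumes nothing about DataLoaders.jl; the helper `reassemble`, the stand-in `model`, and the `(index, batch)` pairs are hypothetical illustrations, not part of any library API.

```julia
# Hypothetical helper, not part of DataLoaders.jl: restore the original
# order of predictions when batches may be processed out of order.

"""
    reassemble(results)

`results` is a collection of `(idx, prediction)` pairs in arbitrary order.
Return the predictions sorted by batch index and stacked row-wise.
"""
function reassemble(results)
    sorted = sort(collect(results); by = first)  # order by batch index
    return reduce(vcat, last.(sorted))            # stack rows back together
end

# Toy usage: three row-batches of a 10×2 matrix, arriving out of order.
x = rand(10, 2)
arrived = [(3, x[9:10, :]), (2, x[5:8, :]), (1, x[1:4, :])]
model(batch) = 2 .* batch                          # stand-in for a real model
preds = [(idx, model(batch)) for (idx, batch) in arrived]
y = reassemble(preds)                              # rows now line up with `x`
```

This only fixes the order after the fact; it does not stop the loader from fetching batches concurrently.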
MWE:
```julia
using DataLoaders
import LearnBase: nobs, getobs

# A container whose observations are row-blocks of `x`, `length` rows each.
struct MyContainer{S <: AbstractArray}
    x::S
    length::Int
end

# Number of blocks (the last one may be shorter).
nobs(data::MyContainer) = ceil(Int, size(data.x, 1) / data.length)

function getobs(data::MyContainer, idx::Int)
    println("get obs MyContainer - idx: ", idx)
    x = if idx < nobs(data)
        data.x[((idx - 1) * data.length + 1):(idx * data.length), :]
    else
        # last block: take whatever rows remain
        data.x[((idx - 1) * data.length + 1):end, :]
    end
    return x
end

x = rand(10, 2)
data = MyContainer(x, 4)          # 3 blocks: 4, 4 and 2 rows
dloader = DataLoaders.DataLoader(data, nothing)
```
Then, randomness can be observed in the batch order:
```julia
julia> for x in dloader
           println("size(x): ", size(x))
       end
get obs MyContainer - idx: 3
get obs MyContainer - idx: 2size(x): (2, 2)
get obs MyContainer - idx: 1size(x): (4, 2)
size(x): (4, 2)

julia> for x in dloader
           println("size(x): ", size(x))
       end
get obs MyContainer - idx: 1
get obs MyContainer - idx: 3
get obs MyContainer - idx: 2size(x): (4, 2)
size(x): (2, 2)
size(x): (4, 2)
```
Is it possible to enforce that the returned `idx` is always 1, 2, 3? An option to disable the multi-threaded fetch would do it. Alternatively, would it be feasible to keep the multi-threaded loading in place but hold back each result until the previous index has been completed?
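The second option (load concurrently, deliver in order) can be sketched in plain Julia with a sliding window of tasks that are always fetched oldest-first. This is a minimal sketch using only `Threads.@spawn` and `fetch` from Base; it is not how DataLoaders.jl is implemented, and the names `ordered_parallel_foreach`, `loadbatch` and `ahead` are hypothetical.

```julia
# Minimal sketch using only Base Julia (no DataLoaders.jl internals):
# batches are loaded concurrently, but handed to the consumer strictly
# in index order 1, 2, ..., n.

"""
    ordered_parallel_foreach(consume, loadbatch, n; ahead = Threads.nthreads())

Call `loadbatch(i)` for `i = 1:n`, keeping up to `ahead` loads in flight
as spawned tasks, and call `consume(i, batch)` in increasing `i`.
"""
function ordered_parallel_foreach(consume, loadbatch, n::Int;
                                  ahead::Int = Threads.nthreads())
    ahead >= 1 || throw(ArgumentError("ahead must be >= 1"))
    pending = Task[]     # FIFO of in-flight loads, oldest index first
    next_idx = 1         # next batch index to schedule
    for i in 1:n
        # Top up the window so that at most `ahead` loads are in flight.
        while next_idx <= n && length(pending) < ahead
            j = next_idx                  # fresh binding per spawned task
            push!(pending, Threads.@spawn loadbatch(j))
            next_idx += 1
        end
        # Block until batch `i` is ready, even if later batches finished first.
        batch = fetch(popfirst!(pending))
        consume(i, batch)
    end
    return nothing
end

# Toy usage mirroring the MWE: 3 batches of up to 4 rows from a 10×2 matrix.
x = rand(10, 2)
loadbatch(idx) = x[((idx - 1) * 4 + 1):min(idx * 4, size(x, 1)), :]
order = Int[]
batches = Matrix{Float64}[]
ordered_parallel_foreach(loadbatch, 3) do i, batch
    push!(order, i)
    push!(batches, batch)
end
```

With `ahead = 1` only one load is in flight at a time, which is effectively the "disable the multi-threaded fetch" option. A larger window keeps threads busy while preserving order, at the cost of buffering up to `ahead` batches; a slow batch `i` still holds back later batches that finished early.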