Merge pull request #59 from legend-exp/dev
Simplified data loader
fhagemann authored Sep 24, 2024
2 parents d9483fa + 8be0c18 commit 13775c4
Showing 21 changed files with 1,321 additions and 485 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -2,5 +2,6 @@
*.jl.cov
*.jl.*.cov
*.jl.mem
*.lh5
.ipynb_checkpoints
Manifest.toml
7 changes: 3 additions & 4 deletions Project.toml
@@ -31,12 +31,13 @@ Unitful = "1986cc42-f94f-5a68-af5c-568840ba703d"

[weakdeps]
LegendHDF5IO = "c9265ca6-b027-5446-b1a4-febfa8dd10b0"
LegendDataTypes = "99e09c13-5545-5ee2-bfa2-77f358fb75d8"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
RecipesBase = "3cdcf5f2-1ef4-517c-9805-6587b60abb01"
SolidStateDetectors = "71e43887-2bd9-5f77-aebd-47f656f0a3f0"

[extensions]
LegendDataManagementLegendHDF5IOExt = "LegendHDF5IO"
LegendDataManagementLegendHDF5IOExt = ["LegendHDF5IO", "LegendDataTypes"]
LegendDataManagementPlotsExt = ["RecipesBase", "Plots"]
LegendDataManagementSolidStateDetectorsExt = "SolidStateDetectors"

@@ -47,6 +48,7 @@ Format = "1"
Glob = "1"
IntervalSets = "0.6, 0.7"
JSON = "0.21, 1"
LegendDataTypes = "0.1.13"
LRUCache = "1.5"
LegendHDF5IO = "0.1.14"
LinearAlgebra = "1"
@@ -70,6 +72,3 @@ TypedTables = "1.4"
UUIDs = "1"
Unitful = "1"
julia = "1.10"

[extras]
LegendHDF5IO = "c9265ca6-b027-5446-b1a4-febfa8dd10b0"
51 changes: 50 additions & 1 deletion docs/src/extensions.md
@@ -1,8 +1,42 @@
# Extensions

## `LegendHDF5IO` extension

LegendDataManagement provides an extension for [LegendHDF5IO](https://github.com/legend-exp/LegendHDF5IO.jl).
This makes it possible to directly load LEGEND data from HDF5 files via the `read_ldata` function. The extension is automatically loaded when both packages are loaded.
Example (requires a `$LEGEND_DATA_CONFIG` environment variable pointing to a legend data-config file):

```julia
using LegendDataManagement, LegendHDF5IO
l200 = LegendData(:l200)
filekeys = search_disk(FileKey, l200.tier[:jldsp, :cal, :p03, :r000])

chinfo = channelinfo(l200, (:p03, :r000, :cal); system=:geds, only_processable=true)

ch = chinfo[1].channel

dsp = read_ldata(l200, :jldsp, first(filekeys), ch)
dsp = read_ldata(l200, :jldsp, :cal, :p03, :r000, ch)
dsp = read_ldata((:e_cusp, :e_trap, :blmean, :blslope), l200, :jldsp, :cal, :p03, :r000, ch)
```
`read_ldata` automatically loads LEGEND data for a specific `DataTier` and data selection, e.g. a `FileKey` or a run selection, for a given `ChannelId`. The `search_disk` function allows the user to search for available `DataTier`s and `FileKey`s on disk. The first argument can be either a selection of keys in the form of an `NTuple` of `Symbol`s or a [PropertyFunction](https://github.com/oschulz/PropertyFunctions.jl/tree/main) which will be applied during loading.
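For example, a column selection or a `PropertyFunction` can be applied directly during loading. The following is a sketch assuming the `@pf` macro from PropertyFunctions.jl; the column names are taken from the example above:
```julia
using PropertyFunctions

# load only two columns of the DSP table
dsp = read_ldata((:e_cusp, :blmean), l200, :jldsp, :cal, :p03, :r000, ch)

# apply a PropertyFunction while loading (illustrative expression)
dsp = read_ldata(@pf($e_cusp - $blmean), l200, :jldsp, :cal, :p03, :r000, ch)
```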
It is also possible to load a whole `DataPartition` or `DataPeriod` for a given `ChannelId` `ch`:
```julia
dsp = read_ldata(l200, :jldsp, :cal, DataPartition(1), ch)
dsp = read_ldata(l200, :jldsp, :cal, DataPeriod(3), ch)
```
In addition, it is possible to load a random selection of `n_evts` events from each loaded file:
```julia
dsp = read_ldata(l200, :jldsp, :cal, :p03, :r000, ch; n_evts=1000)
```
For simplicity, `ch` can also be given as a `DetectorId`, which will be converted internally to a `ChannelId`:
```julia
det = chinfo[1].detector
dsp = read_ldata(l200, :jldsp, :cal, :p03, :r000, det)
```
## `SolidStateDetectors` extension

LegendDataManagement provides an extension for [SolidStateDetectors](https://github.com/JuliaPhysics/SolidStateDetectors.jl). This makes it possible to create `SolidStateDetector` instances from LEGEND metadata.
LegendDataManagement provides an extension for [SolidStateDetectors](https://github.com/JuliaPhysics/SolidStateDetectors.jl). This makes it possible to create `SolidStateDetector` and `Simulation` instances from LEGEND metadata.

Example (requires a `$LEGEND_DATA_CONFIG` environment variable pointing to a legend data-config file):

@@ -18,6 +52,21 @@ A detector can also be constructed using the filename of the LEGEND metadata det
det = SolidStateDetector(LegendData, "V99000A.json")
```

In addition, a `Simulation` created this way can be used with all simulation routines of SolidStateDetectors.jl. As usual, all fields stored in the `Simulation` can be written and read using `LegendHDF5IO`:

```julia
using LegendDataManagement
using SolidStateDetectors

sim = Simulation(LegendData, "V99000A.json")
simulate!(sim) # calculate electric field and weighting potentials

using LegendHDF5IO
ssd_write("V99000A.lh5", sim)
sim_in = ssd_read("V99000A.lh5", Simulation)
```


The following code will generate an overview plot of every 5th LEGEND detector (requires the actual LEGEND metadata instead of the metadata in legend-testdata):

```julia
...
185 changes: 182 additions & 3 deletions ext/LegendDataManagementLegendHDF5IOExt.jl
@@ -1,9 +1,26 @@
module LegendDataManagementLegendHDF5IOExt

using LegendDataManagement
LegendDataManagement._lh5_ext_loaded(::Val{true}) = true
using LegendDataManagement.LDMUtils: detector2channel
using LegendDataManagement: RunCategorySelLike
using LegendHDF5IO
using LegendDataTypes: fast_flatten, flatten_by_key
using TypedTables, PropertyFunctions

# using LegendDataManagement: LegendDataManagement.DataSelector
const ChannelOrDetectorIdLike = Union{ChannelIdLike, DetectorIdLike}
const AbstractDataSelectorLike = Union{AbstractString, Symbol, DataTierLike, DataCategoryLike, DataPeriodLike, DataRunLike, DataPartitionLike, ChannelOrDetectorIdLike}
const PossibleDataSelectors = [DataTier, DataCategory, DataPeriod, DataRun, DataPartition, ChannelId, DetectorId]

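# Resolve a channel-or-detector-like identifier to a ChannelId: ChannelId-like
# inputs are converted directly, DetectorId-like inputs are mapped to their
# channel via detector2channel for the given data selection.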
function _get_channelid(data::LegendData, rsel::Union{AnyValiditySelection, RunCategorySelLike}, det::ChannelOrDetectorIdLike)
if LegendDataManagement._can_convert_to(ChannelId, det)
ChannelId(det)
elseif LegendDataManagement._can_convert_to(DetectorId, det)
detector2channel(data, rsel, det)
else
throw(ArgumentError("$det is neither a ChannelId nor a DetectorId"))
end
end

const dataselector_bytypes = Dict{Type, String}()

@@ -43,7 +60,7 @@ end

function LegendHDF5IO.LH5Array(ds::LegendHDF5IO.HDF5.Dataset,
::Type{<:AbstractArray{<:T, N}}) where {T <: LegendDataManagement.DataSelector, N}

s = read(ds)
T.(s)
end
@@ -68,4 +85,166 @@ function __init__()
(@isdefined DataPartition) && extend_datatype_dict(DataPartition, "datapartition")
end

end
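# Open the LH5 file for the given tier and filekey, preferring the per-channel
# file if it exists and falling back to the combined tier file otherwise.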
function _lh5_data_open(f::Function, data::LegendData, tier::DataTierLike, filekey::FileKey, ch::ChannelIdLike, mode::AbstractString="r")
ch_filename = data.tier[DataTier(tier), filekey, ch]
filename = data.tier[DataTier(tier), filekey]
if isfile(ch_filename)
@debug "Read from $(basename(ch_filename))"
LegendHDF5IO.lh5open(f, ch_filename, mode)
elseif isfile(filename)
@debug "Read from $(basename(filename))"
LegendHDF5IO.lh5open(f, filename, mode)
else
throw(ArgumentError("Neither $(basename(filename)) nor $(basename(ch_filename)) found"))
end
end

_propfunc_columnnames(f::PropSelFunction{cols}) where cols = cols

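# Load all entries of an array/table, or a random sample of n_evts entries
# (drawn with replacement) when 1 <= n_evts <= length; single-key NamedTuples
# are unwrapped, other values are passed through unchanged.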
_load_all_keys(nt::NamedTuple, n_evts::Int=-1) = if length(nt) == 1 _load_all_keys(nt[first(keys(nt))], n_evts) else NamedTuple{keys(nt)}(map(x -> _load_all_keys(nt[x], n_evts), keys(nt))) end
_load_all_keys(arr::AbstractArray, n_evts::Int=-1) = arr[:][if (n_evts < 1 || n_evts > length(arr)) 1:length(arr) else rand(1:length(arr), n_evts) end]
_load_all_keys(t::Table, n_evts::Int=-1) = t[:][if (n_evts < 1 || n_evts > length(t)) 1:length(t) else rand(1:length(t), n_evts) end]
_load_all_keys(x, n_evts::Int=-1) = x

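# Base method: read a single tier for one FileKey and channel/detector,
# applying f (identity, a PropSelFunction column selection, or any callable)
# to the loaded data.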
function LegendDataManagement.read_ldata(f::Base.Callable, data::LegendData, rsel::Tuple{DataTierLike, FileKey, ChannelOrDetectorIdLike}; n_evts::Int=-1)
tier, filekey, ch = DataTier(rsel[1]), rsel[2], if !isempty(string((rsel[3]))) _get_channelid(data, rsel[2], rsel[3]) else rsel[3] end
_lh5_data_open(data, tier, filekey, ch) do h
if !isempty(string((ch))) && !haskey(h, "$ch")
throw(ArgumentError("Channel $ch not found in $(basename(string(h.data_store)))"))
end
if f == identity
if !isempty(string((ch)))
_load_all_keys(h[ch, tier], n_evts)
else
_load_all_keys(h[tier], n_evts)
end
elseif f isa PropSelFunction
if !isempty(string((ch)))
_load_all_keys(getproperties(_propfunc_columnnames(f)...)(h[ch, tier]), n_evts)
else
_load_all_keys(getproperties(_propfunc_columnnames(f)...)(h[tier]), n_evts)
end
else
result = if !isempty(string((ch)))
f.(_load_all_keys(h[ch, tier], n_evts))
else
f.(_load_all_keys(h[tier], n_evts))
end
if result isa AbstractVector{<:NamedTuple}
Table(result)
else
result
end
end
end
end

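# No channel given: inspect the file's top-level keys and either read the single
# channel/tier found or build a NamedTuple over all channels.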
function LegendDataManagement.read_ldata(f::Base.Callable, data::LegendData, rsel::Tuple{DataTierLike, FileKey}; kwargs...)
ch_keys = _lh5_data_open(data, rsel[1], rsel[2], "") do h
keys(h)
end
@debug "Found keys: $ch_keys"
if length(ch_keys) == 1
if string(only(ch_keys)) == string(rsel[1])
LegendDataManagement.read_ldata(f, data, (rsel[1], rsel[2], ""); kwargs...)
elseif LegendDataManagement._can_convert_to(ChannelId, only(ch_keys)) || LegendDataManagement._can_convert_to(DetectorId, only(ch_keys))
LegendDataManagement.read_ldata(f, data, (rsel[1], rsel[2], string(only(ch_keys))); kwargs...)
else
throw(ArgumentError("No tierm channel or detector found in $(basename(string(h.data_store)))"))
end
else
NamedTuple{Tuple(Symbol.(ch_keys))}([LegendDataManagement.read_ldata(f, data, (rsel[1], rsel[2], ch); kwargs...) for ch in ch_keys]...)
end
end

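# Flatten results loaded from multiple files: arrays and tables are concatenated,
# vectors of NamedTuples are merged key by key.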
lflatten(x) = fast_flatten(x)
# lflatten(t::AbstractVector{<:Table}) = append!(t...)
lflatten(nt::AbstractVector{<:NamedTuple}) = flatten_by_key(nt)

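# Read a list of FileKeys and flatten the per-file results.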
function LegendDataManagement.read_ldata(f::Base.Callable, data::LegendData, rsel::Tuple{DataTierLike, AbstractVector{FileKey}, ChannelOrDetectorIdLike}; kwargs...)
if !isempty(string(rsel[3]))
lflatten([LegendDataManagement.read_ldata(f, data, (rsel[1], fk, rsel[3]); kwargs...) for fk in rsel[2]])
else
lflatten([LegendDataManagement.read_ldata(f, data, (rsel[1], fk); kwargs...) for fk in rsel[2]])
end
end
LegendDataManagement.read_ldata(f::Base.Callable, data::LegendData, rsel::Tuple{DataTierLike, AbstractVector{FileKey}}; kwargs...) =
LegendDataManagement.read_ldata(f, data, (rsel[1], rsel[2], ""); kwargs...)

### Argument distinction for different DataSelector Types
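# Map each string/symbol-like selector to the unique DataSelector type it matches
# (preferring DataCategory for the second entry); ambiguous matches throw an
# ArgumentError, and an unmatched last entry (e.g. an empty channel) falls back
# to a plain String.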
function _convert_rsel2dsel(rsel::NTuple{<:Any, AbstractDataSelectorLike})
selector_types = [PossibleDataSelectors[LegendDataManagement._can_convert_to.(PossibleDataSelectors, Ref(s))] for s in rsel]
if length(selector_types[2]) > 1 && DataCategory in selector_types[2]
selector_types[2] = [DataCategory]
end
if isempty(last(selector_types))
selector_types[end] = [String]
end
if !all(length.(selector_types) .<= 1)
throw(ArgumentError("Ambiguous selector types: $selector_types for $rsel"))
end
Tuple([only(st)(r) for (r, st) in zip(rsel, selector_types)])
end

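# Entry point for string/symbol selectors: convert them to concrete DataSelector
# types before dispatching.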
function LegendDataManagement.read_ldata(f::Base.Callable, data::LegendData, rsel::NTuple{<:Any, AbstractDataSelectorLike}; kwargs...)
LegendDataManagement.read_ldata(f, data, _convert_rsel2dsel(rsel); kwargs...)
end

LegendDataManagement.read_ldata(f::Base.Callable, data::LegendData, rsel::Tuple{DataTier, DataCategory, DataPeriod}; kwargs...) =
LegendDataManagement.read_ldata(f, data, (DataTier(rsel[1]), DataCategory(rsel[2]), DataPeriod(rsel[3]), ""))

LegendDataManagement.read_ldata(f::Base.Callable, data::LegendData, rsel::Tuple{DataTier, DataCategory, DataPeriod, DataRun}; kwargs...) =
LegendDataManagement.read_ldata(f, data, (rsel[1], rsel[2], rsel[3], rsel[4], ""))


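# DataPartition selection: resolve the channel from the first run of the partition
# and read all (period, run) pairs contained in the partition.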
function LegendDataManagement.read_ldata(f::Base.Callable, data::LegendData, rsel::Tuple{DataTier, DataCategory, DataPartition, ChannelOrDetectorIdLike}; kwargs...)
first_run = first(LegendDataManagement._get_partitions(data, :default)[rsel[3]])
ch = _get_channelid(data, (first_run.period, first_run.run, rsel[2]), rsel[4])
pinfo = partitioninfo(data, ch, rsel[3])
@assert ch == _get_channelid(data, (first(pinfo).period, first(pinfo).run, rsel[2]), rsel[4]) "Channel mismatch in partitioninfo"
LegendDataManagement.read_ldata(f, data, (rsel[1], rsel[2], pinfo, ch); kwargs...)
end

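# DataPeriod selection: read all runs of the given period for the channel/detector.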
function LegendDataManagement.read_ldata(f::Base.Callable, data::LegendData, rsel::Tuple{DataTier, DataCategory, DataPeriod, ChannelOrDetectorIdLike}; kwargs...)
rinfo = runinfo(data, rsel[3])
first_run = first(rinfo)
ch = if !isempty(string(rsel[4]))
_get_channelid(data, (first_run.period, first_run.run, rsel[2]), rsel[4])
else
string(rsel[4])
end
LegendDataManagement.read_ldata(f, data, (rsel[1], rsel[2], rinfo, ch); kwargs...)
end

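# Run selection: search the disk for matching FileKeys, falling back to the
# start filekey if only a per-channel file exists.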
function LegendDataManagement.read_ldata(f::Base.Callable, data::LegendData, rsel::Tuple{DataTier, DataCategory, DataPeriod, DataRun, ChannelOrDetectorIdLike}; kwargs...)
fks = search_disk(FileKey, data.tier[rsel[1], rsel[2], rsel[3], rsel[4]])
ch = rsel[5]
if isempty(fks) && isfile(data.tier[rsel[1:4]..., ch])
LegendDataManagement.read_ldata(f, data, (rsel[1], start_filekey(data, (rsel[3], rsel[4], rsel[2])), ch); kwargs...)
elseif !isempty(fks)
LegendDataManagement.read_ldata(f, data, (rsel[1], fks, ch); kwargs...)
else
throw(ArgumentError("No filekeys found for $(rsel[2]) $(rsel[3]) $(rsel[4])"))
end
end


### DataPartition
const _partinfo_required_cols = NamedTuple{(:period, :run), Tuple{DataPeriod, DataRun}}

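# Read and flatten data for every (period, run) row of a partition or run table.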
LegendDataManagement.read_ldata(f::Base.Callable, data::LegendData, rsel::Tuple{DataTierLike, DataCategoryLike, Table{_partinfo_required_cols}, ChannelOrDetectorIdLike}; kwargs...) =
lflatten([LegendDataManagement.read_ldata(f, data, (rsel[1], rsel[2], r.period, r.run, rsel[4]); kwargs...) for r in rsel[3]])

LegendDataManagement.read_ldata(f::Base.Callable, data::LegendData, rsel::Tuple{DataTierLike, DataCategoryLike, Table{_partinfo_required_cols}}; kwargs...) =
LegendDataManagement.read_ldata(f, data, (rsel[1], rsel[2], rsel[3], ""); kwargs...)

function LegendDataManagement.read_ldata(f::Base.Callable, data::LegendData, rsel::Tuple{DataTierLike, DataCategoryLike, Table, ChannelOrDetectorIdLike}; kwargs...)
@assert (hasproperty(rsel[3], :period) && hasproperty(rsel[3], :run)) "Runtable doesn't provide periods and runs"
LegendDataManagement.read_ldata(f, data, (rsel[1], rsel[2], Table(period = rsel[3].period, run = rsel[3].run), rsel[4]); kwargs...)
end

LegendDataManagement.read_ldata(f::Base.Callable, data::LegendData, rsel::Tuple{DataTierLike, DataCategoryLike, Table}; kwargs...) =
LegendDataManagement.read_ldata(f, data, (rsel[1], rsel[2], rsel[3], ""); kwargs...)


end # module