Add option to directly use sensitive detector names to name output tables #153

gipert · 2024-11-10T07:59:11Z

This would simplify detector identification in post-processing (e.g. when querying metadata). An exception should be thrown if the names are not all unique, or if the user requests to have all data in one big table.
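A minimal sketch of the check proposed here, written in Python for brevity. The function name `validate_table_naming` and the `single_table` flag are hypothetical and only illustrate the two error conditions; they are not part of any existing API.

```python
from collections import Counter


def validate_table_naming(detector_names, single_table):
    """Raise ValueError if detector names cannot be used as output table names.

    Hypothetical helper: the name and both arguments are illustrative only.
    """
    if single_table:
        # With one big table there are no per-detector tables to name.
        raise ValueError("detector names cannot name tables in single-table mode")
    # Collect every name that occurs more than once.
    duplicates = sorted(n for n, c in Counter(detector_names).items() if c > 1)
    if duplicates:
        raise ValueError(f"detector names are not unique: {duplicates}")
    return list(detector_names)
```

Calling `validate_table_naming(["det_a", "det_a"], single_table=False)` would raise, while a list of distinct names is returned unchanged.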

ManuelHu · 2024-11-10T18:50:58Z

For the table mode per detector type we would still need to use the uids, but we could add a mapping table

gipert · 2024-11-11T09:04:15Z

> For the table mode per detector type we would still need to use the uids

I don't see why other than having to change the code a bit...

ManuelHu · 2024-11-11T09:41:43Z

To be clearer about what I mean: an ntuple "per detector type" currently contains a column with the uids. If we replaced them with the names, we would have a string-typed column; that might be bad for storage size and performance...

gipert · 2024-11-11T09:43:01Z

Oh, I understand. Maybe we can just get rid of it?

ManuelHu · 2024-11-11T09:52:51Z

This "flat" output mode is exactly what moritz requested in #85, so probably not.

gipert · 2024-11-11T09:59:58Z

Sorry, I misunderstood again. In the multi-table output mode we don't store a column with the UID, so it would not use more disk space. We could change the code to generate a random UID to be used internally and then name the table in the output after the sensitive volume name.

As I said at the beginning, the situation is different for the single-table mode and we should not implement what I'm proposing there.
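The idea for the multi-table mode could be sketched as follows, in Python for brevity. `build_table_registry` is a hypothetical helper, and it assigns sequential internal uids rather than random ones (a deliberate substitution, to keep the example deterministic). The sensitive volume name doubles as the output table name, and the uid never leaves the application.

```python
def build_table_registry(volume_names):
    """Map each sensitive volume name (= output table name) to an internal uid.

    Hypothetical helper; sequential uids stand in for the random ones
    suggested in the comment above.
    """
    registry = {}
    for uid, name in enumerate(volume_names):
        if name in registry:
            # Table names must be unique, so duplicate volume names are an error.
            raise ValueError(f"duplicate sensitive volume name: {name!r}")
        registry[name] = uid
    return registry
```

For example, `build_table_registry(["det_a", "det_b"])` gives `{"det_a": 0, "det_b": 1}`: the keys would become table names, while the values are only used internally.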

tdixon97 · 2024-11-11T10:57:18Z

I'm a bit confused, but I think it is essential to avoid introducing extra IDs or names for the detectors.
Maybe storing a string is actually not bad, since it should compress very well.

ManuelHu · 2024-11-11T12:53:00Z

> Maybe storing a string is actually not bad, since it should compress very well.

I did a quick test and stored the same 100-byte-long constant string for each vertex, with HDF5 output. The file size difference was ~72 MB (for 60 MB of additional string data, so it did not compress at all; it is actually even worse!). Also, the runtime increased by a factor of 8.5.
So no, storing strings for every simulation hit will not be good :-(
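To put the numbers from this test in perspective, here is a short back-of-the-envelope calculation. It assumes that "MB" means 10**6 bytes and compares against a hypothetical uncompressed 4-byte integer uid column; neither assumption is stated in the comment above, and the measured sizes are themselves approximate.

```python
MB = 1_000_000  # assumption: "MB" means 10**6 bytes

string_bytes_per_vertex = 100   # length of the constant test string
raw_string_data = 60 * MB       # additional string data quoted above
file_size_increase = 72 * MB    # approximate measured growth of the file

# Number of vertices implied by 60 MB of 100-byte strings.
n_vertices = raw_string_data // string_bytes_per_vertex

# Bytes actually spent on disk per stored string.
bytes_on_disk_per_vertex = file_size_increase / n_vertices

# Assumption for comparison: one uncompressed 4-byte integer uid per vertex.
int_uid_bytes = 4
int_column_size_mb = n_vertices * int_uid_bytes / MB

print(n_vertices, bytes_on_disk_per_vertex, int_column_size_mb)
```

Under these assumptions the test covers 600,000 vertices, each string costs about 120 bytes on disk (20% more than its raw size), and an integer uid column for the same vertices would take about 2.4 MB.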

tdixon97 · 2024-11-11T22:53:09Z

Ok, I think we should not move to a single flat table, since writing a column that's just another ID (not meaning anything) is confusing, and processing the flat table is not easier. It should be fairly easy to combine the tables in post-processing, so I'd stick with @gipert's suggestion...
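Combining per-detector tables in post-processing could look roughly like the sketch below. It uses plain Python dictionaries as a stand-in for whatever table type the analysis actually reads; `combine_tables` and the column name `"detector"` are hypothetical, and all tables are assumed to share the same columns.

```python
def combine_tables(tables):
    """Concatenate per-detector tables into one flat table.

    `tables` maps detector name -> {column name -> list of values}.
    A 'detector' column holding the table name is added to every row.
    """
    flat = {"detector": []}
    for name, columns in tables.items():
        # Row count of this table, taken from its first column (0 if empty).
        n_rows = len(next(iter(columns.values()), []))
        flat["detector"].extend([name] * n_rows)
        for col, values in columns.items():
            flat.setdefault(col, []).extend(values)
    return flat


tables = {"det_a": {"edep": [1.0, 2.0]}, "det_b": {"edep": [3.0]}}
flat = combine_tables(tables)
```

Here `flat["detector"]` is `["det_a", "det_a", "det_b"]` and `flat["edep"]` is `[1.0, 2.0, 3.0]`, so the string column only exists in memory during analysis and is never written per hit by the simulation.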

gipert mentioned this issue Nov 10, 2024

Storing information for post-proc legend-exp/legend-pygeom-l200#59

Closed

gipert added the output Output Schemes label Nov 10, 2024

ManuelHu mentioned this issue Nov 12, 2024

output: use detector names as ntuple name #161

Merged

ManuelHu closed this as completed Nov 15, 2024
