Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to directly use sensitive detector names to name output tables #153

Closed
gipert opened this issue Nov 10, 2024 · 9 comments
Closed
Labels
output Output Schemes

Comments

@gipert
Copy link
Member

gipert commented Nov 10, 2024

This would simplify detector identification in post-processing (when e.g. querying metadata). An exception should be thrown if the names are not all unique and if the user requests to have all data in one big table.

@ManuelHu
Copy link
Collaborator

For the table mode per detector type we would still need to use the uids, but we could add a mapping table

@gipert
Copy link
Member Author

gipert commented Nov 11, 2024

For the table mode per detector type we would still need to use the uids

I don't see why other than having to change the code a bit...

@ManuelHu
Copy link
Collaborator

To be more clear what I mean: the code of an ntuple "per detector type" contains a column with the uids (now). If we would replace them with the names, we would have a string-typed column; that might be bad for storage size and performance...

@gipert
Copy link
Member Author

gipert commented Nov 11, 2024

Oh I understand. Mabye we can just get rid of it?

@ManuelHu
Copy link
Collaborator

This "flat" output mode is exactly what moritz requested in #85 , so probably not.

@gipert
Copy link
Member Author

gipert commented Nov 11, 2024

Sorry, I misunderstood again. In the multi-table output mode we don't store a column with the UID, so it would not use more disk space. We could change the code to generate a random UID to be used internally and then name the table in the output after the sensitive volume name.

As I said at the beginning, the situation is different for the single-table mode and we should not implement what I'm proposing there.

@tdixon97
Copy link
Collaborator

I'm a bit confused, but I think that avoiding introducing extra IDs or names for the detectors is essential.
Maybe storing a string is not actually not bad since it should compress very well

@ManuelHu
Copy link
Collaborator

Maybe storing a string is not actually not bad since it should compress very well

I did a quick test and stored the same 100 byte long constant string for each vertex, with HDF5 output. The file size difference was ~ 72 MB (for 60 MB of additional string data, so it did not compress at all, it is actually even worse!). Also the runtime was increaed by a factor of 8.5.
So no, storing strings for every simulation hit will not be good :-(

@tdixon97
Copy link
Collaborator

Ok, I think we should not move to a single flat table since writing a column thats another ID (not meaning anything) is confusing and processing the flat table is not easier. It should be fairly easy to combine in post-proccesing the table so Id stick with @gipert suggestion...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
output Output Schemes
Projects
None yet
Development

No branches or pull requests

3 participants