Reorganisation of the MC data directory tree #42

Open
Voutsi opened this issue Apr 20, 2022 · 11 comments

@Voutsi
Collaborator

Voutsi commented Apr 20, 2022

Currently the data of the training dataset are stored at:
/home/georgios.voutsinas/ws/AllSky/TrainingDataset
There are 2 directories, one for protons and one for diffuse gammas. For each particle type, we have one directory per declination band (the exception is the Crab's band, whose files are stored simply in directories called Corsika & sim_telarray; I will move them this weekend to a dir called dec_2276). Each declination band directory splits into a Corsika and a sim_telarray dir, and in each of these dirs we have one directory per node.

The structure is illustrated in the example directory tree attached below.

[Image: data_dir_tree]
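
As a rough illustration of the layout described above (particle type, then declination band, then Corsika or sim_telarray, then node), here is a minimal Python sketch that builds such a path. The function name `node_dir` and the particle and node directory names in the example are assumptions made for illustration; only the base path, `dec_2276`, `Corsika` and `sim_telarray` come from this thread.

```python
from pathlib import PurePosixPath

# Stage directory names as given in the description above.
STAGES = ("Corsika", "sim_telarray")


def node_dir(base, particle, dec_band, stage, node):
    """Build <base>/<particle>/<dec_band>/<stage>/<node> (hypothetical helper)."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage {stage!r}, expected one of {STAGES}")
    return PurePosixPath(base) / particle / dec_band / stage / node


# "Protons" and the node name are placeholder directory names, not confirmed ones.
example = node_dir(
    "/home/georgios.voutsinas/ws/AllSky/TrainingDataset",
    "Protons",
    "dec_2276",
    "sim_telarray",
    "node_theta_xx_az_yy",
)
print(example)
```

Using `PurePosixPath` keeps the sketch independent of the machine it runs on, since it only manipulates path strings and never touches the file system.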

Please let me know if this scheme is satisfactory or whether we should organise the data in a better way.

@moralejo
Collaborator

Thanks, looks good to me. Other opinions?

@maxnoe
Member

maxnoe commented Apr 21, 2022

@Voutsi In your screenshot I see that you are using gzip compression for the simtel files.

We should use zstd, it is slightly faster to write, a bit smaller and much faster to read. It is also what is used for standard CTA productions.
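
To check which files in a tree are still gzip-compressed before switching, one option is to look at the leading magic bytes instead of trusting file extensions. The following is a minimal Python sketch; the function names `detect_compression` and `files_still_gzip` are made up for illustration and are not part of any pipeline mentioned here.

```python
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"          # first two bytes of a gzip stream
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"  # zstd frame magic number 0xFD2FB528, stored little-endian


def detect_compression(path):
    """Return "gzip", "zstd" or "unknown" based on the first bytes of the file."""
    with open(path, "rb") as f:
        head = f.read(4)
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    if head.startswith(ZSTD_MAGIC):
        return "zstd"
    return "unknown"


def files_still_gzip(root):
    """List files below root that are gzip-compressed, whatever their extension."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and detect_compression(p) == "gzip"
    )
```

This only identifies the container format; it does not recompress anything.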

@Voutsi
Collaborator Author

Voutsi commented Apr 21, 2022

Thanks @maxnoe, I was not aware of that. As long as the pipeline can digest .zstd files, I agree we should change.

@rlopezcoto
Collaborator

@Voutsi thanks, I agree with the proposed organization, but according to Daniel Mazin's comment from today, you may have trouble writing all the MC in your home folder. Shall we start transferring them elsewhere in the organization tree?

@Voutsi
Collaborator Author

Voutsi commented Apr 21, 2022

@rlopezcoto the MC is stored in fefs:

/fefs/aswg/workspace/georgios.voutsinas/AllSky/

I have a symlink in my home folder pointing to the storage space at fefs and this is what I showed in the slides today (I agree it was confusing...)

Or do you mean that I will also have problems storing it in /fefs/aswg/workspace/georgios.voutsinas/?

@rlopezcoto
Collaborator

rlopezcoto commented Apr 21, 2022

no, in the workspace folder it should be fine, no limits there so far

@jsitarek
Collaborator

since those will be the first really official MCs to be used by many analyzers, maybe it would be good to put the main path specifying that it is LSTProd2
something like
/fefs/aswg/mc/LSTProd2/...
(to mimic what is done for the data)
actually there is a directory /fefs/aswg/data/mc with some old 2020/2021 files, those can be moved away and .../data/mc could be used as well (but I think it is more confusing than just ...aswg/mc)

another thing, while corsika is expected to be just one directory, sim_telarray will be run multiple times with various settings, so I think it would be good to add to "sim_telarray" some tags describing the time period for which they are produced (dates or analysis periods), and settings ("nominal", "low_NSB" or something like this).
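
One possible way to encode such tags in the directory name is sketched below in Python. The format `sim_telarray_<period>_<settings>`, the function name `simtel_dir_name` and the example date are assumptions for illustration only; the actual convention would have to be agreed on.

```python
import re

# Tags are restricted to characters that are safe in a directory name.
_TAG_RE = re.compile(r"[A-Za-z0-9._-]+")


def simtel_dir_name(period, settings):
    """Compose sim_telarray_<period>_<settings> (hypothetical naming scheme)."""
    for label, value in (("period", period), ("settings", settings)):
        if not _TAG_RE.fullmatch(value):
            raise ValueError(f"invalid {label} tag: {value!r}")
    return f"sim_telarray_{period}_{settings}"


# Made-up period tag combined with one of the settings names suggested above.
print(simtel_dir_name("20220420", "low_NSB"))
```

Because a settings tag such as `low_NSB` itself contains an underscore, a name built this way cannot be split back into its parts unambiguously; a different separator would be needed if the names must be parsed later.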

@Voutsi
Collaborator Author

Voutsi commented Apr 26, 2022

Hi @jsitarek, sounds good to me. So I will create /fefs/aswg/mc/LSTProd2/, recreate the same directory structure there, and then symlink the data files, configs & logs.

Sure, I can add a suffix in the sim_telarray directories. I understand that now we produce the nominal ones.
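
The plan of recreating the directory structure and symlinking the files could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: `mirror_with_symlinks` is a hypothetical helper, it links every regular entry it finds (no filtering of data, config or log files), and it does not descend into symlinked subdirectories.

```python
import os
from pathlib import Path


def mirror_with_symlinks(src_root, dst_root):
    """Recreate the directory tree of src_root under dst_root and symlink each file."""
    # Resolve first, so that a symlinked workspace points at the real storage location.
    src_root = Path(src_root).resolve()
    dst_root = Path(dst_root)
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = Path(dirpath).relative_to(src_root)
        target_dir = dst_root / rel
        target_dir.mkdir(parents=True, exist_ok=True)
        for name in filenames:
            link = target_dir / name
            # Leave existing entries untouched so the function can be re-run safely.
            if not link.is_symlink() and not link.exists():
                link.symlink_to(Path(dirpath) / name)
```

A call would look like `mirror_with_symlinks("/fefs/aswg/workspace/georgios.voutsinas/AllSky", "/fefs/aswg/mc/LSTProd2")`, with both paths taken from this thread; the links use absolute targets, so they break if the source tree is later moved.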

@Voutsi
Collaborator Author

Voutsi commented May 5, 2022

Hi @maxnoe, zstd is not installed and I don't have the privileges to install it. Shall we ask the admins, or can someone else install it?

@maxnoe
Member

maxnoe commented May 6, 2022

@Voutsi To have it in the system, yes, ask the admins. However, it's also already available in the lstchain conda environments.

@vuillaut
Member

vuillaut commented May 6, 2022

Some of this discussion happened in emails; my bad, I missed the discussion in this repo!
So I will duplicate what I wrote previously here.
These are only my thoughts from what worked in the previous prods, I may be missing (technical) points, so take everything as suggestions.

  1. Please symlink all (not only Test) productions under /fefs/aswg/data/mc/DL0

    • the DL0 is not entirely accurate I agree, this is what we had been using until now but can be changed
    • then the other data levels will follow the same structure so it's easy for users to understand
  2. In the symlinked structure, I would argue that it should be as simple as possible, removing intermediate single directories (e.g. sim_telarray, output...), and presumably log and job files, corsika files, etc...

  3. different MC settings should lead to a different "MC prod ID" - much like 20200629_prod5_trans_80 with all produced files using these settings under that dir

    • thus avoiding a loose and not very self-explanatory v1.4 lower in the tree structure
  4. use the same nomenclature everywhere for similar things - it will be clearer and help parsing

    • for example node_theta_xx_az_yy vs node_corsika_theta_xx_az_yy

A final structure could look like this:

/fefs/aswg/data/mc/DL0
  └── allsky_v1.4_trig_xxx_trans_zzz
      ├── Testing
      │   ├── node_theta_xx_az_yy
      │   ├── node_theta_xx_az_yy
      │   └── node_theta_xx_az_yy
      └── Training
          ├── GammaDiffuse
          │   ├── dec_2276
          │   └── dec_3476
          │       ├── node_theta_xx_az_yy
          │       └── node_theta_xx_az_yy
          └── Protons
              ├── dec_2276
              └── dec_3476
                  ├── node_theta_xx_az_yy
                  └── node_theta_xx_az_yy

EDIT: Georgios made me realise declination should be lower in the tree so I edited the example accordingly
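
To illustrate why a single nomenclature helps parsing, here is a minimal Python sketch that extracts the pointing from a node directory name. It assumes that `xx` and `yy` are plain decimal numbers, which is not stated above; the name `parse_node` and the example values are hypothetical.

```python
import re

# Accepts both spellings mentioned above:
# node_theta_xx_az_yy and node_corsika_theta_xx_az_yy.
NODE_RE = re.compile(
    r"^node_(?:corsika_)?theta_(?P<theta>\d+(?:\.\d+)?)_az_(?P<az>\d+(?:\.\d+)?)$"
)


def parse_node(name):
    """Return (theta, az) as floats, or None if the name does not match."""
    match = NODE_RE.match(name)
    if match is None:
        return None
    return float(match["theta"]), float(match["az"])


# Made-up angles, for illustration only.
print(parse_node("node_theta_10.0_az_102.2"))
print(parse_node("node_corsika_theta_10.0_az_102.2"))
```

With one agreed spelling, the optional `corsika_` group in the pattern would no longer be needed.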
