TRestDataSet and TRestDiscrimination implementation #13

jgalan · 2021-01-11T08:29:40Z

We need to define new classes that allow us to create datasets. A dataset is a class that defines a set of rules and a time period used to filter the ROOT files found in a path containing all the relevant data. A dataset will contain rules to define background files and rules to define calibration files. Those datasets may have been generated from Monte Carlo or from real data. A dataset will therefore identify which calibration data is used for pattern recognition for each background population.

TRestDataSet will use RDataFrame. It will quickly define a glob pattern that identifies the files, and require that they fulfil certain metadata conditions in any of the classes stored inside the ROOT files. This will create a dataset, i.e. a combined TTree (or RDataFrame) built from all the matching files.
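To make the file selection concrete, here is a minimal sketch in plain C++ that does not use ROOT. Every name in it (`SubRunInfo`, `MetadataRule`, `Selection`, `SelectSubRuns`) is hypothetical and not part of the proposal, and the metadata is reduced to named numeric values for illustration. It accepts a subrun only if it lies inside the requested period and satisfies every rule, and it accumulates the total duration of the accepted files.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the metadata read from one ROOT file (one subrun).
struct SubRunInfo {
    std::string fileName;
    double startTime;  // seconds, e.g. a UNIX timestamp
    double endTime;    // seconds
    std::map<std::string, double> metadata;  // illustrative: name -> numeric value
};

// Hypothetical rule: the named metadata value must lie inside [min, max].
struct MetadataRule {
    std::string key;
    double min;
    double max;
};

// Result of the selection: accepted files and their summed duration.
struct Selection {
    std::vector<std::string> fileNames;
    double totalDuration = 0;  // seconds
};

// Returns true if the subrun satisfies every rule.
// A subrun that lacks one of the requested metadata keys is rejected.
bool Accepts(const SubRunInfo& run, const std::vector<MetadataRule>& rules) {
    for (const auto& rule : rules) {
        const auto it = run.metadata.find(rule.key);
        if (it == run.metadata.end()) return false;
        if (it->second < rule.min || it->second > rule.max) return false;
    }
    return true;
}

// Keeps the subruns fully contained in [periodStart, periodEnd] that pass all rules.
Selection SelectSubRuns(const std::vector<SubRunInfo>& runs,
                        const std::vector<MetadataRule>& rules,
                        double periodStart, double periodEnd) {
    assert(periodStart <= periodEnd);
    Selection selection;
    for (const auto& run : runs) {
        if (run.startTime < periodStart || run.endTime > periodEnd) continue;
        if (!Accepts(run, rules)) continue;
        selection.fileNames.push_back(run.fileName);
        selection.totalDuration += run.endTime - run.startTime;
    }
    return selection;
}
```

In a real implementation the accepted file names would presumably be handed to RDataFrame to build the combined tree; that step is left out here.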

This might need a bit of additional brainstorming. However, TRestDataSet should at least:

Define, at initialisation, the metadata rules for accepting or rejecting a ROOT file (i.e. a subrun).
Keep track of the total time (duration) of all the files added.
Define a method to set the variables that will be stored in the final tree (which we will be able to export later on), e.g. Int_t SetObservables( std::string ).
Implement a method to define which variables will be used as discriminants, e.g. Int_t SetDiscriminants( std::string ).
Implement a method to define a range in each discriminant observable: SetDiscriminantLimits( Double_t efficiency ). This method will start from a big multi-dimensional box and progressively reduce its limits in order to find the optimum box that keeps inside it the fraction of calibration events given by efficiency.
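The issue does not say how the box should be reduced, so the following is only one possible greedy strategy, sketched in plain C++ without ROOT. It starts from the bounding box of the calibration events and repeatedly moves inwards the face that gives the largest relative width reduction, as long as the required fraction of calibration events stays inside. The names `Event`, `Box`, `BoundingBox` and `FindDiscriminantLimits` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One event = one value per discriminant observable.
using Event = std::vector<double>;

struct Box {
    std::vector<double> lower, upper;
};

// Smallest box containing all the given events (events must not be empty).
Box BoundingBox(const std::vector<Event>& events) {
    Box box{events.front(), events.front()};
    for (const auto& e : events)
        for (std::size_t d = 0; d < e.size(); d++) {
            box.lower[d] = std::min(box.lower[d], e[d]);
            box.upper[d] = std::max(box.upper[d], e[d]);
        }
    return box;
}

// Greedy shrinking (an assumed strategy): peel the outermost events off one
// face at a time while at least `efficiency` of the calibration events remain.
Box FindDiscriminantLimits(std::vector<Event> inside, double efficiency) {
    assert(!inside.empty() && efficiency > 0 && efficiency <= 1);
    const std::size_t nDim = inside.front().size();
    const auto minEvents = static_cast<std::size_t>(std::ceil(efficiency * inside.size()));
    const Box full = BoundingBox(inside);  // used to normalise widths

    while (true) {
        const Box box = BoundingBox(inside);
        double bestGain = 0, bestLimit = 0;
        std::size_t bestDim = 0;
        bool bestIsLower = true, found = false;

        for (std::size_t d = 0; d < nDim; d++) {
            const double width = full.upper[d] - full.lower[d];
            if (width <= 0) continue;
            for (bool isLower : {true, false}) {
                const double edge = isLower ? box.lower[d] : box.upper[d];
                // Count events sitting on this face and find the next value inwards.
                std::size_t onEdge = 0;
                double next = isLower ? box.upper[d] : box.lower[d];
                for (const auto& e : inside) {
                    if (e[d] == edge) onEdge++;
                    else next = isLower ? std::min(next, e[d]) : std::max(next, e[d]);
                }
                if (inside.size() - onEdge < minEvents) continue;  // would lose too many
                const double gain = std::abs(next - edge) / width;
                if (gain > bestGain) {
                    bestGain = gain; bestLimit = next; bestDim = d;
                    bestIsLower = isLower; found = true;
                }
            }
        }
        if (!found) return box;  // no face can move without breaking the efficiency

        // Drop the events left outside the new limit.
        inside.erase(std::remove_if(inside.begin(), inside.end(), [&](const Event& e) {
            return bestIsLower ? e[bestDim] < bestLimit : e[bestDim] > bestLimit;
        }), inside.end());
    }
}
```

A greedy search like this is not guaranteed to find the smallest possible box; it is meant only to illustrate the "start big and shrink" idea described above.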

Another class, named TRestDiscrimination, will define an abstract interface whose pure virtual methods MUST be implemented in the derived classes. TRestDiscrimination will contain a vector of TRestDataSets.

TRestDiscrimination will allow us to add any number of datasets and to call some methods to retrieve the final selected background population.

This still needs some thought. However, TRestDiscrimination should at least implement:

A pure virtual method to get the resulting population after applying the cuts of the particular method: RDataFrame GetSelection( ).
A pure virtual method to get the optimum efficiency that maximises the signal-to-background ratio: GetOptimumEfficiency( ).
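The shape of such an interface could look roughly like the sketch below, written in plain C++ without ROOT. `DataSetSketch`, `SelectionSketch`, `DiscriminationSketch` and `KeepAllSketch` are hypothetical stand-ins for TRestDataSet, the returned RDataFrame, TRestDiscrimination and a derived class. The method names `AddDataSet`, `GetNumberOfDataSets` and `GetDataSet` are assumptions, as are the return types.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for TRestDataSet: just a list of event values.
struct DataSetSketch {
    std::vector<double> events;
};

// Hypothetical stand-in for the selected population (an RDataFrame in the proposal).
using SelectionSketch = std::vector<double>;

// Abstract base: owns the datasets and declares what every method must provide.
class DiscriminationSketch {
public:
    virtual ~DiscriminationSketch() = default;

    void AddDataSet(DataSetSketch dataSet) { fDataSets.push_back(std::move(dataSet)); }
    std::size_t GetNumberOfDataSets() const { return fDataSets.size(); }
    const DataSetSketch& GetDataSet(std::size_t n) const {
        assert(n < fDataSets.size());
        return fDataSets[n];
    }

    // Pure virtual: a derived class cannot be instantiated without these.
    virtual SelectionSketch GetSelection() const = 0;
    virtual double GetOptimumEfficiency() const = 0;

protected:
    std::vector<DataSetSketch> fDataSets;
};

// Trivial derived class that applies no cut at all, only to show the contract.
class KeepAllSketch : public DiscriminationSketch {
public:
    SelectionSketch GetSelection() const override {
        SelectionSketch selected;
        for (const auto& dataSet : fDataSets)
            selected.insert(selected.end(), dataSet.events.begin(), dataSet.events.end());
        return selected;
    }
    double GetOptimumEfficiency() const override { return 1.0; }
};
```

Keeping the dataset container in the base class means each specific method only has to implement the selection logic.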

A first specific discrimination class will be TRestBasicDiscrimination. This class will perform an event selection following a basic rectangular-cut scheme, with the cuts optimised for the dataset.
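As an illustration of how a rectangular cut and an efficiency scan could fit together, here is a one-dimensional sketch in plain C++ without ROOT. The issue does not specify the figure of merit to maximise, so `efficiency / sqrt(1 + accepted background)` is used purely as an assumption. The names `Cut`, `WindowForEfficiency`, `CountInside` and `FindOptimumEfficiency` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// A rectangular cut in one observable.
struct Cut {
    double lower, upper;
};

// Narrowest window that keeps at least `efficiency` of the calibration values.
Cut WindowForEfficiency(std::vector<double> calibration, double efficiency) {
    assert(!calibration.empty() && efficiency > 0 && efficiency <= 1);
    std::sort(calibration.begin(), calibration.end());
    const std::size_t n = calibration.size();
    const auto keep = static_cast<std::size_t>(std::ceil(efficiency * n));
    Cut best{calibration.front(), calibration.back()};
    // Slide a window of `keep` consecutive sorted values and keep the narrowest.
    for (std::size_t i = 0; i + keep <= n; i++) {
        const double lo = calibration[i], hi = calibration[i + keep - 1];
        if (hi - lo < best.upper - best.lower) best = {lo, hi};
    }
    return best;
}

// Number of values that pass the cut (limits included).
std::size_t CountInside(const std::vector<double>& values, const Cut& cut) {
    return static_cast<std::size_t>(std::count_if(
        values.begin(), values.end(),
        [&](double v) { return v >= cut.lower && v <= cut.upper; }));
}

// Scans the candidate efficiencies and returns the one with the highest
// figure of merit. The figure of merit is an assumption, not from the issue.
double FindOptimumEfficiency(const std::vector<double>& calibration,
                             const std::vector<double>& background,
                             const std::vector<double>& candidates) {
    assert(!candidates.empty());
    double bestEfficiency = candidates.front(), bestFom = -1;
    for (double efficiency : candidates) {
        const Cut cut = WindowForEfficiency(calibration, efficiency);
        const double accepted = static_cast<double>(CountInside(background, cut));
        const double fom = efficiency / std::sqrt(1.0 + accepted);
        if (fom > bestFom) {
            bestFom = fom;
            bestEfficiency = efficiency;
        }
    }
    return bestEfficiency;
}
```

A multi-dimensional version would replace the window with a box over all the discriminants, but the scan over candidate efficiencies would stay the same.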

There is a working document at the following location to be used as a draft:

https://docs.google.com/presentation/d/1gQPMpR-wcQgLzKMNVgdXvgVSXPyQOhbirpx_vcSL1vo/edit?usp=sharing

As long as this issue is open, do not hesitate to participate or contribute to this topic! New contributors are welcome to join the discussion or the development.
