[BIG UPDATE] Package manager, build.py, data IO using STL and Neuland time calibration rewrite #890

YanzhaoW · 2023-09-06T12:40:30Z

This PR contains lots of new features. Here is the list:

Implementing Conan package manager for R3BRoot.
Just like Pip or Conda for python, Conan is the most popular package manager for C++ project. It has seamless integration with cmake without the need to setup any environment variables. Its main difference to spack (which is used by FairRoot) is Conan is a project-wise package manager, which means different projects could have different packages without any interference between each other. To add a library for the project, just add the name and version to the conanfile.txt in the root directory. Currently there are 4 libraries being added:
- ms-gsl/4.0.0
- range-v3/0.12.0
- gtest/cci.20210126
- nlohmann_json/3.11.2
Adding a build script build.py to manage all the build processes for R3BRoot.
To install the conan packages, use ./build.py -p. To configure cmake, use ./build.py -c. And to build the project, use ./build.py -b. To do all: ./build.py -a. For more specification, check ./build.py -h
Completely getting rid of TClonesArray for data input and output.
All the data read or written with ROOT TTree has STL types. For example, Neuland mapped data is stored as std::vector<R3B::PaddleTamexMappedData> and trigger mapped data is stored as std::map<unsigned int, R3B::PaddleTamexTrigMappedData>.
Add an algorithm to automatically detect Neuland TrigIDMap.
It can later be saved into a JSON file. If disabled, the TrigIDMap need to be read from a JSON file. JSON file reading and writing is implemented using nlohmann's JSON library.
No usage of index arithmetic for data analysis.
Rather, use range-v3 library to construct the data binding and data filtering.
Following classes are rewritten as new classes (with type aliases):
- R3BPaddleTamexMappedData -> R3B::PaddleTamexMappedData(R3BPaddleTamexMappedData2)
- R3BTCalPar -> R3B::TCalPar (R3BTCalPar2)
- R3BNeulandTamexReader -> R3BNeulandTamexReader2
- R3BNeulandMapped2CalPar -> R3B::Neuland::Mapped2CalPar(R3BNeulandMapped2CalPar2)
- R3BNeulandMapped2Cal -> R3B::Neuland::Mapped2Cal (R3BNeulandMapped2Cal2)
Place all the new implementation in a new folder neuland/calibration/calLevel/
Nearly all old code stay untouched.

TODO:

Adding documentation for the Neuland time calibration process.
Adding unit tests.

It's no hurry to merge this PR to the dev branch. Feel free the leave comments below if there are any suggestions. In the meanwhile, I will try to finish the energy calibration rewriting in few weeks.

Checklist:

Rebased against dev branch
My name is in the resp. CONTRIBUTORS/AUTHORS file
Followed the pull request guidlines and the Git workflow
Followed the seven rules of great commit messages

YanzhaoW · 2023-09-08T12:32:17Z

Hi @kresan, could you give this PR a review about the build.py, a build script for R3BRoot, and using parameters without parameter factories?

klenze · 2023-09-09T13:29:37Z

neuland/calibration/calLevel/R3BFTCalEngine.cxx

+        using ValueErrors = FTCalStrategy::ValueErrors;
+        constexpr unsigned int uniform_err_divider = 12;
+        constexpr double uniform_err_divider_sqrt = 3.464101615137755;
+        constexpr double sqrt_3 = 1.732050807568877;


Why do you initialize that constant with a fixed value?
Is it because std::sqrt is not constexpr (until C++26?) and you want to save the extra calculation at runtime?
I do not think that two sqrt calls per run are going to affect runtime that much. Some readers like me will have limited math ability and not be able to verify in their head that 3.464101615137755 is in fact the double constant best approximating sqrt(12). This makes it harder to verify that the code does what it claims to do.

Yes, exactly. std::sqrt is not constexpr and sqrt function is actually slow compared to other math functions. The thing is the calling time of sqrt is not two calls per run but is equal to the "number of digitizers" times "number of channels", which is around 2600 x 4 x 500. That's a lot. So I think maybe just make it as a constexpr value without calling sqrt. And for now the calibration is done in the end of the run. Maybe in the future we could more calibrations in a single run.

EDIT: I think it twice. This should be better:

const auto uniform_err_divider_sqrt = std::sqrt(12); const auto sqrt_3 = std::sqrt(3);

Some readers like me will have limited math ability and not be able to verify in their head that 3.464101615137755 is in fact the double constant best approximating sqrt(12).

That's true. Naming here is not very intuitive. It could just be SQRT_12. But I thought the reason why here is the square root of 12 is because of standard deviation from uniform distribution. Now I think SQRT_12 is easier to understand.

Ah, I guess I will change back to constexpr. If I use const auto sqrt_12 = std::sqrt(12), I have to change all const variables that use sqrt of 12. Actually someone already implemented it long time ago:

R3BRoot/neuland/shared/R3BNeulandCommon.h

Line 25 in b0f85c9

constexpr auto __sqrt12 = 3.464101615;

I will probably use this value instead (and probably change it the name to SQRT_12 since double underscore prefix is reserved for C++ internal global variable).

klenze · 2023-09-09T15:31:00Z

Package Manager

From my understanding, the conan dependencies would not be automatically installed if people continued to run cmake, which is why you added the build.py.

I would be reluctant to wrap the top level of the build process. cmake and make both take a zillion optional flags, so it is worthwhile for users to know how to run them manually so that they have the option to pass some of these flags.

I am also doubtful that our lead developers will be easily sold on adding an extra step in the build process. Fetching external packages from cmake, while not particularly elegant, seems more likely to succeed.

External dependencies

One of the main benefits of add external dependencies in cmake is that it is painful, which dissuades people from adding to many. I don't know much about C++ json libraries (I am more of a yaml fan), but it seems to me that we already get a json lib with boost. I have not used boost json, it could be an utter horror to work with for all I know. But if it is reasonable, I would much prefer to use a boost library to an external dependency, even if it is less shiny. The reason is that someone might have to compile this code in a decade, and I am not sure if nlohmann_json or conan will still be maintained by then. (If boost is not maintained by then, that is FairSoft's problem).

With gtest, my understanding is that FairSoft ships with a version, but you would like a more recent version?

Class Design
I had a look at MappedDataReader.
I appreciate you trying to separate the ugly FairRoot glue code from code which does actual useful work, but I don't think I am convinced your model can scale into a general solution to the problem of handling FairRootManager. (Of course, you never claimed that, so this is a bit like "I just bought some groceries" - "You fool, that will never scale to solve world hunger".)

In theory, that class could be templated over MappedData to generate a reader for any r3b data container. However, this won't work to well. The main problem is that as soon as you need a task which reads from two different data containers, you would need multiple inheritance. Of course, an instance of Reader<T> can not provide a Method Get##T, because only the preprocessor can concat token strings, so it would at most provide a method T GetContainer(T* dummy). This in turn will fail if a class needs two instances (with different runtime string names in FRM) of the same data type, say WRTS from both MainDAQ and Neuland.
I guess if one found a way to smuggle strings into template parameters (lambda expressions do work), one could use template packs to do something like

class MainNeulandWRTSDifference: 
public TemplateReader<DataInstance<WRTSData, "MainWRTS">, DataInstance<WRTSData, "NeulandWRTS">> 
{
void Exec(...)
{
   auto mainWrts=GetContainer<DataInstance<WRTSData, "MainWRTS">>();
   auto mainWrts=GetContainer<DataInstance<WRTSData, "NeulandWRTS">>();
   ...
}
};

What I would instead prefer is the following:

class MainNeulandWRTSDifference : public BetterTask
{
    auto fWrtsMain=FetchData<WRTSData>("MainWrts");
    auto fWrtsNL=FetchData<WRTSData>("MainWrts");
    auto fDiff=RegisterOutput<WrtsDifferenceData>("MainNLWrtsDiff");
    auto fSomePar=FetchParameter<WrtsClock>("WrtsClock");
...
};

Because the objects can not actually be fetched when the class members are initialized, accessing them will involve an additional level of indirection. (The alternative is the ugly syntax of auto fWrtsMain=FetchData<WRTSData>(fWrtsMain, "MainWrts"); which would allow BetterTask to store a reference to fWrtsMain and assign it at an appropriate time.)

The implementation of BetterTask::FetchData will of course be a bit tricky. I would use a

std::vector<std::function<void(FairRootManager*)> fOnInit;
template<class T>
T* FetchData(T*& target std::stringview name)
{
    fOnInit.push_back(
       [name&, target&](FairRootManager* frm)
       {
           target=frm->InitObjectAs<T>(name);
           // error handling
       });
    return nullptr;
}

(Ugly fast version without indirect reference, the other version would require more memory storage. Of course, it is probably possible to have a simple pointer-wrapping class which overloads operator= and copy constructor and tells BetterTask where the pointer it needs to update is stored. This would be actually neatly byzantine.)

Perhaps I will get around to writing a proof of concept for that at some point.

YanzhaoW · 2023-09-09T16:31:05Z

From my understanding, the conan dependencies would not be automatically installed if people continued to run cmake, which is why you added the build.py.

No really. The conan dependencies are only needed to be installed for the first time with:

conan install . --output-folder=build --build=missing

And then when you run with cmake, you need to add a cmake flag to let cmake know where are packages:

cmake ..  -DCMAKE_TOOLCHAIN_FILE=Dir/conan_toolchain.cmake

For the second run or third run, you could just run cmake .. since the package path is cached already. This is what is really nice about conan: it's not intrusive. You don't need to add or change anything in your cmake script to use conan. If conan is not support anymore (which is highly unlikely), we could just use another one without changing cmake script. The reason why there is a build.py is that I want to simplify the process when the project is built for first time. Actually it's also very simple after the first time build.

But you are right about this point. Use ./build.py -a is different from the traditional workflow mkdir build && cd build && cmake .. && make. However, a build script for a large C++ project is indeed very common. Some people choose make script, some choose bash script and some choose python script. All is just to make build process user friendly.

That being said, there is an ongoing development for conan2 (see this), which allow cmake automatically install all conan packages and cache the path. It's already done for conan1 and now they are trying to make it compatible with conan2.

I am also doubtful that our lead developers will be easily sold on adding an extra step in the build process. Fetching external packages from cmake, while not particularly elegant, seems more likely to succeed.

I wouldn't say they are fetched from cmake. There is a Fetch content function in cmake, which everyone in C++ community is complaining about. Conan, on the other hand, is similar to Spack they are using for FairSoft. The only difference is Conan is much much nicer to integrate with cmake than Spack. With Spack, you basically need to set the PATH env variable and tell cmake to go to find the Config.cmake with those paths. If someone set PATH differently, then it's screwed. But with Conan, cmake just need to use conan_toolchain.cmake, which specifies the package paths and the cmake targets directly. It's hardly surprising since Spack is more like dnf or apt which can install packages in many languages systemwide. Conan is the package manager only for C++.

I don't know much about C++ json libraries (I am more of a yaml fan), but it seems to me that we already get a json lib with boost. I have not used boost json, it could be an utter horror to work with for all I know.

Yes, I heard about boost JSON library. There is also another JSON library called simdjson. Among these three libraries, I personally think nlohmann_json is probably most easy to use. It has super nice API and the json data structure is much like any STL containers. Maybe I will also try boost.json. Since with a package manager, changing into a different library is very easy.

I wound't say yaml is very suitable for storing data. It's more like for configuration, instead of data storage. Actually I tried to use yaml to configure neuland executable using yaml-cpp (see here). But since then I just keep thinking: maybe python or lua is better to configure C++ project? Anyway, C++ JSON library is much more mature than yaml library and if you see this reddit post, most people use JSON.

YanzhaoW · 2023-09-09T16:58:27Z

I appreciate you trying to separate the ugly FairRoot glue code from code which does actual useful work, but I don't think I am convinced your model can scale into a general solution to the problem of handling FairRootManager.

Well, there wasn't MappedDataReader in the beginning. Later I found there are a lot of duplicated code between R3BNeulandMapped2Cal and R3BNeulandMapped2CalPar. Both need to read the mapped data, one for the calculation of parameters and the other for calculation of cal level data. Trying to follow the best practise, duplication is "evil". So in the end I just created a class to eliminate all duplications. This class is definitely not to solve the boilerplate problem from FairRootManager.

Because the objects can not actually be fetched when the class members are initialized, accessing them will involve an additional level of indirection.

I don't know how data is read from ucesb. But with TTree, all it does in each loop is to change the value of the variable, that has been already initialised. I have no idea why it's so complicated with FairRootManger ( still need some time to figure out what's going on there).

class MainNeulandWRTSDifference : public BetterTask
{
auto fWrtsMain=FetchData("MainWrts");
auto fWrtsNL=FetchData("MainWrts");
auto fDiff=RegisterOutput("MainNLWrtsDiff");
auto fSomePar=FetchParameter("WrtsClock");
...
};

I think this way is the best. The only change I would make is to assign the fetched data to a member variable rather than local variable. And I also think it's best to just let task own their own fetched data, instead of registering all the data somewhere else and delete them in the end.

klenze · 2023-09-10T00:29:34Z

RE neatly byzantine: https://gist.github.com/klenze/34907b55b9bcf1b1401a9af1005a5064
(I would consider this moderately evil. Real evilness would be to rely on return value optimization).
Also, I have decided that FairerTask is a nicer name than BetterTask :-)

kresan · 2023-09-11T11:28:49Z

I understand what Conan package manager can do, but why do you need this for R3BRoot? To handle 4 dependencies?

Concerning the container factories, can you boil down your 3.400 lines of changes to the short description of how do you access parameter containers without factories?

YanzhaoW · 2023-09-11T12:09:06Z

I understand what Conan package manager can do, but why do you need this for R3BRoot? To handle 4 dependencies?

I would say to handle dependencies easier and more flexible. It's very necessary to have a package manager for the project itself rather than waiting for the library updates from upstream (in our case, FairSoft). If we have a package manager locally, using a new library becomes super easy. Just add a line in conanfile.txt and it's done immediately. The other option is wait for FairSoft to add the library, which could take quite a long time.

Concerning the container factories, can you boil down your 3.400 lines of changes to the short description of how do you access parameter containers without factories?

Sure, please look at the function MappedDataReader::SetParContainers. Here I just use addContainer from FairRuntimeDb (sorry I actually meant FairRuntimeDb when I said FairRootManager previously):

if (auto* rtdb = FairRuntimeDb::instance(); rtdb != nullptr)
        {
            calibrationPar_ = std::make_unique<TCalPar>(calibrationParName_.c_str()).release();
            calibrationTrigPar_ = std::make_unique<TCalPar>(calibrationTrigParName_.c_str()).release();
            rtdb->addContainer(calibrationPar_);
            rtdb->addContainer(calibrationTrigPar_);
            if (calibrationPar_ == nullptr and calibrationTrigPar_ == nullptr)
            {
                throw R3B::runtime_error("calibration parameters are still nullptr!");
            }
            SetExtraPar(rtdb);
        }

Here is the function definition from addContainer:

Bool_t FairRuntimeDb::addContainer(FairParSet* container)
{
    auto name = container->GetName();

    if (!containerList->FindObject(name)) {
        containerList->Add(container);
        TIter next(runs);
        FairRtdbRun* run;
        FairParVersion* vers;
        while ((run = static_cast<FairRtdbRun*>(next()))) {
            if (!run->getParVersion(name)) {
                vers = new FairParVersion(name);
                run->addParVersion(vers);
            }
        }
        // cout << "-I- RTDB entries in list# " <<  containerList->GetEntries() <<"\n" ;

        return kTRUE;
    }

    Warning("addContainer(FairParSet*)", "Container %s already exists!", name);
    return kFALSE;

}

Here you could see it has nothing to do with containerFactory. All it does is to add a container to the containList.

And what we usually do with a parameter (a container) is to use another function called getContainer(), which internally checks the name from the container factories, create the container and call addContainer(). Of course, in this case, the real name of the container is not the string we give to getContainer but concatenated with another string called "context". I certainly don't think this is completely useless. But in many cases when people just need to create a single instance of a parameter, using a container (aka a parameter) factory is unnecessary.

klenze · 2023-09-12T18:23:56Z

Re FairerTask, it turns out that C++ disallows auto for member variables. I have no idea why.
The best I can do is

Ptr<const R3BCalifaCrystalCalData::container_t>  fCalifaMappedData=AddInput(fInputName);

Even that is tricky. You can not match template match on return types.
However, you are allowed to have a class which defines a templated type conversion operator.
Through that operator one can learn the desired return type.

This is of course a hack. I don't think there is a very neat solution:
Either you replace the variable declaration by a preprocessor call, or duplicate the type declaration, or abuse type conversion operators.
What C++ generally lacks is a high level metalanguage. The preprocessor (which works on strings) is terrible. I think Rust is better in this regard.

YanzhaoW · 2023-09-13T08:38:08Z

Ptr<const R3BCalifaCrystalCalData::container_t> fCalifaMappedData=AddInput(fInputName);

Maybe you could default initialise variable first and then AddInput to FairRootManager in Init() function:

private:
    Ptr<const R3BCalifaCrystalCalData::container_t>  fCalifaMappedData;

auto Init()
{
    AddInput(fInputName, fCalifaMappedData);
}

With this the type read from the tree can be deduced by the input parameter type.

Or better way could be:

private:
    Ptr<const R3BCalifaCrystalCalData::container_t>  fCalifaMappedData = "default_input_string";

auto Init()
{
    fCalifaMappedData.Init();
}

kresan · 2023-09-13T12:59:09Z

@YanzhaoW Sure, if you create parameter containers and add them to the list yourself, then you do not need container factory.

If you decide to handle external software within R3BRoot and to (partially) replace FairSoft with a package manager, you (R3B) will need to maintain it.

YanzhaoW · 2024-08-14T16:29:06Z

This PR will be divided into multiple PRs with smaller sizes.

This was referenced Sep 6, 2023

Implementation of Parameter finder for Tprev/Tnext calculation #885

Merged

[WIP] A new type-safe and casting-free wrapper for TClonesArray: TCAViewer #764

Closed

klenze reviewed Sep 9, 2023

View reviewed changes

YanzhaoW mentioned this pull request Sep 19, 2023

[NEW FEATURE] R3BIOConnector #893

Merged

4 tasks

YanzhaoW force-pushed the edwinMap2CalPar branch from aaa045c to c9b9c91 Compare September 21, 2023 13:30

YanzhaoW marked this pull request as draft October 18, 2023 12:29

YanzhaoW force-pushed the edwinMap2CalPar branch from 251d202 to f629c60 Compare November 23, 2023 18:54

YanzhaoW mentioned this pull request Nov 23, 2023

R3BUcesbSource2 and add runtime struct info checking #910

Merged

4 tasks

YanzhaoW force-pushed the edwinMap2CalPar branch 4 times, most recently from b058638 to 5384b19 Compare November 29, 2023 19:46

YanzhaoW force-pushed the edwinMap2CalPar branch 2 times, most recently from c0bec7a to cb61dbc Compare January 8, 2024 16:40

YanzhaoW marked this pull request as ready for review January 8, 2024 18:29

YanzhaoW force-pushed the edwinMap2CalPar branch 2 times, most recently from bbd8f47 to 3cfea20 Compare January 9, 2024 16:33

YanzhaoW added 3 commits January 15, 2024 16:48

change to modern cmake

295a046

add gnu install

55943fa

Conan package manager, build.py and Neuland calibration rewrite

8065448

rebase on cmake change

af1cd72

YanzhaoW force-pushed the edwinMap2CalPar branch from f4ecd74 to af1cd72 Compare January 15, 2024 23:50

YanzhaoW added 23 commits January 16, 2024 23:48

update executables

4a08e0f

create base par

28d5d5d

add mille writer

79e1993

add millepede

efc4254

update for experiments

ce791d9

add save histograms to monitor

117de80

change to fairroot lib path

c2ccbad

rm mbdapi

831e481

rm mbdapi

0a6d16c

add TGraph suppport for DataMonitor

65f6f7b

add view range for count rate (but doesn't work)

9b4685c

fix compiler warnings

9c129d5

add steer writer and rankchecker

5907a45

complete result reader and tgraph

6b89adf

add pede launcher

7c4c4d7

add mean value for t_sum

fb6057e

add ifdef on source file

35e76e1

change to all trigs

4c16e5a

fix zero div

b73d1f7

add conan provider submodule

aeffa3f

change submodule name

de8ef63

last commit for tsum together with tdiff

0fe3281

going back to three parameters

561b746

YanzhaoW marked this pull request as draft May 18, 2024 15:58

YanzhaoW mentioned this pull request Aug 14, 2024

Adding conan package manager (Opt-in) #1077

Merged

4 tasks

YanzhaoW closed this Aug 14, 2024

YanzhaoW mentioned this pull request Aug 23, 2024

R3BDataMonitor, R3BValueError and Update on FileSource2 #1078

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BIG UPDATE] Package manager, build.py, data IO using STL and Neuland time calibration rewrite #890

[BIG UPDATE] Package manager, build.py, data IO using STL and Neuland time calibration rewrite #890

YanzhaoW commented Sep 6, 2023 •

edited

Loading

YanzhaoW commented Sep 8, 2023

klenze Sep 9, 2023

YanzhaoW Sep 9, 2023 •

edited

Loading

YanzhaoW Sep 11, 2023 •

edited

Loading

klenze commented Sep 9, 2023

YanzhaoW commented Sep 9, 2023

YanzhaoW commented Sep 9, 2023 •

edited

Loading

klenze commented Sep 10, 2023

kresan commented Sep 11, 2023

YanzhaoW commented Sep 11, 2023 •

edited

Loading

klenze commented Sep 12, 2023

YanzhaoW commented Sep 13, 2023 •

edited

Loading

kresan commented Sep 13, 2023

YanzhaoW commented Aug 14, 2024

[BIG UPDATE] Package manager, build.py, data IO using STL and Neuland time calibration rewrite #890

[BIG UPDATE] Package manager, build.py, data IO using STL and Neuland time calibration rewrite #890

Conversation

YanzhaoW commented Sep 6, 2023 • edited Loading

YanzhaoW commented Sep 8, 2023

klenze Sep 9, 2023

Choose a reason for hiding this comment

YanzhaoW Sep 9, 2023 • edited Loading

Choose a reason for hiding this comment

YanzhaoW Sep 11, 2023 • edited Loading

Choose a reason for hiding this comment

klenze commented Sep 9, 2023

YanzhaoW commented Sep 9, 2023

YanzhaoW commented Sep 9, 2023 • edited Loading

klenze commented Sep 10, 2023

kresan commented Sep 11, 2023

YanzhaoW commented Sep 11, 2023 • edited Loading

klenze commented Sep 12, 2023

YanzhaoW commented Sep 13, 2023 • edited Loading

kresan commented Sep 13, 2023

YanzhaoW commented Aug 14, 2024

YanzhaoW commented Sep 6, 2023 •

edited

Loading

YanzhaoW Sep 9, 2023 •

edited

Loading

YanzhaoW Sep 11, 2023 •

edited

Loading

YanzhaoW commented Sep 9, 2023 •

edited

Loading

YanzhaoW commented Sep 11, 2023 •

edited

Loading

YanzhaoW commented Sep 13, 2023 •

edited

Loading