[BIG UPDATE] Package manager, build.py, data IO using STL and Neuland time calibration rewrite #890
Conversation
Hi @kresan, could you review this PR, in particular build.py (a build script for R3BRoot) and the use of parameters without parameter factories?
```cpp
using ValueErrors = FTCalStrategy::ValueErrors;
constexpr unsigned int uniform_err_divider = 12;
constexpr double uniform_err_divider_sqrt = 3.464101615137755;
constexpr double sqrt_3 = 1.732050807568877;
```
Why do you initialize that constant with a fixed value?
Is it because std::sqrt is not constexpr (until C++26?) and you want to save the extra calculation at runtime?
I do not think that two sqrt calls per run are going to affect runtime that much. Some readers like me will have limited math ability and not be able to verify in their head that 3.464101615137755 is in fact the double constant best approximating sqrt(12). This makes it harder to verify that the code does what it claims to do.
Yes, exactly. std::sqrt is not constexpr, and sqrt is actually slow compared to other math functions. The thing is that the number of sqrt calls is not two per run, but equal to the number of digitizers times the number of channels, which is around 2600 x 4 x 500. That's a lot. So I think it's better to just make it a constexpr value without calling sqrt. For now the calibration is done at the end of the run; maybe in the future we could do more calibrations in a single run.
EDIT: On second thought, this should be better:
```cpp
const auto uniform_err_divider_sqrt = std::sqrt(12);
const auto sqrt_3 = std::sqrt(3);
```
> Some readers like me will have limited math ability and not be able to verify in their head that 3.464101615137755 is in fact the double constant best approximating sqrt(12).

That's true. The naming here is not very intuitive; it could just be `SQRT_12`. I originally thought the name should reflect why the square root of 12 appears here, namely the standard deviation of a uniform distribution, but now I think `SQRT_12` is easier to understand.
Ah, I guess I will change back to constexpr. If I use `const auto sqrt_12 = std::sqrt(12)`, I have to change all the const variables that use the square root of 12. Actually, someone already implemented it a long time ago:

R3BRoot/neuland/shared/R3BNeulandCommon.h, line 25 in b0f85c9:

```cpp
constexpr auto __sqrt12 = 3.464101615;
```

I will probably use this value instead (and probably rename it to `SQRT_12`, since the double-underscore prefix is reserved for the implementation in C++).
**Package Manager**

From my understanding, the conan dependencies would not be automatically installed if people continued to run cmake, which is why you added build.py. I would be reluctant to wrap the top level of the build process. cmake and make both take a zillion optional flags, so it is worthwhile for users to know how to run them manually so that they have the option to pass some of these flags. I am also doubtful that our lead developers will be easily sold on adding an extra step to the build process. Fetching external packages from cmake, while not particularly elegant, seems more likely to succeed.

**External dependencies**

One of the main benefits of adding external dependencies in cmake is that it is painful, which dissuades people from adding too many. I don't know much about C++ json libraries (I am more of a yaml fan), but it seems to me that we already get a json lib with boost. I have not used boost json; it could be an utter horror to work with for all I know. But if it is reasonable, I would much prefer to use a boost library over an external dependency, even if it is less shiny. The reason is that someone might have to compile this code in a decade, and I am not sure if nlohmann_json or conan will still be maintained by then. (If boost is not maintained by then, that is FairSoft's problem.) With gtest, my understanding is that FairSoft ships with a version, but you would like a more recent one?

**Class Design**

In theory, that class could be templated over MappedData to generate a reader for any r3b data container. However, this won't work too well. The main problem is that as soon as you need a task which reads from two different data containers, you would need multiple inheritance. Of course, an instance of
What I would instead prefer is the following:
Because the objects cannot actually be fetched when the class members are initialized, accessing them will involve an additional level of indirection. (The alternative is the ugly syntax of The implementation of BetterTask::FetchData will of course be a bit tricky. I would use a
(Ugly fast version without indirect reference; the other version would require more memory storage. Of course, it is probably possible to have a simple pointer-wrapping class which overloads operator= and the copy constructor and tells BetterTask where the pointer it needs to update is stored. This would actually be neatly byzantine.) Perhaps I will get around to writing a proof of concept for that at some point.
Not really. The conan dependencies only need to be installed the first time, with:

```shell
conan install . --output-folder=build --build=missing
```

Then when you run cmake, you need to add a cmake flag to let cmake know where the packages are:

```shell
cmake .. -DCMAKE_TOOLCHAIN_FILE=Dir/conan_toolchain.cmake
```

For the second or third run, you could just run But you are right about this point. Use That being said, there is ongoing development for conan2 (see this), which allows cmake to automatically install all conan packages and cache the path. It's already done for conan1 and now they are trying to make it compatible with conan2.
I wouldn't say they are fetched from cmake. There is a FetchContent function in cmake, which everyone in the C++ community complains about. Conan, on the other hand, is similar to the Spack they are using for FairSoft. The only difference is that Conan is much, much nicer to integrate with cmake than Spack. With Spack, you basically need to set the PATH env variable and tell cmake to go find the Config.cmake files within those paths. If someone sets PATH differently, then it's screwed. But with Conan, cmake just needs to use conan_toolchain.cmake, which specifies the package paths and the cmake targets directly. It's hardly surprising, since Spack is more like
Yes, I have heard about the boost JSON library. There is also another JSON library called simdjson. Among these three libraries, I personally think nlohmann_json is probably the easiest to use. It has a super nice API and its json data structure behaves much like any STL container. Maybe I will also try boost.json; with a package manager, switching to a different library is very easy. I wouldn't say yaml is very suitable for storing data. It's more for configuration than for data storage. Actually, I tried to use yaml to configure the neuland executable using yaml-cpp (see here). But since then I keep thinking: maybe python or lua is better for configuring a C++ project? Anyway, C++ JSON libraries are much more mature than yaml libraries, and if you look at this reddit post, most people use JSON.
Well, there wasn't a MappedDataReader in the beginning. Later I found a lot of duplicated code between R3BNeulandMapped2Cal and R3BNeulandMapped2CalPar. Both need to read the mapped data, one for the calculation of parameters and the other for the calculation of cal level data. Following best practice, duplication is "evil", so in the end I created a class to eliminate all the duplication. This class is definitely not meant to solve the boilerplate problem from FairRootManager.
I don't know how data is read from ucesb. But with TTree, all it does in each loop is change the value of a variable that has already been initialised. I have no idea why it's so complicated with FairRootManager (I still need some time to figure out what's going on there).
I think this way is the best. The only change I would make is to assign the fetched data to a member variable rather than a local variable. I also think it's best to let tasks own their own fetched data, instead of registering all the data somewhere else and deleting it at the end.
RE neatly byzantine: https://gist.github.com/klenze/34907b55b9bcf1b1401a9af1005a5064
I understand what the Conan package manager can do, but why do you need it for R3BRoot? To handle 4 dependencies? Concerning the container factories: can you boil your 3,400 lines of changes down to a short description of how you access parameter containers without factories?
I would say to handle dependencies more easily and flexibly. It's very necessary for the project itself to have a package manager rather than waiting for library updates from upstream (in our case, FairSoft). If we have a package manager locally, using a new library becomes super easy: just add a line to conanfile.txt and it's done immediately. The other option is to wait for FairSoft to add the library, which could take quite a long time.
Sure, please look at the function

```cpp
if (auto* rtdb = FairRuntimeDb::instance(); rtdb != nullptr)
{
    calibrationPar_ = std::make_unique<TCalPar>(calibrationParName_.c_str()).release();
    calibrationTrigPar_ = std::make_unique<TCalPar>(calibrationTrigParName_.c_str()).release();
    rtdb->addContainer(calibrationPar_);
    rtdb->addContainer(calibrationTrigPar_);
    if (calibrationPar_ == nullptr and calibrationTrigPar_ == nullptr)
    {
        throw R3B::runtime_error("calibration parameters are still nullptr!");
    }
    SetExtraPar(rtdb);
}
```

Here is the function definition of addContainer:

```cpp
Bool_t FairRuntimeDb::addContainer(FairParSet* container)
{
    auto name = container->GetName();
    if (!containerList->FindObject(name)) {
        containerList->Add(container);
        TIter next(runs);
        FairRtdbRun* run;
        FairParVersion* vers;
        while ((run = static_cast<FairRtdbRun*>(next()))) {
            if (!run->getParVersion(name)) {
                vers = new FairParVersion(name);
                run->addParVersion(vers);
            }
        }
        // cout << "-I- RTDB entries in list# " << containerList->GetEntries() <<"\n" ;
        return kTRUE;
    }
    Warning("addContainer(FairParSet*)", "Container %s already exists!", name);
    return kFALSE;
}
```

Here you can see it has nothing to do with a containerFactory. All it does is add a container to the containerList. And what we usually do with a parameter (a container) is use another function called
Re FairerTask: it turns out that C++ disallows auto for non-static member variables. I have no idea why.
Even that is tricky. You cannot match templates on return types. This is of course a hack. I don't think there is a very neat solution:
Maybe you could default-initialise the variable first and then AddInput to FairRootManager in:

```cpp
private:
    Ptr<const R3BCalifaCrystalCalData::container_t> fCalifaMappedData;

auto Init()
{
    AddInput(fInputName, fCalifaMappedData);
}
```

With this, the type read from the tree can be deduced from the input parameter type. Or a better way could be:

```cpp
private:
    Ptr<const R3BCalifaCrystalCalData::container_t> fCalifaMappedData = "default_input_string";

auto Init()
{
    fCalifaMappedData.Init();
}
```
@YanzhaoW Sure, if you create parameter containers and add them to the list yourself, then you do not need a container factory. If you decide to handle external software within R3BRoot and to (partially) replace FairSoft with a package manager, you (R3B) will need to maintain it.
This PR will be divided into multiple smaller PRs.
This PR contains lots of new features. Here is the list:
Just like pip or conda for python, Conan is the most popular package manager for C++ projects. It integrates seamlessly with cmake without the need to set up any environment variables. Its main difference from spack (which is used for FairSoft) is that Conan is a project-wise package manager, meaning different projects can have different packages without any interference between each other. To add a library to the project, just add its name and version to the conanfile.txt in the root directory. Currently there are 4 libraries added:
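For illustration, a conanfile.txt for a setup like this could look as follows. The package names and versions here are examples only, not the exact set pinned in this PR:

```ini
# Illustrative conanfile.txt -- versions and the full package set are assumptions.
[requires]
nlohmann_json/3.11.2
range-v3/0.12.0
gtest/1.14.0

[generators]
CMakeDeps
CMakeToolchain
```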
build.py to manage all the build processes for R3BRoot. To install the conan packages, use `./build.py -p`. To configure cmake, use `./build.py -c`. And to build the project, use `./build.py -b`. To do all of these: `./build.py -a`. For more options, check `./build.py -h`.
All the data read or written with ROOT TTree has STL types. For example, Neuland mapped data is stored as `std::vector<R3B::PaddleTamexMappedData>` and trigger mapped data is stored as `std::map<unsigned int, R3B::PaddleTamexTrigMappedData>`.
It can later be saved to a JSON file. If disabled, the TrigIDMap needs to be read from a JSON file. JSON file reading and writing is implemented using nlohmann's JSON library.
Rather, the range-v3 library is used to construct the data binding and data filtering.
neuland/calibration/calLevel/
TODO:
There is no hurry to merge this PR into the dev branch. Feel free to leave comments below if there are any suggestions. In the meantime, I will try to finish the energy calibration rewrite in a few weeks.
Checklist:
- based on the `dev` branch