-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
12 4 8 new vars v1 #47
base: 12_4_8
Are you sure you want to change the base?
Conversation
Hi @angzaza many thanks for your PR! I will have a look later today at it. @AnnikaStein could you review it as well? |
Maybe one first thing I directly noticed: the commit history seems to contain a lot of older changes not made by you, can you try to make a cleaner PR that only contains your own changes? It is otherwise a bit hard to review |
it probably would also help with the branch conflicts |
Sure, I can do that! [General comment] |
Indeed after switching the target branch, neither conflicts nor commits from other people are left in the PR :) So thanks also from my side @angzaza ! I'll take a closer look and get back to you with specific comments later. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this nice work @angzaza !
Although more detailed statements and examples are directly marked in the code, here is a summary:
[Style]
Mostly indentation & commented out code, especially for indentation I only commented on the first occurrence although there were many:
- For plugins/JetConstituentTableProducer.cc in L51,54,67,82,85,92,97,110,116-118,125,… 243ff., 326ff.,353,356,363f.
- python/addBTV.py L454
- python/addPFcands_cff.py L65,76,129,139
[Implementation itself, technical details & storage]
- Some comments on more robust calculation of discriminators, similar to what is done in NanoAOD, directly marked inline.
- Synchronize precision of floating point branches (10 so far for everything versus 20 for Muon)
- Cross-check variables w.r.t. RecoBTag-PerformanceMeasurements (where there seem to be more)
[Suggestion]
- Checked out this development branch and ran a quick test myself: currently, the Muon information is added for all Jet collections, FatJets included which is good, but I’m not sure we need this also for AK4, AK4GenJets, AK8GenJets? I’d suggest we add a boolean flag per jet collection which is read by the producer. Like so:
addMuonTable = cms.bool(False)
for those where we will definitely not add them, butaddMuonTable = cms.bool(True)
for FatJets, needs someif{}
clauses inside the producer then and could be handled similar toreadBtag
. Whether or not it should be added for more than just FatJets is probably something for discussion, can other commissioning workflows besides the one for fat jets make use of (Gen)JetMuon tables? Storage-wise it doesn’t look too heavy to include that also for the other collections as it’s not hundreds of features and half of them are integers.
@@ -48,19 +48,23 @@ class JetConstituentTableProducer : public edm::stream::EDProducer<> { | |||
//const std::string name_; | |||
const std::string name_; | |||
const std::string nameSV_; | |||
const std::string nameMu_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Style] Indentation level seems to vary for the newly added lines, compared to the surrounding statements
/*outMuons->clear(); | ||
jetIdx_mu.clear(); | ||
muIdx.clear(); | ||
muon_pt.clear(); | ||
muon_eta.clear(); | ||
muon_phi.clear(); | ||
muon_ptrel.clear();*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Style] Places with commented out code which could be removed for clarity
python/addBTV.py
Outdated
@@ -363,6 +449,7 @@ def add_BTV(process, runOnMC=False, onlyAK4=False, onlyAK8=False, keepInputs=['D | |||
extension=cms.bool(True), # this is the extension table for FatJets | |||
variables=cms.PSet( | |||
CommonVars, | |||
SubJetVars, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Style] Indentation level seems to vary for the newly added lines, compared to the surrounding statements
python/addPFCands_cff.py
Outdated
@@ -62,6 +62,8 @@ def addPFCands(process, runOnMC=False, allPF = False, onlyAK4=False, onlyAK8=Fal | |||
idx_name = cms.string("pFCandsIdx"), | |||
nameSV = cms.string("FatJetSVs"), | |||
idx_nameSV = cms.string("sVIdx"), | |||
nameMu = cms.string("FatJetMuons"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Style] Indentation level seems to vary for the newly added lines, compared to the surrounding statements
Especially here, this looked like an empty line and caused some confusion first
python/addBTV.py
Outdated
DeepCSVCvsLDisc=Var("? (bDiscriminator('pfDeepCSVJetTags:probc'))!=-1 ? bDiscriminator('pfDeepCSVJetTags:probc')/(bDiscriminator('pfDeepCSVJetTags:probc')+bDiscriminator('pfDeepCSVJetTags:probudsg')) : -1", | ||
float, | ||
doc="DeepCSV C vs L discriminator", | ||
precision=10), | ||
DeepCSVCvsBDisc=Var("? (bDiscriminator('pfDeepCSVJetTags:probc'))!=-1 ? bDiscriminator('pfDeepCSVJetTags:probc')/(bDiscriminator('pfDeepCSVJetTags:probc')+bDiscriminator('pfDeepCSVJetTags:probb')+bDiscriminator('pfDeepCSVJetTags:probbb')) : -1", | ||
float, | ||
doc="DeepCSV C vs B discriminator", | ||
precision=10), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Implementation itself, technical details & storage]
We might want to synchronize this check for invalid values with the statement in NanoAOD, example here: https://github.com/cms-sw/cmssw/blob/master/PhysicsTools/NanoAOD/python/jetsAK4_Puppi_cff.py#L86-L87 —> switch from !=-1
to >=0
. Although that should not make that big of a difference, probably it has some effects related to floating point precision.
python/addBTV.py
Outdated
DeepCSVCvsLDisc=Var("? (bDiscriminator('pfDeepCSVJetTags:probc'))!=-1 ? bDiscriminator('pfDeepCSVJetTags:probc')/(bDiscriminator('pfDeepCSVJetTags:probc')+bDiscriminator('pfDeepCSVJetTags:probudsg')) : -1", | ||
float, | ||
doc="DeepCSV C vs L discriminator", | ||
precision=10), | ||
DeepCSVCvsBDisc=Var("? (bDiscriminator('pfDeepCSVJetTags:probc'))!=-1 ? bDiscriminator('pfDeepCSVJetTags:probc')/(bDiscriminator('pfDeepCSVJetTags:probc')+bDiscriminator('pfDeepCSVJetTags:probb')+bDiscriminator('pfDeepCSVJetTags:probbb')) : -1", | ||
float, | ||
doc="DeepCSV C vs B discriminator", | ||
precision=10), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Implementation itself, technical details & storage]
We might want to synchronize this check for invalid values with the statement in NanoAOD, example here: https://github.com/cms-sw/cmssw/blob/master/PhysicsTools/NanoAOD/python/jetsAK4_Puppi_cff.py#L86-L87 —> switch from !=-1
to >=0
. Although that should not make that big of a difference, probably it has some effects related to floating point precision.
python/addBTV.py
Outdated
float, | ||
doc="combinedIVF", | ||
precision=10), | ||
DeepCSVBDisc=Var("bDiscriminator('pfDeepCSVJetTags:probb')+bDiscriminator('pfDeepCSVJetTags:probbb')", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Implementation itself, technical details & storage]
For the b discriminator (the sum of the individual b, bb related outputs) something similar to the CvsL/CvsB discriminator approach could be done to avoid getting irregular -2 bins: https://github.com/cms-sw/cmssw/blob/master/PhysicsTools/NanoAOD/python/jetsAK4_Puppi_cff.py#L83 -> also add the ternary operator to test whether the sum is >=0 and assign either that sum or -1 finally.
python/addBTV.py
Outdated
float, | ||
doc="combinedIVF", | ||
precision=10), | ||
DeepCSVBDisc=Var("bDiscriminator('pfDeepCSVJetTags:probb')+bDiscriminator('pfDeepCSVJetTags:probbb')", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Implementation itself, technical details & storage]
Same comment as above
auto muonTable = std::make_unique<nanoaod::FlatTable>(outMuons->size(), nameMu_, false); | ||
// We fill from here only stuff that cannot be created with the SimpleFlatTnameableProducer | ||
muonTable->addColumn<int>("jetIdx", jetIdx_mu, "Index of the parent jet"); | ||
muonTable->addColumn<int>(idx_nameMu_, muIdx, "Index in the Muon list"); | ||
if (readBtag_) { | ||
muonTable->addColumn<float>("pt", muon_pt, "pt", 20); | ||
muonTable->addColumn<float>("eta", muon_eta, "eta", 20); | ||
muonTable->addColumn<float>("phi", muon_phi, "phi", 20); | ||
muonTable->addColumn<double>("isGlobal", muon_isGlobal, "isGlobal", 20); | ||
muonTable->addColumn<float>("ptrel", muon_ptrel, "pt relative to parent jet", 20); | ||
muonTable->addColumn<int>("nMuHit", muon_nMuHit, "number of muon hits", 20); | ||
muonTable->addColumn<int>("nMatched", muon_nMatched, "number of matched stations", 20); | ||
muonTable->addColumn<int>("nTkHit", muon_nTkHit, "number of tracker hits", 20); | ||
muonTable->addColumn<int>("nPixHit", muon_nPixHit, "number of pixel hits", 20); | ||
muonTable->addColumn<int>("nOutHit", muon_nOutHit, "number of missing outer hits", 20); | ||
muonTable->addColumn<double>("chi2", muon_chi2, "chi2", 20); | ||
muonTable->addColumn<double>("chi2Tk", muon_chi2Tk, "chi2 inner track", 20); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Implementation itself, technical details & storage]
The observables of type float or double have precision of 20, while the other variables in JetConstituentTableProducer all use 10. Was there a specific reason to effectively double the precision? Maybe we should stick to one level of precision for jet-constituent-related features.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[For my own education, but maybe to avoid we might be missing something here]
How were the stored Muon variables chosen, related to e.g. https://github.com/cms-btv-pog/RecoBTag-PerformanceMeasurements/blob/12_2_X/interface/JetInfoBranches.h#L344-L369?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For what concerns the precision, I tried with 10 but I obtained a segmentation violation error.
About the variables stored, I followed this list according to Soureek instructions https://github.com/cms-btv-pog/RecoBTag-PerformanceMeasurements/blob/10_6_X_UL2016_PreliminaryJECs/python/varGroups_cfi.py#L1738-L1890
python/addPFCands_cff.py
Outdated
#nameMu = cms.string("JetMuons"), | ||
#idx_nameMu = cms.string("MuIdx"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Style] Places with commented out code which could be removed for clarity
The default name JetMuons and Idx when adding the Muon table to AK4 jets could also be removed (currently commented out).
New variables for FatJets and SubJets are added in PFNano/python/addBTV.py
For FatJets, a MuonTable is defined in PFNano/plugins/JetConstituentTableProducer.cc, which contains the following variables:
The distributions of these variables, obtained by running PFNano/test/nano_mc_Run3_122X_NANO.py over 1k events of /ZprimeToTT_M3000_W30_TuneCP2_13p6TeV-madgraph-pythiaMLM-pythia8/Run3Winter22MiniAOD-122X_mcRun3_2021_realistic_v9-v2/MINIAODSIM, are shown at this link.
This work was carried out under the supervision of Soureek Mitra and Denise Muller.