12 4 8 new vars v1 #47

angzaza · 2022-12-14T17:48:41Z

New variables for FatJets and SubJets are added in PFNano/python/addBTV.py

uncorrpt
residual
jes
CombIVF
DeepCSVBDisc
btagDeepB_c
DeepCSVCvsLDisc
DeepCSVCvsBDisc

For FatJets, a MuonTable is defined in PFNano/plugins/JetConstituentTableProducer.cc, which contains the following variables:

pt
eta
ptrel
phi
isGlobal
nMuHit
nMatched
nTkHit
nPixHit
nOutHit

The distributions of these variables, obtained by running PFNano/test/nano_mc_Run3_122X_NANO.py over 1k events of /ZprimeToTT_M3000_W30_TuneCP2_13p6TeV-madgraph-pythiaMLM-pythia8/Run3Winter22MiniAOD-122X_mcRun3_2021_realistic_v9-v2/MINIAODSIM, are shown at this link.

This work was carried out under the supervision of Soureek Mitra and Denise Muller.

demuller · 2022-12-16T11:49:51Z

Hi @angzaza many thanks for your PR! I will have a look later today at it.

@AnnikaStein could you review it as well?

demuller · 2022-12-16T11:51:57Z

Maybe one first thing I directly noticed: the commit history seems to contain a lot of older changes not made by you, can you try to make a cleaner PR that only contains your own changes? It is otherwise a bit hard to review

demuller · 2022-12-16T11:53:12Z

Maybe one first thing I directly noticed: the commit history seems to contain a lot of older changes not made by you, can you try to make a cleaner PR that only contains your own changes? It is otherwise a bit hard to review

it probably would also help with the branch conflicts

AnnikaStein · 2022-12-16T13:48:25Z

Hi @angzaza many thanks for your PR! I will have a look later today at it.

@AnnikaStein could you review it as well?

Sure, I can do that!

[General comment]
What I see is that this PR which is meant for 12_4_8 has been prepared for the master branch, hence the conflicts. If this was updated, I guess it would already look a lot cleaner. I'll use the edit button to switch that, and then we can look at the details.

AnnikaStein · 2022-12-16T13:54:38Z

Maybe one first thing I directly noticed: the commit history seems to contain a lot of older changes not made by you, can you try to make a cleaner PR that only contains your own changes? It is otherwise a bit hard to review

it probably would also help with the branch conflicts

Indeed after switching the target branch, neither conflicts nor commits from other people are left in the PR :)

So thanks also from my side @angzaza ! I'll take a closer look and get back to you with specific comments later.

AnnikaStein

Thank you for this nice work @angzaza !
 Although more detailed statements and examples are directly marked in the code, here is a summary:

[Style]
Mostly indentation & commented out code, especially for indentation I only commented on the first occurrence although there were many:

For plugins/JetConstituentTableProducer.cc in L51,54,67,82,85,92,97,110,116-118,125,… 243ff., 326ff.,353,356,363f.
python/addBTV.py L454
python/addPFcands_cff.py L65,76,129,139

[Implementation itself, technical details & storage]

Some comments on more robust calculation of discriminators, similar to what is done in NanoAOD, directly marked inline.
Synchronize precision of floating point branches (10 so far for everything versus 20 for Muon)
Cross-check variables w.r.t. RecoBTag-PerformanceMeasurements (where there seem to be more)

[Suggestion]

Checked out this development branch and ran a quick test myself: currently, the Muon information is added for all Jet collections, FatJets included which is good, but I’m not sure we need this also for AK4, AK4GenJets, AK8GenJets? I’d suggest we add a boolean flag per jet collection which is read by the producer. Like so: addMuonTable = cms.bool(False) for those where we will definitely not add them, but addMuonTable = cms.bool(True) for FatJets, needs some if{} clauses inside the producer then and could be handled similar to readBtag. Whether or not it should be added for more than just FatJets is probably something for discussion, can other commissioning workflows besides the one for fat jets make use of (Gen)JetMuon tables? Storage-wise it doesn’t look too heavy to include that also for the other collections as it’s not hundreds of features and half of them are integers.

AnnikaStein · 2022-12-16T23:12:45Z

plugins/JetConstituentTableProducer.cc

@@ -48,19 +48,23 @@ class JetConstituentTableProducer : public edm::stream::EDProducer<> {
  //const std::string name_;
  const std::string name_;
  const std::string nameSV_;
+	const std::string nameMu_;


[Style] Indentation level seems to vary for the newly added lines, compared to the surrounding statements

AnnikaStein · 2022-12-16T23:13:48Z

plugins/JetConstituentTableProducer.cc

+	/*outMuons->clear();
+  jetIdx_mu.clear();
+  muIdx.clear();
+  muon_pt.clear();
+  muon_eta.clear();
+  muon_phi.clear();
+  muon_ptrel.clear();*/


[Style] Places with commented out code which could be removed for clarity

AnnikaStein · 2022-12-16T23:14:52Z

python/addBTV.py

@@ -363,6 +449,7 @@ def add_BTV(process, runOnMC=False, onlyAK4=False, onlyAK8=False, keepInputs=['D
        extension=cms.bool(True),  # this is the extension table for FatJets
        variables=cms.PSet(
            CommonVars,
+						SubJetVars,


[Style] Indentation level seems to vary for the newly added lines, compared to the surrounding statements

AnnikaStein · 2022-12-16T23:15:04Z

python/addPFCands_cff.py

@@ -62,6 +62,8 @@ def addPFCands(process, runOnMC=False, allPF = False, onlyAK4=False, onlyAK8=Fal
                                                        idx_name = cms.string("pFCandsIdx"),
                                                        nameSV = cms.string("FatJetSVs"),
                                                        idx_nameSV = cms.string("sVIdx"),
+																												nameMu = cms.string("FatJetMuons"),


[Style] Indentation level seems to vary for the newly added lines, compared to the surrounding statements
Especially here, this looked like an empty line and caused some confusion first

AnnikaStein · 2022-12-16T23:16:36Z

python/addBTV.py

+        DeepCSVCvsLDisc=Var("? (bDiscriminator('pfDeepCSVJetTags:probc'))!=-1 ? bDiscriminator('pfDeepCSVJetTags:probc')/(bDiscriminator('pfDeepCSVJetTags:probc')+bDiscriminator('pfDeepCSVJetTags:probudsg')) : -1",
+                     float,
+                     doc="DeepCSV C vs L discriminator",
+                     precision=10),
+        DeepCSVCvsBDisc=Var("? (bDiscriminator('pfDeepCSVJetTags:probc'))!=-1 ? bDiscriminator('pfDeepCSVJetTags:probc')/(bDiscriminator('pfDeepCSVJetTags:probc')+bDiscriminator('pfDeepCSVJetTags:probb')+bDiscriminator('pfDeepCSVJetTags:probbb')) : -1",
+                     float,
+                     doc="DeepCSV C vs B discriminator",
+                     precision=10),


[Implementation itself, technical details & storage]
We might want to synchronize this check for invalid values with the statement in NanoAOD, example here: https://github.com/cms-sw/cmssw/blob/master/PhysicsTools/NanoAOD/python/jetsAK4_Puppi_cff.py#L86-L87 —> switch from !=-1 to >=0. Although that should not make that big of a difference, probably it has some effects related to floating point precision.

AnnikaStein · 2022-12-16T23:16:55Z

python/addBTV.py

+        DeepCSVCvsLDisc=Var("? (bDiscriminator('pfDeepCSVJetTags:probc'))!=-1 ? bDiscriminator('pfDeepCSVJetTags:probc')/(bDiscriminator('pfDeepCSVJetTags:probc')+bDiscriminator('pfDeepCSVJetTags:probudsg')) : -1",
+            float,
+            doc="DeepCSV C vs L discriminator",
+            precision=10),
+        DeepCSVCvsBDisc=Var("? (bDiscriminator('pfDeepCSVJetTags:probc'))!=-1 ? bDiscriminator('pfDeepCSVJetTags:probc')/(bDiscriminator('pfDeepCSVJetTags:probc')+bDiscriminator('pfDeepCSVJetTags:probb')+bDiscriminator('pfDeepCSVJetTags:probbb')) : -1",
+            float,
+            doc="DeepCSV C vs B discriminator",
+            precision=10),


[Implementation itself, technical details & storage]
We might want to synchronize this check for invalid values with the statement in NanoAOD, example here: https://github.com/cms-sw/cmssw/blob/master/PhysicsTools/NanoAOD/python/jetsAK4_Puppi_cff.py#L86-L87 —> switch from !=-1 to >=0. Although that should not make that big of a difference, probably it has some effects related to floating point precision.

AnnikaStein · 2022-12-16T23:18:35Z

python/addBTV.py

+                float,
+                doc="combinedIVF",
+                precision=10),
+        DeepCSVBDisc=Var("bDiscriminator('pfDeepCSVJetTags:probb')+bDiscriminator('pfDeepCSVJetTags:probbb')",


[Implementation itself, technical details & storage]
For the b discriminator (the sum of the individual b, bb related outputs) something similar to the CvsL/CvsB discriminator approach could be done to avoid getting irregular -2 bins: https://github.com/cms-sw/cmssw/blob/master/PhysicsTools/NanoAOD/python/jetsAK4_Puppi_cff.py#L83 -> also add the ternary operator to test whether the sum is >=0 and assign either that sum or -1 finally.

AnnikaStein · 2022-12-16T23:18:48Z

python/addBTV.py

+                     float,
+                     doc="combinedIVF",
+                     precision=10),
+        DeepCSVBDisc=Var("bDiscriminator('pfDeepCSVJetTags:probb')+bDiscriminator('pfDeepCSVJetTags:probbb')",


[Implementation itself, technical details & storage]
Same comment as above

AnnikaStein · 2022-12-16T23:22:22Z

plugins/JetConstituentTableProducer.cc

+	auto muonTable = std::make_unique<nanoaod::FlatTable>(outMuons->size(), nameMu_, false);
+	// We fill from here only stuff that cannot be created with the SimpleFlatTnameableProducer
+	muonTable->addColumn<int>("jetIdx", jetIdx_mu, "Index of the parent jet");
+	muonTable->addColumn<int>(idx_nameMu_, muIdx, "Index in the Muon list");
+	if (readBtag_) {
+		muonTable->addColumn<float>("pt", muon_pt, "pt", 20);
+	 	muonTable->addColumn<float>("eta", muon_eta, "eta", 20);
+	 	muonTable->addColumn<float>("phi", muon_phi, "phi", 20);
+	  muonTable->addColumn<double>("isGlobal", muon_isGlobal, "isGlobal", 20);
+	 	muonTable->addColumn<float>("ptrel", muon_ptrel, "pt relative to parent jet", 20);
+		muonTable->addColumn<int>("nMuHit", muon_nMuHit, "number of muon hits", 20);
+    muonTable->addColumn<int>("nMatched", muon_nMatched, "number of matched stations", 20);
+    muonTable->addColumn<int>("nTkHit", muon_nTkHit, "number of tracker hits", 20);
+    muonTable->addColumn<int>("nPixHit", muon_nPixHit, "number of pixel hits", 20);
+    muonTable->addColumn<int>("nOutHit", muon_nOutHit, "number of missing outer hits", 20);
+    muonTable->addColumn<double>("chi2", muon_chi2, "chi2", 20);
+    muonTable->addColumn<double>("chi2Tk", muon_chi2Tk, "chi2 inner track", 20);


[Implementation itself, technical details & storage]
The observables of type float or double have precision of 20, while the other variables in JetConstituentTableProducer all use 10. Was there a specific reason to effectively double the precision? Maybe we should stick to one level of precision for jet-constituent-related features.

[For my own education, but maybe to avoid we might be missing something here]
How were the stored Muon variables chosen, related to e.g. https://github.com/cms-btv-pog/RecoBTag-PerformanceMeasurements/blob/12_2_X/interface/JetInfoBranches.h#L344-L369?

For what concerns the precision, I tried with 10 but I obtained a segmentation violation error.
About the variables stored, I followed this list according to Soureek instructions https://github.com/cms-btv-pog/RecoBTag-PerformanceMeasurements/blob/10_6_X_UL2016_PreliminaryJECs/python/varGroups_cfi.py#L1738-L1890

AnnikaStein · 2022-12-16T23:28:19Z

python/addPFCands_cff.py

+																												#nameMu = cms.string("JetMuons"),
+                                                        #idx_nameMu = cms.string("MuIdx"),


[Style] Places with commented out code which could be removed for clarity
The default name JetMuons and Idx when adding the Muon table to AK4 jets could also be removed (currently commented out).

AnnikaStein · 2023-01-05T12:22:13Z

Hi,
Happy New Year to you! Just wanted to check, @angzaza if you had any questions regarding the review comments, or how to implement them. @demuller was there something else that I missed so far which should be included in the merge or were there further comments?
Thanks! 🙂

angzaza added 5 commits November 8, 2022 10:50

Added muon variables related to jets

9463bda

Added muon variables related to jets

c7d4074

add variables for track table

a35ee2a

added variables for FatJet and SubJet

670c2d2

New vars for FatJets and SubJets

002caef

AnnikaStein changed the base branch from master to 12_4_8 December 16, 2022 13:48

AnnikaStein self-requested a review December 16, 2022 13:52

AnnikaStein suggested changes Dec 16, 2022

View reviewed changes

angzaza added 6 commits January 12, 2023 13:30

adjust indentation

d42065e

adjust indentation

58c9d94

adjust indentation

9834793

solve segmentation violation error

1a8dac5

improve variable implementation

469e857

save muon table only for fat jets

b341310

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

12 4 8 new vars v1 #47

12 4 8 new vars v1 #47

angzaza commented Dec 14, 2022

demuller commented Dec 16, 2022

demuller commented Dec 16, 2022

demuller commented Dec 16, 2022

AnnikaStein commented Dec 16, 2022

AnnikaStein commented Dec 16, 2022

AnnikaStein left a comment

AnnikaStein Dec 16, 2022

AnnikaStein Dec 16, 2022

AnnikaStein Dec 16, 2022

AnnikaStein Dec 16, 2022

AnnikaStein Dec 16, 2022

AnnikaStein Dec 16, 2022

AnnikaStein Dec 16, 2022

AnnikaStein Dec 16, 2022

AnnikaStein Dec 16, 2022

AnnikaStein Dec 16, 2022

angzaza Jan 13, 2023

AnnikaStein Dec 16, 2022

AnnikaStein commented Jan 5, 2023

		#nameMu = cms.string("JetMuons"),
		#idx_nameMu = cms.string("MuIdx"),

12 4 8 new vars v1 #47

Are you sure you want to change the base?

12 4 8 new vars v1 #47

Conversation

angzaza commented Dec 14, 2022

demuller commented Dec 16, 2022

demuller commented Dec 16, 2022

demuller commented Dec 16, 2022

AnnikaStein commented Dec 16, 2022

AnnikaStein commented Dec 16, 2022

AnnikaStein left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

AnnikaStein commented Jan 5, 2023