From dd1c51abda871a9b17236a62da98a61354264132 Mon Sep 17 00:00:00 2001 From: Alejandro Gomez Espinosa Date: Wed, 3 Jan 2024 21:28:11 -0500 Subject: [PATCH] update --- _episodes/01-jets101.md | 90 ++++++++++++++++++++++++++--------------- 1 file changed, 57 insertions(+), 33 deletions(-) diff --git a/_episodes/01-jets101.md b/_episodes/01-jets101.md index 4825588e..14c5b9dd 100644 --- a/_episodes/01-jets101.md +++ b/_episodes/01-jets101.md @@ -72,7 +72,7 @@ Most jet algorithms at hadron colliders use a so-called "clustering sequence". T These algorithms follow this recipe: * iteratively find the two particles in the event which are closest in some distance measure and combine them. - * Defining$d_{ij} = min(p^{2p}_{ti}, p^{2p}_{tj}) \Delta R^{2}_{ij} / R^2 $ and $d_{iB} = p_{ti}^{2p}$. We combine two particles if $d_{ij} < d_{iB}$. + * Defining $d_{ij} = \min(p^{2p}_{ti}, p^{2p}_{tj}) \Delta R^{2}_{ij} / R^2$ and $d_{iB} = p_{ti}^{2p}$. We combine two particles if $d_{ij} < d_{iB}$. * if $p=1$ then _kt algorithm_ (KT) * if $p=0$ then _Cambridge Aachen algorithm_ (CA) * if $p=-1$ then _anti-kt algorithm_ (AK) @@ -103,28 +103,43 @@ Some excellent references about jet algorithms can be found here: > website [www.fastjet.fr](www.fastjet.fr) in your free time. {: .keypoints} -In addition, several ways exist to determine the "area" of the jet over which the input constituents lay. This is very important in correcting pileup, as we will see, because some algorithms tend to "consume" more constituents than others and hence are more susceptible to pileup. Furthermore, the amount of energy inside a jet due to pileup is proportional to the area, so it is essential to know the jet area to correct this effect. + +### Jet types at the LHC + +Jets are reconstructed physics objects representing the hadronization and fragmentation of quarks and gluons. CMS primarily uses anti-$k_{\mathrm{T}}$ jets with a cone-size of $R=0.4$ to reconstruct this jet type. 
We have algorithms that distinguish heavy-flavour (b or c) quarks (which are in the domain of the BTV POG), quark- vs gluon-originated jets, and jets from the main $pp$ collision versus jets formed primarily from pileup particles. + +However, quarks and gluons are only part of the story! At the LHC, the typical collision energy is much greater than the mass scale of the known SM particles, and hence, even heavier particles like top quarks, W/Z/Higgs bosons, and heavy beyond-the-Standard-Model particles can be produced with large Lorentz boosts. When these particles decay to quarks and gluons, their decay products are collimated and overlap in the detector, making them difficult to reconstruct as individual AK4 jets. + +Therefore, LHC analyses use jet algorithms with a large radius parameter to reconstruct these objects, called "large radius" or "fat" jets. CMS uses anti-$k_{\mathrm{T}}$ jets with $R=0.8$ (AK8) as the standard large-radius jet, while ATLAS uses AK10. + +You can also read these excellent overviews of jet substructure techniques: + +- [Boosted objects: a probe of beyond the Standard Model physics](http://arxiv.org/abs/1012.5412) by Abdesselam et al. +- [Looking inside jets: an introduction to jet substructure and boosted-object phenomenology](https://arxiv.org/abs/1901.10342) by Marzani, Soyez, and Spannowsky. ### Exercise 1.1 > ## Open a notebook +> Several ways exist to determine the "area" of the jet over which the input constituents lie. This is very important for correcting pileup, as we will see, because some algorithms tend to "consume" more constituents than others and hence are more susceptible to pileup. Furthermore, the amount of energy inside a jet due to pileup is proportional to the area, so it is essential to know the jet area to correct this effect. +> +> In the first exercise we will compare jet areas for different types of jets. > -> For this part, open the notebook called `Jets_101.ipynb` (if it is not opened) and run Exercise 3. 
+> For this part, open the notebook called `Jets_101.ipynb` (if it is not already open) and run Exercise 1.1. {: .checklist} -> ## Discussion 1.3 +> ## Discussion 1.1 > > Before you run the _Comparing jet areas between AK4 and AK8_ part of the notebook, what type of > distribution do you expect for the areas of the AK4 and AK8 jets? {: .discussion} -> ## Question 1.2 +> ## Question 1.1 > > After running the _Comparing jet areas between AK4 and AK8_ part of the notebook: Try modifying the plotting cell to add vertical lines at area values corresponding to $\pi R^2$. Do the histogram peaks line up with these values? {: .challenge} -> ## Solution 1.2 +> ## Solution 1.1 > Add these lines in the plotting cell: > ``` > plt.axvline(x=np.pi*(0.4*0.4), color='b', linestyle='--') @@ -171,30 +186,18 @@ In CMS we recluster two types of PFJets: > ## Open a notebook > -> For this part, open the notebook called `Jets_101.ipynb` (if it is not opened) and run Exercise 2. +> For this part, open the notebook called `Jets_101.ipynb` (if it is not already open) and run Exercise 1.2. {: .checklist} -> ## Question 1.1 +> ## Question 1.2 > > After running Exercise 1.2 in the notebook, you can see that the agreement between Calo, Gen, and PF jets could be better! Can you guess why? {: .challenge} -> ## Solution 1.1 +> ## Solution 1.2 > We need to apply the jet energy corrections (JEC) described in the next exercise. But before doing that, we'll review the jet clustering algorithms used in CMS. {: .solution} -### Jet types at the LHC - -Jets are reconstructed physics objects representing the hadronization and fragmentation of quarks and gluons. CMS primarily uses anti-$k_{\mathrm{T}}$ jets with a cone-size of $R=0.4$ to reconstruct this jet type. We have algorithms that distinguish heavy-flavour (b or c) quarks (which are in the domain of the BTV POG), quark- vs gluon-originated jets, and jets from the main $pp$ collision versus jets formed primarily from pileup particles. 
- -However, quarks and gluons are only part of the story! At the LHC, the typical collision energy is much greater than the mass scale of the known SM particles, and hence, even heavier particles like top quarks, W/Z/Higgs bosons, and heavy beyond-the-Standard-Model particles can be produced with large Lorentz boosts. When these particles decay to quarks and gluons, their decay products are collimated and overlap in the detector, making them difficult to reconstruct as individual AK4 jets. - -Therefore, LHC analyses use jet algorithms with a large radius parameter to reconstruct these objects, called "large radius" or "fat" jets. CMS uses anti-$k_{\mathrm{T}}$ jets with $R=0.8$ (AK8) as the standard large-radius jet, while ATLAS uses AK10. - -You can also read these excellent overviews of jet substructure techniques: - -- [Boosted objects: a probe of beyond the Standard Model physics](http://arxiv.org/abs/1012.5412) by Abdesselam et al. -- [Looking inside jets: an introduction to jet substructure and boosted-object phenomenology](https://arxiv.org/abs/1901.10342) by Marzani, Soyez, and Spannowsky. ### Jet types and algorithms in CMS @@ -202,24 +205,24 @@ The standard jet algorithms are all implemented in the CMS reconstruction softwa In this part of the tutorial, you will learn how to access the jet collection included in the CMS datasets, compare the different jet types, and create your own collections. -### AOD +#### AOD [This twiki](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideDataFormatRecoJets) summarizes the respective labels by which each jet collection can be retrieved from the event record for general AOD files. This format is currently used for specialized studies, but you can use the other formats for most analyses. -### MiniAOD +#### MiniAOD Three main jet collections are stored in the MiniAOD format, as described [here](https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookMiniAOD2017#Jets). 
- * **slimmedJets**: are AK4 energy-corrected jets using charged hadron subtraction (CHS) as the pileup removal algorithm. This is the default jet collection for CMS analyses for Run II. In this collection, you can find the following jet algorithms, as well as other jet-related quantities: + * **slimmedJets**: are AK4 energy-corrected jets using charged hadron subtraction (CHS) as the pileup removal algorithm. Jets are selected with $p_{\mathrm{T}} > 10$ GeV (a typical analysis cut is at least $p_{\mathrm{T}} > 20$ GeV). This is the default jet collection for CMS analyses for Run II. In this collection, you can find the following jet algorithms, as well as other jet-related quantities: * b-tagging * Pileup jet ID * Quark/gluon likelihood info embedded. * **slimmedJetsPUPPI**: are AK4 energy-corrected jets using the PUPPI algorithm for pileup removal. This collection will be the default for Run III analyses. - * **slimmedJetsAK8**: ak4 AK8 energy-corrected jets using the PUPPI algorithm for pileup removal. This has been the default collection for boosted jets in Run II. In this collection, you can find the following jet algorithms, as well as other jet-related quantities: + * **slimmedJetsAK8**: are AK8 energy-corrected jets using the PUPPI algorithm for pileup removal. Jets are selected with $p_{\mathrm{T}} > 170$ GeV and stored with all information, including PF candidate links (a typical analysis cut is at least $p_{\mathrm{T}} > 200$ GeV). This has been the default collection for boosted jets in Run II. In this collection, you can find the following jet algorithms, as well as other jet-related quantities: * Softdrop mass * n-subjettiness and energy correlation variables - * Access to softdrop subjets + * Access to softdrop subjets with $p_{\mathrm{T}} > 30$ GeV: minimal information for 3 leading jets. * Access to the associated AK8 CHS jet four-momentum, including soft drop and pruned mass, and n-subjettiness. 
> ## Examples of how to access jet collections in miniAOD samples @@ -244,32 +247,53 @@ Three main jet collections are stored in the MiniAOD format, as described [here] > {: .solution} -### NanoAOD - -In nanoAOD, only AK4 CHS jets ( _Jet_ ) and AK8 PUPPI jets ( _FatJet_ ) are stored in Run 2. For Run 3, AK4 and AK8 jets are PUPPI jets. The jets in nanoAOD are similar to those in miniAOD, but not identical (for example, the $p_{\mathrm{T}}$ cuts might be different). A full set of variables for each jet collection can be found in this [website](https://cms-nanoaod-integration.web.cern.ch/autoDoc/NanoAODv9/2018UL/doc_TTToSemiLeptonic_TuneCP5_13TeV-powheg-pythia8_RunIISummer20UL18NanoAODv9-106X_upgrade2018_realistic_v16_L1v1-v1.html). +#### NanoAOD NanoAOD is a "flat tree" format, meaning you can access the information directly with simple ROOT or even simple Python tools (like numpy or pandas). This format is recommended for analyses in CMS, unless one needs to access other variables not stored in nanoAOD. _This tutorial will only use nanoAOD files._ +In nanoAOD, only AK4 CHS jets ( _Jet_ ) and AK8 PUPPI jets ( _FatJet_ ) are stored in Run 2. For Run 3, AK4 and AK8 jets are PUPPI jets. The jets in nanoAOD are similar to those in miniAOD, but not identical (for example, the $p_{\mathrm{T}}$ cuts might be different). In short: + + * Jet = ak4PFJetsCHS + * $p_{\mathrm{T}} > 15$ GeV + * Similar to miniAOD content, but many more (up-to-date) quantities (e.g. JEC) + * FatJet = ak8PFJetsPUPPI + * Similar content to miniAOD, but many more (up-to-date) quantities such as DeepXXX taggers + +A full set of variables for each jet collection can be found on this [website](https://cms-nanoaod-integration.web.cern.ch/autoDoc/NanoAODv9/2018UL/doc_TTToSemiLeptonic_TuneCP5_13TeV-powheg-pythia8_RunIISummer20UL18NanoAODv9-106X_upgrade2018_realistic_v16_L1v1-v1.html). + +It is also possible to customize nanoAOD. JME/BTV have their own extended format with more jet collections and/or PF candidates. 
It is a common format for “automatised” workflows and ML training. + + > ## Note > There are several advanced tools on the market which allow you to do sophisticated analysis using nanoAOD format, including [RDataFrame](https://root.cern/doc/master/classROOT_1_1RDataFrame.html), [NanoAOD-tools](https://github.com/cms-nanoAOD/nanoAOD-tools), or [Coffea](https://github.com/CoffeaTeam/coffea). We encourage you to look at them and use the one you like the most. However, we are going to use coffea for this tutorial. {: .callout} -## Jet properties +### Jet properties + +Some of the jet properties that we can find in nanoAOD are: + * Jet 4-vector = sum of all constituent particle 4-vectors: energy, $p_{\mathrm{T}}$, η, φ + * Jet mass + * Jet constituent multiplicities (PF), e.g. charged multiplicity + * Jet constituent fractions, e.g. charged hadron energy fraction + * Jet area = area in the η-φ plane in which an infinitely soft particle will be clustered with the jet + * Jet tagging information + * and many more + ### Exercise 1.3 > ## Open a notebook > This preliminary exercise will illustrate some of the basic properties of jets, like the four-momentum quantities: pt, eta, phi, and mass. We will use nanoAOD files, which are currently widely used within the CMS Collaboration. For more information about nanoAOD follow [this link](https://gitlab.cern.ch/cms-nanoAOD/nanoaod-doc/-/wikis/home). At the end of the notebook, you will be able to see all the quantities stored in the `Jet` collection. > -> For this part, open the notebook called `Jets_101.ipynb` and run Exercise 1. +> For this part, open the notebook called `Jets_101.ipynb` and run Exercise 1.3. {: .checklist} -> ## Discussion 1.1 +> ## Discussion 1.2 > > Have you seen these jet quantities before? Were you expecting something different? {: .discussion} -> ## Discussion 1.2 +> ## Discussion 1.3 > > Did you plot other jet quantities stored in nanoAOD? Do you understand the meaning of them? {: .discussion}
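
The first entry of the jet-properties list in this patch (jet 4-vector = sum of all constituent particle 4-vectors) can be illustrated with a minimal sketch in plain Python. This is not code from `Jets_101.ipynb`: the helper names `to_cartesian` and `sum_constituents` and the two toy constituents are hypothetical, chosen only for illustration, and the sketch assumes the summed vector has non-zero transverse momentum.

```python
import math


def to_cartesian(pt, eta, phi, mass):
    """Convert (pt, eta, phi, mass) into Cartesian (E, px, py, pz)."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return energy, px, py, pz


def sum_constituents(constituents):
    """Add constituent four-vectors and return the jet (pt, eta, phi, mass).

    `constituents` is an iterable of (pt, eta, phi, mass) tuples.
    Assumes the summed transverse momentum is non-zero.
    """
    e_sum = px_sum = py_sum = pz_sum = 0.0
    for pt, eta, phi, mass in constituents:
        e, px, py, pz = to_cartesian(pt, eta, phi, mass)
        e_sum += e
        px_sum += px
        py_sum += py
        pz_sum += pz
    jet_pt = math.hypot(px_sum, py_sum)
    jet_eta = math.asinh(pz_sum / jet_pt)
    jet_phi = math.atan2(py_sum, px_sum)
    # Guard against tiny negative values from floating-point rounding.
    m2 = e_sum * e_sum - (px_sum * px_sum + py_sum * py_sum + pz_sum * pz_sum)
    jet_mass = math.sqrt(max(m2, 0.0))
    return jet_pt, jet_eta, jet_phi, jet_mass


# Toy input (hypothetical): two massless particles, 50 GeV each, split in phi.
toy_constituents = [(50.0, 0.0, 0.1, 0.0), (50.0, 0.0, -0.1, 0.0)]
jet_pt, jet_eta, jet_phi, jet_mass = sum_constituents(toy_constituents)
print(f"pt={jet_pt:.2f} eta={jet_eta:.2f} phi={jet_phi:.2f} mass={jet_mass:.2f}")
```

For this toy input the jet comes out with a transverse momentum of about 99.5 GeV and a mass of about 10 GeV, even though both constituents are massless: the jet mass arises from the angular spread of the constituents, which is why it appears as a separate entry in the property list.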