update
alefisico committed Jan 3, 2024
1 parent 243027b commit 5c6ec7c
Showing 1 changed file with 162 additions and 2 deletions.
164 changes: 162 additions & 2 deletions _episodes/01-jets101.md
@@ -64,7 +64,13 @@ Full set of intro slides: Slides 1-24 (FIXME)
From a list of particles, one can form jets: objects that reconstruct the shower of particles produced by a quark or gluon. Each particle belonging to a jet is known as a constituent, and each constituent has a 4-vector that can be used for further studies.
We need a jet algorithm to collect the particles in a shower. This defines a __Clustering
Algorithm__. This algorithm:
Algorithm__. A good jet algorithm is _infrared and collinear safe_. The set of hard jets should be unchanged by soft emission and collinear splitting.
### Jet Clustering Algorithms
Most jet algorithms at hadron colliders use a so-called "clustering sequence". This is essentially a pairwise examination of the input four-vectors. If a pair satisfies some criterion, the two are merged. The process is repeated until the entire list of constituents is exhausted.
These algorithms follow this recipe:
* Iteratively find the two particles in the event which are closest in some distance measure and combine them.
* Combine two particles if $d_{ij} < d_{iB}$, where $d_{ij} = \min(p^{2p}_{ti}, p^{2p}_{tj}) \Delta R^{2}_{ij} / R^2 $ and $d_{iB} = p_{ti}^{2p}$.
* If $p=1$, this is the _kt algorithm_ (KT)
@@ -77,9 +83,163 @@ Algorithm__. This algorithm:
<center><figcaption>A more visual way of thinking about the reclustering algorithm.</figcaption></center>
</figure>
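The clustering recipe can be sketched in a few lines of plain Python. This is only an illustrative toy with cubic cost in the number of particles, not the FastJet implementation used in practice. The helper names (`from_pt_y_phi`, `cluster`), the rule that a particle with the smallest $d_{iB}$ is promoted to a final jet, and the three input particles are assumptions made for this example.

```python
import math

def from_pt_y_phi(pt, y, phi):
    """Build a massless four-vector (px, py, pz, E) from pt, rapidity and phi."""
    return (pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(y), pt * math.cosh(y))

def pt2(v):
    """Squared transverse momentum."""
    return v[0] ** 2 + v[1] ** 2

def rapidity(v):
    return 0.5 * math.log((v[3] + v[2]) / (v[3] - v[2]))

def d_iB(v, power):
    """Beam distance d_iB = pt^(2p)."""
    return pt2(v) ** power

def d_ij(a, b, power, R):
    """Pair distance d_ij = min(pt_i^(2p), pt_j^(2p)) * DeltaR_ij^2 / R^2."""
    dphi = abs(math.atan2(a[1], a[0]) - math.atan2(b[1], b[0]))
    if dphi > math.pi:
        dphi = 2 * math.pi - dphi
    dr2 = (rapidity(a) - rapidity(b)) ** 2 + dphi ** 2
    return min(pt2(a) ** power, pt2(b) ** power) * dr2 / R ** 2

def cluster(particles, R=0.4, power=-1):
    """Toy sequential recombination; power=1 is kt, power=-1 is anti-kt."""
    pseudojets = [tuple(v) for v in particles]
    jets = []
    while pseudojets:
        n = len(pseudojets)
        # Start from the smallest beam distance ...
        beam = min(range(n), key=lambda i: d_iB(pseudojets[i], power))
        best = d_iB(pseudojets[beam], power)
        pair = None
        # ... and look for a pair distance that is even smaller.
        for i in range(n):
            for j in range(i + 1, n):
                d = d_ij(pseudojets[i], pseudojets[j], power, R)
                if d < best:
                    best, pair = d, (i, j)
        if pair is None:
            # No pair is closer than the beam: promote this object to a jet.
            jets.append(pseudojets.pop(beam))
        else:
            i, j = pair
            b = pseudojets.pop(j)  # j > i, so remove j first
            a = pseudojets.pop(i)
            # Merge by adding the four-vectors component by component.
            pseudojets.append(tuple(x + y for x, y in zip(a, b)))
    return sorted(jets, key=pt2, reverse=True)

# Hypothetical event: two nearby particles and one far away in phi.
particles = [
    from_pt_y_phi(100.0, 0.0, 0.0),
    from_pt_y_phi(50.0, 0.1, 0.1),
    from_pt_y_phi(80.0, 0.0, 3.0),
]
for jet in cluster(particles, R=0.4, power=-1):
    print(f"jet pt = {math.sqrt(pt2(jet)):.1f}")
```

Changing `power` switches between members of the same family of algorithms without touching the rest of the loop, which is why they are usually described together.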
> ## What do the different jet algorithms look like in our events?
> <img src="../fig/JHEP04_2008_063.jpg" alt="" style="width: 600px;"/>
> Comparison of jet areas for four different jet algorithms, from "The anti-kt Clustering Algorithm" by Cacciari, Salam, and Soyez [JHEP04, 063 (2008), arXiv:0802.1189].
{: .callout}
Some excellent references about jet algorithms can be found here:
- [Toward Jetography](http://arxiv.org/abs/0906.1833) by Gavin Salam.
- [Jets in Hadron-Hadron Collisions](http://arxiv.org/abs/0712.2447) by Ellis, Huston, Hatakeyama, Loch, and Toennesmann
- [The Catchment Area of Jets](http://arxiv.org/abs/0802.1188) by Cacciari, Salam, and Soyez.
- [The anti-kt Clustering Algorithm](http://arxiv.org/abs/0802.1189) by Cacciari, Salam, and Soyez.
In addition, several ways exist to determine the "area" of the jet over which the input constituents lie. This is very important in correcting pileup, as we will see, because some algorithms tend to "consume" more constituents than others and hence are more susceptible to pileup. Furthermore, the amount of energy inside a jet due to pileup is proportional to the area, so it is essential to know the jet area to correct this effect.
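To make the proportionality concrete, here is a minimal sketch. It assumes the pileup contribution can be written as a density times the jet area; the density value `rho` and the two area values are hypothetical numbers chosen only for illustration.

```python
def pileup_offset(rho, area):
    """Pileup pT expected inside a jet, assuming offset = density * area."""
    return rho * area

rho = 20.0  # hypothetical pileup pT density, in GeV per unit area

# Two hypothetical jets: the second covers four times the area of the first.
for label, area in (("small jet", 0.5), ("large jet", 2.0)):
    print(f"{label}: area = {area:.1f}, pileup offset ~ {pileup_offset(rho, area):.1f} GeV")
```

Under this assumption, a jet with four times the area picks up four times the pileup, which is why the area has to be known before the correction can be applied.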
### Exercise 1.1
> ## Open a notebook
>
> For this part, open the notebook called `Jets_101.ipynb` (if it is not opened) and run Exercise 3.
{: .checklist}
> ## Discussion 1.3
>
> Before you run the _Comparing jet areas between AK4 and AK8_ part of the notebook, what type of
> distribution do you expect for the areas of the AK4 and AK8 jets?
{: .discussion}
> ## Question 1.2
>
> After running the _Comparing jet areas between AK4 and AK8_ part of the notebook: Try modifying the plotting cell to add vertical lines at area values corresponding to $\pi R^2$. Do the histogram peaks line up with these values?
{: .challenge}
> ## Solution 1.2
> Add these lines in the plotting cell:
> ```
> plt.axvline(x=np.pi*(0.4*0.4), color='b', linestyle='--')
> plt.axvline(x=np.pi*(0.8*0.8), color='r', linestyle='--')
> ```
{: .solution}
## Jet Types in hadron colliders
The jet algorithms take as input a set of 4-vectors. At CMS, the most popular jet type is the "Particle Flow Jet," which attempts to use the entire detector at once and derive single four vectors representing specific particles. For this reason, it is very comparable (ideally) to clustering generator-level four-vectors.
### Particle Flow Jets (PFJets)
Particle Flow candidates (PFCandidates) combine information from various detectors to estimate particle properties based on their assigned identities (photon, electron, muon, charged hadron, neutral hadron).
PFJets are created by clustering PFCandidates into jets and contain information about contributions of every particle class: Electromagnetic/hadronic, Charged/neutral, etc.
The jet response is high. The jet pT resolution is good, starting at 15--20% at low pT and asymptotically reaching 5% at high pT.
### Monte Carlo Generator-level Jets (GenJets)
GenJets are pure Monte Carlo simulated jets. They are helpful for analysis with MC samples. GenJets are formed by clustering the four-momenta of Monte Carlo truth particles. This may include “invisible” particles (muons, neutrinos, WIMPs, etc.).
As no detector effects are involved, the jet response (or jet energy scale) is 1, and the jet resolution is perfect, by definition.
GenJets include information about the 4-vectors of the constituent particles, the hadronic and electromagnetic components of the energy, etc.
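The jet response and resolution mentioned for PFJets and GenJets can be illustrated with a small sketch. It assumes one common convention: the response is the ratio of reconstructed to generator-level $p_{\mathrm{T}}$ for matched jets, and the resolution is the spread of that ratio divided by its mean. The function name and the $p_{\mathrm{T}}$ values are hypothetical.

```python
from statistics import mean, pstdev

def response_summary(pt_reco, pt_gen):
    """Return (mean response, relative resolution) for matched jet pairs."""
    ratios = [reco / gen for reco, gen in zip(pt_reco, pt_gen)]
    scale = mean(ratios)
    resolution = pstdev(ratios) / scale
    return scale, resolution

# Hypothetical matched jets (GeV); not taken from any real sample.
pt_gen = [50.0, 80.0, 120.0, 200.0]
pt_reco = [46.0, 84.0, 114.0, 204.0]

print("reco vs gen:", response_summary(pt_reco, pt_gen))
# Comparing GenJets with themselves gives a response of 1 and zero spread.
print("gen vs gen: ", response_summary(pt_gen, pt_gen))
```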
### Calorimeter Jets (CaloJets)
CaloJets are formed from energy deposits in the calorimeters (hadronic and electromagnetic), with no tracking information considered. In the barrel region, a calorimeter tower consists of a single HCAL cell and the associated 5x5 array of ECAL crystals (the HCAL-ECAL association is similar but more complicated in the endcap region). The four-momentum of a tower is assigned from the energy of the tower, assuming zero mass, with the direction corresponding to the tower position from the interaction point.
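The massless-tower assignment can be written out explicitly. The sketch below assumes the tower direction is given as pseudorapidity `eta` and azimuth `phi` measured from the interaction point; the function name and the input values are hypothetical.

```python
import math

def tower_four_momentum(energy, eta, phi):
    """Massless four-vector (px, py, pz, E) pointing along the tower direction."""
    pt = energy / math.cosh(eta)  # for a massless object |p| = E
    return (pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(eta), energy)

px, py, pz, e = tower_four_momentum(energy=25.0, eta=1.2, phi=0.5)
mass2 = e ** 2 - (px ** 2 + py ** 2 + pz ** 2)
print(f"px={px:.2f} py={py:.2f} pz={pz:.2f} E={e:.2f} m^2={mass2:.1e}")
```

The invariant mass squared comes out as zero up to floating-point rounding, as expected from the zero-mass assumption.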
In CMS, CaloJets are used less often than PFJets. Their use includes performance studies to disentangle tracker and calorimeter effects and trigger-level analyses where the tracker is neglected to reduce the event processing time. ATLAS makes much more use of CaloJets, as their version of particle flow is less mature than CMS's.
### Exercise 1.2
> ## Open a notebook
>
> For this part, open the notebook called `Jets_101.ipynb` (if it is not opened) and run Exercise 2.
{: .checklist}
> ## Question 1.1
>
> After running the notebook's Exercise 2, you can see that the agreement between CaloJets, GenJets, and PFJets could be better! Can you guess why?
{: .challenge}
> ## Solution 1.1
> We need to apply the jet energy corrections (JEC) described in the next exercise. But before doing that, we'll review the jet clustering algorithms used in CMS.
{: .solution}
## Jet types at the LHC
Jets are reconstructed physics objects representing the hadronization and fragmentation of quarks and gluons. CMS primarily uses anti-$k_{\mathrm{T}}$ jets with a cone-size of $R=0.4$ to reconstruct this jet type. We have algorithms that distinguish heavy-flavour (b or c) quarks (which are in the domain of the BTV POG), quark- vs gluon-originated jets, and jets from the main $pp$ collision versus jets formed primarily from pileup particles.
However, quarks and gluons are only part of the story! At the LHC, the typical collision energy is much greater than the mass scale of the known SM particles, and hence, even heavier particles like top quarks, W/Z/Higgs bosons, and heavy beyond-the-Standard-Model particles can be produced with large Lorentz boosts. When these particles decay to quarks and gluons, their decay products are collimated and overlap in the detector, making them difficult to reconstruct as individual AK4 jets.
Therefore, LHC analyses use jet algorithms with a large radius parameter to reconstruct these objects, called "large radius" or "fat" jets. CMS uses anti-$k_{\mathrm{T}}$ jets with $R=0.8$ (AK8) as the standard large-radius jet, while ATLAS uses AK10.
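A rough feel for when decay products start to overlap comes from the rule of thumb $\Delta R \approx 2m/p_{\mathrm{T}}$ for a two-body decay. This approximation is not part of the text above and is only a sketch; the masses are approximate values and the $p_{\mathrm{T}}$ points are hypothetical.

```python
def approx_opening_angle(mass, pt):
    """Rule-of-thumb angular separation of two-body decay products: 2 m / pT."""
    return 2.0 * mass / pt

# Approximate masses in GeV, used only for illustration.
for name, mass in (("W", 80.4), ("top", 172.5)):
    for pt in (200.0, 400.0, 800.0):
        dr = approx_opening_angle(mass, pt)
        print(f"{name:>3} at pT = {pt:5.0f} GeV: DeltaR ~ {dr:.2f}")
```

When the estimated separation drops to the order of the jet radius parameter, a single large-radius jet is better suited to capture all the decay products than several AK4 jets.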
You can also read these excellent overviews of jet substructure techniques:
- [Boosted objects: a probe of beyond the Standard Model physics](http://arxiv.org/abs/1012.5412) by Abdesselam et al.
- [Looking inside jets: an introduction to jet substructure and boosted-object phenomenology](https://arxiv.org/abs/1901.10342) by Marzani, Soyez, and Spannowsky.
## Jet types and algorithms in CMS
The standard jet algorithms are all implemented in the CMS reconstruction software, [CMSSW](https://github.com/cms-sw/cmssw). However, a few algorithms with specific parameters (namely AK4, AK8, and CA15) have become standard tools in CMS; these jet types are extensively studied by the JetMET POG, and are highly recommended. These algorithms are included in the centrally produced CMS samples, at the AOD, miniAOD, and nanoAOD data tiers (note that miniAOD and nanoAOD are most commonly used for analysis, while AOD is much less common these days, and is not widely available on the grid). Other algorithms can be implemented and tested using the **JetToolbox** (more in the [following link](https://twiki.cern.ch/twiki/bin/viewauth/CMS/JetToolbox)).
In this part of the tutorial, you will learn how to access the jet collection included in the CMS datasets, compare the different jet types, and create your own collections.
### AOD
[This twiki](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideDataFormatRecoJets) summarizes the respective labels by which each jet collection can be retrieved from the event record for general AOD files. This format is currently used for specialized studies, but you can use the other formats for most analyses.
### MiniAOD
Three main jet collections are stored in the MiniAOD format, as described [here](https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookMiniAOD2017#Jets).
* **slimmedJets**: are AK4 energy-corrected jets using charged hadron subtraction (CHS) as the pileup removal algorithm. This is the default jet collection for CMS analyses for Run II. In this collection, you can find the following jet algorithms, as well as other jet-related quantities:
* b-tagging
* Pileup jet ID
* Quark/gluon likelihood info embedded.
* **slimmedJetsPUPPI**: are AK4 energy-corrected jets using the PUPPI algorithm for pileup removal. This collection will be the default for Run III analyses.
* **slimmedJetsAK8**: are AK8 energy-corrected jets using the PUPPI algorithm for pileup removal. This has been the default collection for boosted jets in Run II. In this collection, you can find the following jet algorithms, as well as other jet-related quantities:
* Softdrop mass
* n-subjettiness and energy correlation variables
* Access to softdrop subjets
* Access to the associated AK8 CHS jet four-momentum, including soft drop and pruned mass, and n-subjettiness.
> ## Examples of how to access jet collections in miniAOD samples
> Below are two examples of how to access jet collections from these samples. This exercise does not intend for you to modify code in order to access these collections, but rather for you to look at the code and get an idea about how you could access this information if needed.
>
> ### In C++
> Please take a look at the file [`jmedas_miniAODAnalyzer.C`](https://github.com/cms-jet/JMEDAS/blob/DASJan2023/src/jmedas_miniAODAnalyzer.C) with your favourite code viewer.
> You can run this code by using the python config file [`jmedas_miniAODtest.py`](https://github.com/cms-jet/JMEDAS/blob/DASJan2023/scripts/jmedas_miniAODtest.py) from your terminal once you have set up a CMSSW environment and downloaded this JMEDAS package. This script will only print out some information about the jets in that sample. Again, the most important part of this exercise is to get familiar with how to access jet collections from miniAOD. Take a good look at what this script prints to your terminal.
> ~~~
> cmsRun $CMSSW_BASE/src/Analysis/JMEDAS/scripts/jmedas_miniAODtest.py
> ~~~
> {: .language-bash}
>
> ### In Python
>
> Now take a look at the file [`jmedas_miniAODtest_purePython.py`](https://github.com/cms-jet/JMEDAS/blob/DASJan2023/scripts/jmedas_miniAODtest_purePython.py).
> This code can be run with plain python in your terminal. As in the C++ case, the output of this job is some information about jets. The most important part of the exercise is to get familiar with how to access jet collections from miniAOD using python.
> ~~~
> python $CMSSW_BASE/src/Analysis/JMEDAS/scripts/jmedas_miniAODtest_purePython.py
> ~~~
> {: .language-bash}
>
{: .solution}
### NanoAOD
In nanoAOD, only AK4 CHS jets ( _Jet_ ) and AK8 PUPPI jets ( _FatJet_ ) are stored in Run 2. For Run 3, AK4 and AK8 jets are PUPPI jets. The jets in nanoAOD are similar to those in miniAOD, but not identical (for example, the $p_{\mathrm{T}}$ cuts might be different). A full set of variables for each jet collection can be found in this [website](https://cms-nanoaod-integration.web.cern.ch/autoDoc/NanoAODv9/2018UL/doc_TTToSemiLeptonic_TuneCP5_13TeV-powheg-pythia8_RunIISummer20UL18NanoAODv9-106X_upgrade2018_realistic_v16_L1v1-v1.html).
NanoAOD is a "flat tree" format, meaning you can access the information directly with simple ROOT or even simple Python tools (like numpy or pandas). This format is recommended for analyses in CMS, unless one needs to access other variables not stored in nanoAOD. _This tutorial will only use nanoAOD files._
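The sketch below shows what the flat layout means in practice, using plain Python lists as a stand-in for a few events. The branch names follow the nanoAOD `Jet` collection convention (`nJet`, `Jet_pt`, `Jet_eta`), but the values, the `select_jets` helper, and the cut thresholds are hypothetical and chosen only for illustration.

```python
# Toy stand-in for three nanoAOD events: one entry per event,
# and one value per jet inside each entry.
events = {
    "nJet":    [2, 0, 3],
    "Jet_pt":  [[85.2, 40.1], [], [120.5, 33.0, 21.7]],
    "Jet_eta": [[0.3, -1.9], [], [1.1, 2.8, -0.4]],
}

def select_jets(events, pt_min=30.0, abs_eta_max=2.4):
    """Return, per event, the pt of jets passing simple pt and |eta| cuts."""
    selected = []
    for pts, etas in zip(events["Jet_pt"], events["Jet_eta"]):
        selected.append(
            [pt for pt, eta in zip(pts, etas) if pt > pt_min and abs(eta) < abs_eta_max]
        )
    return selected

for n_jets, passing in zip(events["nJet"], select_jets(events)):
    print(f"{n_jets} jets in event, {len(passing)} pass the cuts: {passing}")
```

With real files, the same per-event, per-jet structure is what array-based tools expose, so a selection like this becomes a short array expression rather than an explicit loop.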
> ## Note
> There are several advanced tools on the market which allow you to do sophisticated analysis using nanoAOD format, including [RDataFrame](https://root.cern/doc/master/classROOT_1_1RDataFrame.html), [NanoAOD-tools](https://github.com/cms-nanoAOD/nanoAOD-tools), or [Coffea](https://github.com/CoffeaTeam/coffea). We encourage you to look at them and use the one you like the most. However, we are going to use coffea for this tutorial.
{: .callout}
## Jet properties
### Exercise 1
### Exercise 1.3
> ## Open a notebook
> This preliminary exercise will illustrate some of the basic properties of jets, like the four-momentum quantities: pt, eta, phi, and mass. We will use nanoAOD files, a format currently in wide use within the CMS Collaboration. For more information about nanoAOD follow [this link](https://gitlab.cern.ch/cms-nanoAOD/nanoaod-doc/-/wikis/home). At the end of the notebook, you will be able to see all the quantities stored in the `Jet` collection.