update
alefisico committed Jan 4, 2024
1 parent 6b16eeb commit dd1c51a
Showing 1 changed file with 57 additions and 33 deletions.
90 changes: 57 additions & 33 deletions _episodes/01-jets101.md
@@ -72,7 +72,7 @@ Most jet algorithms at hadron colliders use a so-called "clustering sequence". T
These algorithms follow this recipe:
* iteratively find the two particles in the event which are closest in some distance measure and combine them.
* Defining$d_{ij} = min(p^{2p}_{ti}, p^{2p}_{tj}) \Delta R^{2}_{ij} / R^2 $ and $d_{iB} = p_{ti}^{2p}$. We combine two particles if $d_{ij} < d_{iB}$.
* Defining $d_{ij} = \min(p^{2p}_{ti}, p^{2p}_{tj}) \Delta R^{2}_{ij} / R^2$ and $d_{iB} = p_{ti}^{2p}$. We combine two particles if $d_{ij} < d_{iB}$.
* if $p=1$ then _kt algorithm_ (KT)
* if $p=0$ then _Cambridge Aachen algorithm_ (CA)
* if $p=-1$ then _anti-kt algorithm_ (AK)
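
As a rough illustration of the distance measures in the recipe above, the following Python sketch evaluates $d_{ij}$ and $d_{iB}$ for one pair of particles with $p=-1$ and $R=0.4$. The function names and the particle values are invented for this example; real analyses rely on FastJet rather than hand-written code.

```python
import math

def delta_r2(eta_i, phi_i, eta_j, phi_j):
    """Squared angular distance, with delta-phi wrapped into [-pi, pi)."""
    dphi = (phi_i - phi_j + math.pi) % (2 * math.pi) - math.pi
    return (eta_i - eta_j) ** 2 + dphi ** 2

def d_ij(pt_i, eta_i, phi_i, pt_j, eta_j, phi_j, p, R):
    """Pairwise distance: min(pt_i^2p, pt_j^2p) * DeltaR_ij^2 / R^2."""
    return min(pt_i ** (2 * p), pt_j ** (2 * p)) * delta_r2(eta_i, phi_i, eta_j, phi_j) / R ** 2

def d_iB(pt_i, p):
    """Beam distance: pt_i^2p."""
    return pt_i ** (2 * p)

# Hypothetical particles given as (pt [GeV], eta, phi).
hard = (100.0, 0.0, 0.0)
soft = (1.0, 0.2, 0.0)
p, R = -1, 0.4

pair_distance = d_ij(*hard, *soft, p=p, R=R)
beam_distance = min(d_iB(hard[0], p), d_iB(soft[0], p))

# The pair is combined when its distance is smaller than the beam distance.
should_merge = pair_distance < beam_distance
print(pair_distance, beam_distance, should_merge)
```

With these numbers the soft particle sits within $R$ of the hard one, so the pair distance is the smaller of the two and the particles would be combined.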
@@ -103,28 +103,43 @@ Some excellent references about jet algorithms can be found here:
> website [www.fastjet.fr](www.fastjet.fr) in your free time.
{: .keypoints}
In addition, several ways exist to determine the "area" of the jet over which the input constituents lie. This is very important in correcting pileup, as we will see, because some algorithms tend to "consume" more constituents than others and hence are more susceptible to pileup. Furthermore, the amount of energy inside a jet due to pileup is proportional to the area, so it is essential to know the jet area to correct this effect.
### Jet types at the LHC
Jets are reconstructed physics objects representing the hadronization and fragmentation of quarks and gluons. CMS primarily uses anti-$k_{\mathrm{T}}$ jets with a cone-size of $R=0.4$ to reconstruct this jet type. We have algorithms that distinguish heavy-flavour (b or c) quarks (which are in the domain of the BTV POG), quark- vs gluon-originated jets, and jets from the main $pp$ collision versus jets formed primarily from pileup particles.
However, quarks and gluons are only part of the story! At the LHC, the typical collision energy is much greater than the mass scale of the known SM particles, and hence, even heavier particles like top quarks, W/Z/Higgs bosons, and heavy beyond-the-Standard-Model particles can be produced with large Lorentz boosts. When these particles decay to quarks and gluons, their decay products are collimated and overlap in the detector, making them difficult to reconstruct as individual AK4 jets.
Therefore, LHC analyses use jet algorithms with a large radius parameter to reconstruct these objects, called "large radius" or "fat" jets. CMS uses anti-$k_{\mathrm{T}}$ jets with $R=0.8$ (AK8) as the standard large-radius jet, while ATLAS uses AK10.
You can also read these excellent overviews of jet substructure techniques:
- [Boosted objects: a probe of beyond the Standard Model physics](http://arxiv.org/abs/1012.5412) by Abdesselam et al.
- [Looking inside jets: an introduction to jet substructure and boosted-object phenomenology](https://arxiv.org/abs/1901.10342) by Marzani, Soyez, and Spannowsky.
### Exercise 1.1
> ## Open a notebook
> Several ways exist to determine the "area" of the jet over which the input constituents lie. This is very important for correcting pileup, as we will see, because some algorithms tend to "consume" more constituents than others and hence are more susceptible to pileup. Furthermore, the amount of energy inside a jet due to pileup is proportional to its area, so it is essential to know the jet area to correct for this effect.
>
> In the first exercise we will compare jet areas for different types of jets.
>
> For this part, open the notebook called `Jets_101.ipynb` (if it is not opened) and run Exercise 3.
> For this part, open the notebook called `Jets_101.ipynb` (if it is not already open) and run Exercise 1.1.
{: .checklist}
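
To make the link between jet area and pileup concrete, here is a minimal sketch of an area-based correction. It assumes the pileup contribution to a jet is $\rho \times A$, where $\rho$ is a per-event pileup transverse-momentum density and $A$ is the jet area. The symbol $\rho$, the function name, and all numbers are assumptions made for illustration and are not taken from this lesson; the corrections actually used in CMS are part of the JEC.

```python
def subtract_pileup(pt_raw, area, rho):
    """Remove an assumed pileup contribution rho * area from the raw jet pT.

    The result is clipped at zero so a jet never gets a negative pT.
    """
    return max(pt_raw - rho * area, 0.0)

# Hypothetical inputs: same raw pT and pileup density, two different jet areas.
rho = 20.0          # assumed pileup density in GeV per unit area
area_small = 0.5    # hypothetical area of a small-radius jet
area_large = 2.0    # hypothetical area of a large-radius jet

pt_small = subtract_pileup(100.0, area_small, rho)
pt_large = subtract_pileup(100.0, area_large, rho)
print(pt_small, pt_large)
```

In this toy setup the jet with four times the area loses four times as much transverse momentum to the correction, which is why the area has to be known for each jet.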
> ## Discussion 1.3
> ## Discussion 1.1
>
> Before you run the _Comparing jet areas between AK4 and AK8_ part of the notebook, what type of
> distribution do you expect for the areas of the AK4 and AK8 jets?
{: .discussion}
> ## Question 1.2
> ## Question 1.1
>
> After running the _Comparing jet areas between AK4 and AK8_ part of the notebook: Try modifying the plotting cell to add vertical lines at area values corresponding to $\pi R^2$. Do the histogram peaks line up with these values?
{: .challenge}
> ## Solution 1.2
> ## Solution 1.1
> Add these lines in the plotting cell:
> ```
> plt.axvline(x=np.pi*(0.4*0.4), color='b', linestyle='--')
@@ -171,55 +186,43 @@ In CMS we recluster two types of PFJets:
> ## Open a notebook
>
> For this part, open the notebook called `Jets_101.ipynb` (if it is not opened) and run Exercise 2.
> For this part, open the notebook called `Jets_101.ipynb` (if it is not already open) and run Exercise 1.2.
{: .checklist}
> ## Question 1.1
> ## Question 1.2
>
> After running this exercise in the notebook, you can see that the agreement between Calo, Gen, and PF jets could be better! Can you guess why?
{: .challenge}
> ## Solution 1.1
> ## Solution 1.2
> We need to apply the jet energy corrections (JEC) described in the next exercise. But before doing that, we'll review the jet clustering algorithms used in CMS.
{: .solution}
### Jet types at the LHC
Jets are reconstructed physics objects representing the hadronization and fragmentation of quarks and gluons. CMS primarily uses anti-$k_{\mathrm{T}}$ jets with a cone-size of $R=0.4$ to reconstruct this jet type. We have algorithms that distinguish heavy-flavour (b or c) quarks (which are in the domain of the BTV POG), quark- vs gluon-originated jets, and jets from the main $pp$ collision versus jets formed primarily from pileup particles.
However, quarks and gluons are only part of the story! At the LHC, the typical collision energy is much greater than the mass scale of the known SM particles, and hence, even heavier particles like top quarks, W/Z/Higgs bosons, and heavy beyond-the-Standard-Model particles can be produced with large Lorentz boosts. When these particles decay to quarks and gluons, their decay products are collimated and overlap in the detector, making them difficult to reconstruct as individual AK4 jets.
Therefore, LHC analyses use jet algorithms with a large radius parameter to reconstruct these objects, called "large radius" or "fat" jets. CMS uses anti-$k_{\mathrm{T}}$ jets with $R=0.8$ (AK8) as the standard large-radius jet, while ATLAS uses AK10.
You can also read these excellent overviews of jet substructure techniques:
- [Boosted objects: a probe of beyond the Standard Model physics](http://arxiv.org/abs/1012.5412) by Abdesselam et al.
- [Looking inside jets: an introduction to jet substructure and boosted-object phenomenology](https://arxiv.org/abs/1901.10342) by Marzani, Soyez, and Spannowsky.
### Jet types and algorithms in CMS
The standard jet algorithms are all implemented in the CMS reconstruction software, [CMSSW](https://github.com/cms-sw/cmssw). However, a few algorithms with specific parameters (namely AK4, AK8, and CA15) have become standard tools in CMS; these jet types are extensively studied by the JetMET POG, and are highly recommended. These algorithms are included in the centrally produced CMS samples, at the AOD, miniAOD, and nanoAOD data tiers (note that miniAOD and nanoAOD are most commonly used for analysis, while AOD is much less common these days, and is not widely available on the grid). Other algorithms can be implemented and tested using the **JetToolbox** (more in the [following link](https://twiki.cern.ch/twiki/bin/viewauth/CMS/JetToolbox)).
In this part of the tutorial, you will learn how to access the jet collection included in the CMS datasets, compare the different jet types, and create your own collections.
### AOD
#### AOD
[This twiki](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideDataFormatRecoJets) summarizes the respective labels by which each jet collection can be retrieved from the event record for general AOD files. This format is currently used for specialized studies, but you can use the other formats for most analyses.
### MiniAOD
#### MiniAOD
Three main jet collections are stored in the MiniAOD format, as described [here](https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookMiniAOD2017#Jets).
* **slimmedJets**: are AK4 energy-corrected jets using charged hadron subtraction (CHS) as the pileup removal algorithm. This is the default jet collection for CMS analyses for Run II. In this collection, you can find the following jet algorithms, as well as other jet-related quantities:
* **slimmedJets**: are AK4 energy-corrected jets using charged hadron subtraction (CHS) as the pileup removal algorithm. Jets are selected with $p_{\mathrm{T}} > 10$ GeV (a typical analysis cut is at least $p_{\mathrm{T}} > 20$ GeV). This is the default jet collection for CMS analyses in Run II. In this collection, you can find the following jet algorithms, as well as other jet-related quantities:
* b-tagging
* Pileup jet ID
* Quark/gluon likelihood info embedded.
* **slimmedJetsPUPPI**: are AK4 energy-corrected jets using the PUPPI algorithm for pileup removal. This collection will be the default for Run III analyses.
* **slimmedJetsAK8**: ak4 AK8 energy-corrected jets using the PUPPI algorithm for pileup removal. This has been the default collection for boosted jets in Run II. In this collection, you can find the following jet algorithms, as well as other jet-related quantities:
* **slimmedJetsAK8**: are AK8 energy-corrected jets using the PUPPI algorithm for pileup removal. Jets are selected with $p_{\mathrm{T}} > 170$ GeV and stored with all information, including PF candidate links (a typical analysis cut is at least $p_{\mathrm{T}} > 200$ GeV). This has been the default collection for boosted jets in Run II. In this collection, you can find the following jet algorithms, as well as other jet-related quantities:
* Softdrop mass
* n-subjettiness and energy correlation variables
* Access to softdrop subjets
* Access to softdrop subjets with pT >30 GeV: minimal information for 3 leading jets.
* Access to the associated AK8 CHS jet four-momentum, including soft drop and pruned mass, and n-subjettiness.
> ## Examples of how to access jet collections in miniAOD samples
@@ -244,32 +247,53 @@ Three main jet collections are stored in the MiniAOD format, as described [here]
>
{: .solution}
### NanoAOD
In nanoAOD, only AK4 CHS jets ( _Jet_ ) and AK8 PUPPI jets ( _FatJet_ ) are stored in Run 2. For Run 3, AK4 and AK8 jets are PUPPI jets. The jets in nanoAOD are similar to those in miniAOD, but not identical (for example, the $p_{\mathrm{T}}$ cuts might be different). A full set of variables for each jet collection can be found in this [website](https://cms-nanoaod-integration.web.cern.ch/autoDoc/NanoAODv9/2018UL/doc_TTToSemiLeptonic_TuneCP5_13TeV-powheg-pythia8_RunIISummer20UL18NanoAODv9-106X_upgrade2018_realistic_v16_L1v1-v1.html).
#### NanoAOD
NanoAOD is a "flat tree" format, meaning you can access the information directly with simple ROOT or even simple Python tools (like numpy or pandas). This format is recommended for analyses in CMS, unless one needs to access other variables not stored in nanoAOD. _This tutorial will only use nanoAOD files._
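
One way to picture the "flat" layout is sketched below with plain Python lists: a per-event counter (`nJet`) plus per-jet arrays such as `Jet_pt` that hold the jets of all events one after another. The numbers are invented and `split_by_event` is a hypothetical helper written for this illustration; in practice a library such as coffea does this regrouping for you.

```python
def split_by_event(counts, flat_values):
    """Regroup a flat per-jet array into one list per event using the jet counts."""
    events, start = [], 0
    for n in counts:
        events.append(flat_values[start:start + n])
        start += n
    return events

# Invented toy content for three events with 2, 0, and 3 jets.
nJet = [2, 0, 3]
Jet_pt = [120.5, 45.2, 80.1, 33.3, 16.0]

jets_per_event = split_by_event(nJet, Jet_pt)

# Leading-jet pT per event; None marks an event without jets.
leading_pt = [max(pts) if pts else None for pts in jets_per_event]
print(jets_per_event)
print(leading_pt)
```

Because every quantity is just a counter or an array of numbers, no experiment-specific software is needed to read it.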
In nanoAOD, only AK4 CHS jets ( _Jet_ ) and AK8 PUPPI jets ( _FatJet_ ) are stored in Run 2. For Run 3, AK4 and AK8 jets are PUPPI jets. The jets in nanoAOD are similar to those in miniAOD, but not identical (for example, the $p_{\mathrm{T}}$ cuts might be different). In short:
* Jet = ak4PFJetsCHS
* pT >15 GeV
* Similar to miniAOD content, but many more (up-to-date) quantities (e.g. JEC)
* FatJet = ak8PFJetsPUPPI
* Similar content to miniAOD, but many more (up-to-date) quantities such as DeepXXX taggers
A full set of variables for each jet collection can be found in this [website](https://cms-nanoaod-integration.web.cern.ch/autoDoc/NanoAODv9/2018UL/doc_TTToSemiLeptonic_TuneCP5_13TeV-powheg-pythia8_RunIISummer20UL18NanoAODv9-106X_upgrade2018_realistic_v16_L1v1-v1.html).
It is also possible to customize nanoAOD: JME and BTV have their own extended formats with more jet collections and/or PF candidates. It is a common format for “automatised” workflows and ML training.
> ## Note
> There are several advanced tools on the market which allow you to do sophisticated analysis using nanoAOD format, including [RDataFrame](https://root.cern/doc/master/classROOT_1_1RDataFrame.html), [NanoAOD-tools](https://github.com/cms-nanoAOD/nanoAOD-tools), or [Coffea](https://github.com/CoffeaTeam/coffea). We encourage you to look at them and use the one you like the most. However, we are going to use coffea for this tutorial.
{: .callout}
## Jet properties
### Jet properties
A short list of jet properties that we can find in nanoAOD are:
* Jet 4-vector = sum of all constituent particle 4-vectors: energy, pT, η, Φ
* Jet mass
* Jet constituent multiplicities (PF) ex. charged multiplicity
* Jet constituent fractions, ex. charged hadron energy fraction
* Jet area = area in η-Φ plane in which an infinitely soft particle will be clustered with the jet
* Jet tagging information
* and many more
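
The first item in the list above can be sketched in a few lines of Python: each constituent is converted from ($p_{\mathrm{T}}$, η, φ, mass) to Cartesian components, the components are summed, and the jet kinematics are read off the sum. The function names and the two toy constituents are made up for this illustration, and the sketch assumes the summed transverse momentum is non-zero.

```python
import math

def to_cartesian(pt, eta, phi, mass):
    """Convert (pt, eta, phi, mass) to (px, py, pz, E)."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px ** 2 + py ** 2 + pz ** 2 + mass ** 2)
    return px, py, pz, energy

def jet_from_constituents(constituents):
    """Sum constituent four-vectors and return the jet (pt, eta, phi, mass)."""
    px = py = pz = energy = 0.0
    for constituent in constituents:
        cx, cy, cz, ce = to_cartesian(*constituent)
        px, py, pz, energy = px + cx, py + cy, pz + cz, energy + ce
    pt = math.hypot(px, py)          # assumed non-zero in this sketch
    eta = math.asinh(pz / pt)
    phi = math.atan2(py, px)
    mass2 = energy ** 2 - (px ** 2 + py ** 2 + pz ** 2)
    mass = math.sqrt(max(mass2, 0.0))  # guard against tiny negative rounding
    return pt, eta, phi, mass

# Two hypothetical massless constituents, symmetric around phi = 0.
constituents = [(50.0, 0.0, 0.1, 0.0), (50.0, 0.0, -0.1, 0.0)]
jet_pt, jet_eta, jet_phi, jet_mass = jet_from_constituents(constituents)
print(jet_pt, jet_eta, jet_phi, jet_mass)
```

Note that the jet acquires a non-zero mass from the opening angle between its constituents even though each constituent is massless here.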
### Exercise 1.3
> ## Open a notebook
> This preliminary exercise will illustrate some of the basic properties of jets, like the four-momentum quantities: pt, eta, phi, and mass. We will use nanoAOD files, a format that is currently widely used within the CMS Collaboration. For more information about nanoAOD follow [this link](https://gitlab.cern.ch/cms-nanoAOD/nanoaod-doc/-/wikis/home). At the end of the notebook, you will be able to see all the quantities stored in the `Jet` collection.
>
> For this part, open the notebook called `Jets_101.ipynb` and run Exercise 1.
> For this part, open the notebook called `Jets_101.ipynb` and run Exercise 1.3
{: .checklist}
> ## Discussion 1.1
> ## Discussion 1.2
>
> Have you seen these jet quantities before? Were you expecting something different?
{: .discussion}
> ## Discussion 1.2
> ## Discussion 1.3
>
> Did you plot other jet quantities stored in nanoAOD? Do you understand the meaning of them?
{: .discussion}