Framework for accelerated ttH(bb) array analysis on flat, centrally produced NanoAOD samples, applying object and event selection as well as pileup, lepton and b-tagging reweighting directly on a GPU! Or, if you don't have a GPU, no big deal: the same code also runs on the CPU, just somewhat slower. No batch jobs or servers required!
Requirements:
- python 3
- uproot
- awkward-array
- numba
- coffea tools (Fermilab analysis tools)
Optional for CUDA acceleration:
- cupy
- cudatoolkit
git clone [email protected]:chreissel/hepaccelerate.git
cd hepaccelerate
The framework needs a list of files to process. These can be either local files or files accessible via XRootD. Note that reading files over XRootD will be much slower than reading locally cached files, so if you plan to run the analysis repeatedly, consider copying the files to your own cluster using PhEDEx.
#prepare a list of files to read
#replace /nvmedata with your local location of ROOT files
find /nvmedata/store/mc/RunIIFall17NanoAOD/ttHTobb_M125_TuneCP5_13TeV-powheg-pythia8*/NANOAODSIM -name "*.root" > filelist.txt
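For reference, filelist.txt simply contains one file per line; local paths and XRootD URLs can be mixed. The entries below are purely illustrative, not real dataset locations:
/nvmedata/store/mc/RunIIFall17NanoAOD/ttHTobb_M125_TuneCP5_13TeV-powheg-pythia8/NANOAODSIM/nanoaod_1.root
root://some-redirector.example//store/mc/RunIIFall17NanoAOD/ttHTobb_M125_TuneCP5_13TeV-powheg-pythia8/NANOAODSIM/nanoaod_2.root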
To run the analysis, simply execute:
#Run the test analysis
PYTHONPATH=hepaccelerate:coffea:. python3 run_analysis.py --filelist filelist.txt --sample ttHTobb_M125_TuneCP5_13TeV-powheg-pythia8
#output will be stored in this json
cat out_ttHTobb_M125_TuneCP5_13TeV-powheg-pythia8.json
This script loads the ROOT files, prepares local caches of the branches it reads, and processes the data. The output file contains the weighted histograms.
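Since the output is plain JSON, you can take a quick look at it from Python without any plotting machinery. A minimal sketch (the exact layout of the histograms inside the file is defined by run_analysis.py, so adjust the keys accordingly):
import json

with open("out_ttHTobb_M125_TuneCP5_13TeV-powheg-pythia8.json") as f:
    results = json.load(f)

#print the top-level keys to see which histograms were produced
print(list(results.keys()))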
#second time around, you can load the data from the cache, which is much faster
PYTHONPATH=hepaccelerate:coffea:. python3 run_analysis.py --filelist filelist.txt --sample ttHTobb_M125_TuneCP5_13TeV-powheg-pythia8 --from-cache
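The speedup comes from skipping ROOT deserialization on later runs: decoded branches are kept on disk as flat arrays. A minimal sketch of the idea (file names hypothetical, not the framework's actual cache layout):
import numpy

#first run: the branch is decoded from ROOT and saved as a flat array
jet_pt = numpy.array([40.1, 25.3, 60.7], dtype=numpy.float32)
numpy.save("cache_Jet_pt.npy", jet_pt)

#later runs: load the cached array directly, no ROOT involved
jet_pt = numpy.load("cache_Jet_pt.npy")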
#use CUDA for array processing on a GPU!
PYTHONPATH=hepaccelerate:coffea:. python3 run_analysis.py --filelist filelist.txt --sample ttHTobb_M125_TuneCP5_13TeV-powheg-pythia8 --from-cache --use-cuda
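Under the hood, switching to CUDA largely amounts to swapping the array backend, since cupy implements a large subset of the numpy API. A minimal sketch of that pattern (illustration only, not the framework's actual dispatch code):
use_cuda = True
if use_cuda:
    import cupy as backend  #arrays live on the GPU
else:
    import numpy as backend  #the same calls run on the CPU

pt = backend.array([30.5, 45.2, 18.9])
mask = pt > 25.0  #elementwise cut, runs on whichever device holds the array
print(backend.sum(mask))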
Object definitions, event selection cuts and files needed for scale factor calculations can be found in definitions_analysis.py.
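As a hypothetical illustration of the kind of settings kept there (the actual names, values and structure are in definitions_analysis.py itself):
parameters = {
    "muons": {"pt_cut": 29.0, "eta_cut": 2.4, "iso_cut": 0.15},
    "jets": {"pt_cut": 30.0, "eta_cut": 2.4, "btag_wp": 0.4941},
}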
python counts.py --filelist filelist.txt
Running this script gives the total weighted sum of generated events for all files defined in filelist.txt. After changing the filelist, update the weighted sum of generated events in definitions_analysis.py.
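Conceptually this is just a sum of the per-event generator weights over all files. A minimal sketch of the idea with uproot (assuming uproot 3-style flat numpy arrays and a NanoAOD Events tree; counts.py itself may differ):
import uproot

total = 0.0
with open("filelist.txt") as f:
    filenames = [line.strip() for line in f if line.strip()]
for fn in filenames:
    rootfile = uproot.open(fn)
    #genWeight is the per-event generator weight in NanoAOD
    total += rootfile["Events"]["genWeight"].array().sum()
print("weighted sum of generated events:", total)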
Looking for the functions used for analysing the datasets? All important, analysis-specific functions are defined in lib_analysis.py.
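As a hypothetical example of the style used there, an object selection is typically a vectorized function that returns a boolean mask over all objects in a chunk of events at once (names and cut values illustrative, not the actual code):
import numpy

def select_muons(pt, eta, iso, pt_cut=29.0, eta_cut=2.4, iso_cut=0.15):
    #one boolean per muon across the whole chunk of events
    return (pt > pt_cut) & (numpy.abs(eta) < eta_cut) & (iso < iso_cut)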
The scripts submit_cpu.sh and submit_gpu.sh in their current form submit one slurm job per sample. To use them, log in to the correct node (ssh t3ui07 for CPU, ssh t3login for GPU) and activate the correct environment:
export PATH=/work/creissel/anaconda3/bin:$PATH
source activate hepaccelerate_cpu
OR
source activate hepaccelerate_gpu
Then simply source the submit script. The arguments passed to run_analysis.py are defined in cpu_job.sh and gpu_job.sh, respectively.
On your local machine, run
ssh -f [email protected] -L 8008:localhost:8008 -N
Then log in to tier3 and run
jupyter notebook --no-browser --port=8008
You will get a URL with a token to copy-paste into your browser. Done.