HLT crashes in run 388769 and 388770: InvalidReference exception involving DetSetVector::inserv called with index already in collection; #46783

Open
mmusich opened this issue Nov 23, 2024 · 7 comments

@mmusich

mmusich commented Nov 23, 2024

On Nov 22, 2024, during runs 388769 and 388770 (PbPb stable-beam collisions, HLT release CMSSW_14_1_5_patch2), we observed hundreds of HLT crashes (509 in run 388769, e-log, and 1 in run 388770, e-log) with the following exception messages:

An exception of category 'InvalidReference' occurred while
   [0] Processing  Event run: 388769 lumi: 2 event: 708614 stream: 14
   [1] Running path 'HLT_HIUPC_DoubleEG5_BptxAND_SinglePixelTrack_MaxPixelTrack_v15'
   [2] Calling method for module SiPixelDigisClustersFromSoAAlpakaHIonPhase1/'hltSiPixelClustersPPOnAA'
Exception Message:
DetSetVector::inserv called with index already in collection;
index value: 303079452

or

An exception of category 'InvalidReference' occurred while
   [0] Processing  Event run: 388770 lumi: 94 event: 102837548 stream: 16
   [1] Running path 'DQM_PixelReconstruction_v11'
   [2] Calling method for module SiPixelDigisClustersFromSoAAlpakaPhase1/'hltSiPixelClusters'
Exception Message:
DetSetVector::inserv called with index already in collection;
index value: 353118212

The exception is reminiscent of an earlier issue documented at #39045.
A preliminary investigation suggests that the crashes are related to a new version of the pixel firmware deployed online on Nov 22.
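
For orientation, the two index values quoted above can be unpacked as raw detector IDs. The sketch below is a standalone illustration, not CMSSW code: it assumes the usual DetId bit layout (detector in bits 28-31, subdetector in bits 25-27) and the Tracker subdetector numbering in which 1 is PixelBarrel and 2 is PixelEndcap. The helper name `decode_det_id` is hypothetical.

```python
# Assumed layout (from CMSSW's DetId): detector in bits 28-31,
# subdetector in bits 25-27. Only the names needed here are listed.
DET_NAMES = {1: "Tracker"}
TRACKER_SUBDET_NAMES = {1: "PixelBarrel", 2: "PixelEndcap"}


def decode_det_id(raw_id):
    """Return (detector name, subdetector name) for a raw 32-bit DetId."""
    det = (raw_id >> 28) & 0xF
    subdet = (raw_id >> 25) & 0x7
    det_name = DET_NAMES.get(det, f"det{det}")
    if det == 1:
        subdet_name = TRACKER_SUBDET_NAMES.get(subdet, f"subdet{subdet}")
    else:
        subdet_name = f"subdet{subdet}"
    return det_name, subdet_name


# Index values copied from the two exception messages in this issue.
for raw_id in (303079452, 353118212):
    print(raw_id, hex(raw_id), *decode_det_id(raw_id))
```

Under these assumptions the first index decodes to a pixel barrel module and the second to a pixel endcap module, so the duplicated entries are not confined to one part of the pixel detector.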

The logs from F3 Mon are attached to the thread.

f3mon_logtable_2024-11-23T08_18_32.480Z.txt

f3mon_logtable_2024-11-23T08_18_18.602Z.txt

Once the error stream files are made available, we'll attempt to reproduce the crashes.

Cc:
@cms-sw/hlt-l2 @cms-sw/heterogeneous-l2 @trocino @vince502

@cmsbuild

cmsbuild commented Nov 23, 2024

cms-bot internal usage

@cmsbuild

A new Issue was created by @mmusich.

@Dr15Jones, @antoniovilela, @makortel, @mandrenguyen, @rappoccio, @sextonkennedy, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@missirol

Some of the error files from those runs can be found at

/eos/cms/store/group/tsg/FOG/error_stream_root/run388769
/eos/cms/store/group/tsg/FOG/error_stream_root/run388770

Below is a reproducer tested on lxplus800 with CMSSW_14_1_5_patch2 using one of those files.

#!/bin/bash

# cmsrel CMSSW_14_1_5_patch2
# cd CMSSW_14_1_5_patch2/src
# cmsenv

hltLabel=hlt
hltMenu=run:388769
globalTag=141X_dataRun3_HLT_v1

hltGetConfiguration \
  "${hltMenu}" \
  --globaltag "${globalTag}" \
  --data \
  --no-prescale \
  --no-output \
  --max-events 1 \
  --input root://eoscms.cern.ch//eos/cms/store/group/tsg/FOG/error_stream_root/run388769/run388769_ls0186_index000175_fu-c2b03-06-01_pid4137691.root \
  --path 'HLT_HIUPC_DoubleEG5_BptxAND_SinglePixelTrack_MaxPixelTrack_v*' \
  > "${hltLabel}".py

cat <<@EOF >> "${hltLabel}".py
process.options.numberOfThreads = 1
process.options.numberOfStreams = 0

del process.MessageLogger
process.load('FWCore.MessageLogger.MessageLogger_cfi')

process.source.skipEvents = cms.untracked.uint32( 90 )
@EOF

cmsRun "${hltLabel}".py &> "${hltLabel}".log
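
To check whether a run of the reproducer hit the same exception, the log can be scanned for the message text. This is a minimal sketch, assuming the log contains the exception message in the same form as quoted at the top of this issue; the function name `duplicate_indices` is hypothetical and the sample text is copied from the first message above.

```python
import re
from collections import Counter

# Matches the exception text quoted in this issue and captures the index value.
PATTERN = re.compile(
    r"DetSetVector::inserv called with index already in collection;"
    r"\s*index value:\s*(\d+)"
)


def duplicate_indices(log_text):
    """Count how often each duplicated index value appears in the log text."""
    return Counter(int(value) for value in PATTERN.findall(log_text))


sample = """\
Exception Message:
DetSetVector::inserv called with index already in collection;
index value: 303079452
"""

print(duplicate_indices(sample))

# For the reproducer output, something like the following could be used
# (assumes the log file name produced by the script above):
# with open("hlt.log") as log_file:
#     print(duplicate_indices(log_file.read()))
```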

@mmusich

mmusich commented Nov 25, 2024

assign hlt, heterogeneous

@mmusich

mmusich commented Nov 25, 2024

@cms-sw/trk-dpg-l2 @ferencek @mroguljic FYI

@cmsbuild

New categories assigned: hlt,heterogeneous

@fwyzard,@makortel,@Martin-Grunewald,@mmusich you have been requested to review this Pull request/Issue and eventually sign? Thanks

@mmusich

mmusich commented Nov 25, 2024

type trk

@cmsbuild cmsbuild added the trk label Nov 25, 2024