Relval wf 14949.402 fails at runtime in CMSSW_14_2_X #46693

Open
mmusich opened this issue Nov 13, 2024 · 7 comments

mmusich commented Nov 13, 2024

Title says it all, see #46686 (comment)

I tested in a plain CMSSW_14_2_X_2024-11-12-2300 (without anything checked out on top) with:

runTheMatrix.py --what gpu,upgrade -l 14949.402 -t 4

I see

$ more 14949.402_HydjetQMinBias_5020GeV+2022HI_Patatrack_PixelOnlyAlpaka/step3_HydjetQMinBias_5020GeV+2022HI_Patatrack_PixelOnlyAlpaka.log 
RAW2DIGI:RawToDigi_pixelOnly,RECO:reconstruction_pixelTrackingOnly,VALIDATION:@pixelTrackingOnlyValidation,DQM:@pixelTrackingOnlyDQM
We have determined that this is simulation (if not, rerun cmsDriver.py with --data)
with DB:
entry file:step2.root
Step: RAW2DIGI Spec: ['RawToDigi_pixelOnly']
Step: RECO Spec: ['reconstruction_pixelTrackingOnly']
Step: VALIDATION Spec: ['@pixelTrackingOnlyValidation']
@pixelTrackingOnlyValidation in preparing validation
Step: DQM Spec: ['@pixelTrackingOnlyDQM']
customising the process with customiseAlpakaServiceMemoryFilling from HeterogeneousCore/AlpakaServices/customiseAlpakaServiceMemoryFilling
customising the process with setCrossingFrameOn from SimGeneral/MixingModule/fullMixCustomize_cff
Starting  cmsRun  step3_RAW2DIGI_RECO_VALIDATION_DQM.py
%MSG-i ThreadStreamSetup:  (NoModuleName) 13-Nov-2024 14:45:57 CET pre-events
setting # threads 4
setting # streams 4
%MSG
%MSG-i AlpakaService:  (NoModuleName) 13-Nov-2024 14:45:59 CET pre-events
AlpakaServiceSerialSync succesfully initialised.
Found 1 device:
  - Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz
%MSG
%MSG-i CUDAService:  (NoModuleName) 13-Nov-2024 14:45:59 CET pre-events
CUDA runtime version 12.4, driver version 12.4, NVIDIA driver version 550.127.05
CUDA device 0: Tesla T4 (sm_75)
%MSG
%MSG-i AlpakaService:  (NoModuleName) 13-Nov-2024 14:45:59 CET pre-events
AlpakaServiceCudaAsync succesfully initialised.
Found 1 device:
  - Tesla T4
%MSG
13-Nov-2024 14:46:06 CET  Initiating request to open file file:step2.root
13-Nov-2024 14:46:11 CET  Successfully opened file file:step2.root
%MSG-w NonConsumedConditionalModules:  AfterModConstruction  13-Nov-2024 14:46:14 CET pre-events
The following modules were part of some ConditionalTask, but were not
consumed by any other module in any of the Paths to which the ConditionalTask
was associated. Perhaps they should be either removed from the
job, or moved to a Task to make it explicit they are unscheduled.

 siPixelDigis@cpu
 siPixelRecHitsPreSplitting@cpu
%MSG
Begin processing the 1st record. Run 1, Event 1, LumiSection 1 on stream 2 at 13-Nov-2024 14:46:16.182 CET
Begin processing the 2nd record. Run 1, Event 6, LumiSection 1 on stream 1 at 13-Nov-2024 14:46:16.211 CET
Begin processing the 3rd record. Run 1, Event 7, LumiSection 1 on stream 3 at 13-Nov-2024 14:46:16.229 CET
Begin processing the 4th record. Run 1, Event 5, LumiSection 1 on stream 0 at 13-Nov-2024 14:46:16.229 CET
Begin processing the 5th record. Run 1, Event 9, LumiSection 1 on stream 2 at 13-Nov-2024 14:46:18.913 CET
Begin processing the 6th record. Run 1, Event 10, LumiSection 1 on stream 1 at 13-Nov-2024 14:46:18.955 CET
Begin processing the 7th record. Run 1, Event 2, LumiSection 1 on stream 3 at 13-Nov-2024 14:46:18.995 CET
Begin processing the 8th record. Run 1, Event 8, LumiSection 1 on stream 2 at 13-Nov-2024 14:46:19.258 CET
Begin processing the 9th record. Run 1, Event 4, LumiSection 1 on stream 0 at 13-Nov-2024 14:46:19.258 CET
Begin processing the 10th record. Run 1, Event 3, LumiSection 1 on stream 1 at 13-Nov-2024 14:46:20.056 CET
src/RecoVertex/PixelVertexFinding/plugins/alpaka/clusterTracksByDensity.h:205: void alpaka_cuda_async::vertexFinder::clusterTracksByDensity(const alpaka::AccGpuUniformCudaHipRt<alpaka::ApiCudaRt, std::integral_constant<unsigned long, 1UL>, unsigned int> &, reco::ZVertexLayout<128UL, false>::ViewTemplateFreeParams<128UL, false, true, true> &, reco::ZVertexTracksLayout<128UL, false>::ViewTemplateFreeParams<128UL, false, true, true> &, vertexFinder::PixelVertexWSSoALayout<128UL, false>::ViewTemplateFreeParams<128UL, false, true, true> &, int, float, float, float): block: [0,0,0], thread: [640,0,0] Assertion `static_cast<int>(foundClusters) < data.metadata().size()` failed.

This is suspected to be related to #45887.
At the moment this workflow is not exercised in IBs, so the issue went unnoticed.
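
For triage, the device-side assertion message at the end of the log can be split into its fields (source file, line, block and thread indices, failed condition). The following is only a minimal sketch: the function name `parse_device_assert` and the regular expression are assumptions made for illustration and are not part of CMSSW or cms-bot. The pattern is modelled on the single message quoted in this issue, which may be hard-wrapped by the terminal, so the helper joins the lines before matching.

```python
import re

# Layout assumed from the message in this issue:
# <file>:<line>: <function signature>: block: [x,y,z], thread: [x,y,z] Assertion `<cond>` failed.
ASSERT_RE = re.compile(
    r"^(?P<file>[^:\s]+):(?P<line>\d+): (?P<func>.+): "
    r"block: \[(?P<block>\d+,\d+,\d+)\], thread: \[(?P<thread>\d+,\d+,\d+)\] "
    r"Assertion `(?P<cond>.+)` failed\.$"
)


def parse_device_assert(text):
    """Parse one assertion message (hypothetical helper, not a CMSSW tool).

    `text` must contain only the assertion message; hard line wraps
    inserted by the terminal are removed before matching.
    Returns a dict of fields, or None if the text does not match.
    """
    joined = "".join(text.splitlines())
    match = ASSERT_RE.match(joined)
    if match is None:
        return None
    return {
        "file": match.group("file"),
        "line": int(match.group("line")),
        "function": match.group("func"),
        "block": tuple(int(v) for v in match.group("block").split(",")),
        "thread": tuple(int(v) for v in match.group("thread").split(",")),
        "condition": match.group("cond"),
    }
```

Applied to the message above, this would report line 205 of `clusterTracksByDensity.h`, block `(0, 0, 0)`, thread `(640, 0, 0)` and the condition `static_cast<int>(foundClusters) < data.metadata().size()`.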

cmsbuild commented Nov 13, 2024

cms-bot internal usage

@cmsbuild

A new Issue was created by @mmusich.

@Dr15Jones, @antoniovilela, @makortel, @mandrenguyen, @rappoccio, @sextonkennedy, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@mmusich changed the title from "Reval wf 14949.402 fails at runtime in CMSSW_14_2_X" to "Relval wf 14949.402 fails at runtime in CMSSW_14_2_X" on Nov 13, 2024

@makortel

assign RecoVertex/PixelVertexFinding

@cmsbuild

New categories assigned: reconstruction

@jfernan2,@mandrenguyen you have been requested to review this Pull request/Issue and eventually sign? Thanks

@makortel

assign heterogeneous

@cmsbuild

New categories assigned: heterogeneous

@fwyzard,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks

@jfernan2