Open issues regarding the ECAL local reconstruction on GPU #32480

fwyzard · 2020-12-14T17:05:53Z

Open issues regarding the ECAL local reconstruction on GPU

Update the ECAL ESProducts and move them to a more correct place
Migrate GPU code to use common constants and functions
Add references for magic numbers

fwyzard · 2020-12-14T17:06:03Z

Update the ECAL ESProducts and move them to a more correct place

Make the equivalent changes regarding to the conditions that were done for HCAL in #32039.

Move the conditions types used on the GPU from

EventFilter/EcalRawToDigi/interface/ElectronicsMappingGPU.h
RecoLocalCalo/EcalRecAlgos/interface/EcalGainRatiosGPU.h
RecoLocalCalo/EcalRecAlgos/interface/EcalIntercalibConstantsGPU.h
RecoLocalCalo/EcalRecAlgos/interface/EcalLaserAPDPNRatiosGPU.h
RecoLocalCalo/EcalRecAlgos/interface/EcalLaserAPDPNRatiosRefGPU.h
RecoLocalCalo/EcalRecAlgos/interface/EcalLaserAlphasGPU.h
RecoLocalCalo/EcalRecAlgos/interface/EcalLinearCorrectionsGPU.h
RecoLocalCalo/EcalRecAlgos/interface/EcalMultifitParametersGPU.h
RecoLocalCalo/EcalRecAlgos/interface/EcalPedestalsGPU.h
RecoLocalCalo/EcalRecAlgos/interface/EcalPulseCovariancesGPU.h
RecoLocalCalo/EcalRecAlgos/interface/EcalPulseShapesGPU.h
RecoLocalCalo/EcalRecAlgos/interface/EcalRecHitParametersGPU.h
RecoLocalCalo/EcalRecAlgos/interface/EcalRechitADCToGeVConstantGPU.h
RecoLocalCalo/EcalRecAlgos/interface/EcalRechitChannelStatusGPU.h
RecoLocalCalo/EcalRecAlgos/interface/EcalSamplesCorrelationGPU.h
RecoLocalCalo/EcalRecAlgos/interface/EcalTimeBiasCorrectionsGPU.h
RecoLocalCalo/EcalRecAlgos/interface/EcalTimeCalibConstantsGPU.h

and the corresponding .cc files to CondFormats/EcalObjects/

Update them to use

edm::propagate_const_array<cms::cuda::device::unique_ptr<float[]>> instead of float *
cms::cuda::make_device_unique<float[]> instead of cudaMalloc
cms::cuda::copyAsync instead of cudaMemcpyAsync

fwyzard · 2020-12-14T17:06:09Z

Migrate GPU code to use common constants and functions

The functions in RecoLocalCalo/EcalRecProducers/plugins/Common.h are only used by the GPU implementation.

They could be reused from DataFormats/EcalDigi/interface/EcalMGPASample.h.

cmsbuild · 2020-12-14T17:06:14Z

A new Issue was created by @fwyzard Andrea Bocci.

@Dr15Jones, @dpiparo, @silviodonato, @smuzaffar, @makortel, @qliphy can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

fwyzard · 2020-12-14T17:06:50Z

assign reconstruction

fwyzard · 2020-12-14T17:06:56Z

assign heterogeneous

cmsbuild · 2020-12-14T17:07:04Z

New categories assigned: heterogeneous,reconstruction

@slava77,@perrotta,@makortel,@jpata,@fwyzard you have been requested to review this Pull request/Issue and eventually sign? Thanks

fwyzard · 2020-12-14T17:07:08Z

@thomreis FYI

fwyzard · 2020-12-25T21:43:43Z

Add references for magic numbers

RecoLocalCalo/EcalRecProducers/plugins/EcalMultifitParametersGPUESProducer.cc uses variois hard-coded numbers without a clear explanation of their origin.
They appear to be reused in the same way in

RecoLocalCalo/EcalRecProducers/plugins/EcalUncalibRecHitWorkerMultiFit.cc
RecoLocalCalo/EcalRecProducers/plugins/EcalUncalibRecHitWorkerRatio.cc
RecoLocalCalo/EcalRecProducers/plugins/EcalUncalibRecHitWorkerGlobal.cc

The original CPU code should be updated (for example, moving those numbers to a central place and adding comments to explain their source) and the GPU code should then be updated accordingly.

jpata · 2021-01-11T09:02:55Z

Taking note of a few more specific todos:

EventFilter/EcalRawToDigi/plugins/EcalRawToDigiGPU.cc: esConsumes and constants
RecoLocalCalo/EcalRecProducers/plugins/EcalRecHitBuilderKernels.cu: very long function, many short variable names with a long lifetime (t_i, t_f, t1, t2 etc), several unaddressed TODO comments in the code like "how the heck is this possible", "Please clean up the code, ... also the original one!" etc
RecoLocalCalo/EcalRecProducers/plugins/EcalRecHitProducerGPU.cc: might not have to be edm::ExternalWork, https://github.com/cms-sw/cmssw/pull/31719/files#r511164561
RecoLocalCalo/EcalRecProducers/plugins/KernelHelpers.cu: some hardcoded constants, comments like "why on hell things are so complex and not simple" to be removed
RecoLocalCalo/EcalRecProducers/plugins/TimeComputationKernels.cu: hardcoded constants
RecoLocalCalo/EcalRecProducers/python/ecalRecHitGPU_cfi.py: TODOs in the code with unclear status
CPU to GPU workflow comparisons do not produce equivalent results, Patatrack integration - ECAL local reconstruction (7/N) #31719 (comment)

thomreis · 2021-03-02T16:13:10Z

Update the ECAL ESProducts and move them to a more correct place

Make the equivalent changes regarding to the conditions that were done for HCAL in #32039.

Move the conditions types used on the GPU from

* `EventFilter/EcalRawToDigi/interface/ElectronicsMappingGPU.h`

* `RecoLocalCalo/EcalRecAlgos/interface/EcalGainRatiosGPU.h`

* `RecoLocalCalo/EcalRecAlgos/interface/EcalIntercalibConstantsGPU.h`

* `RecoLocalCalo/EcalRecAlgos/interface/EcalLaserAPDPNRatiosGPU.h`

* `RecoLocalCalo/EcalRecAlgos/interface/EcalLaserAPDPNRatiosRefGPU.h`

* `RecoLocalCalo/EcalRecAlgos/interface/EcalLaserAlphasGPU.h`

* `RecoLocalCalo/EcalRecAlgos/interface/EcalLinearCorrectionsGPU.h`

* `RecoLocalCalo/EcalRecAlgos/interface/EcalMultifitParametersGPU.h`

* `RecoLocalCalo/EcalRecAlgos/interface/EcalPedestalsGPU.h`

* `RecoLocalCalo/EcalRecAlgos/interface/EcalPulseCovariancesGPU.h`

* `RecoLocalCalo/EcalRecAlgos/interface/EcalPulseShapesGPU.h`

* `RecoLocalCalo/EcalRecAlgos/interface/EcalRecHitParametersGPU.h`

* `RecoLocalCalo/EcalRecAlgos/interface/EcalRechitADCToGeVConstantGPU.h`

* `RecoLocalCalo/EcalRecAlgos/interface/EcalRechitChannelStatusGPU.h`

* `RecoLocalCalo/EcalRecAlgos/interface/EcalSamplesCorrelationGPU.h`

* `RecoLocalCalo/EcalRecAlgos/interface/EcalTimeBiasCorrectionsGPU.h`

* `RecoLocalCalo/EcalRecAlgos/interface/EcalTimeCalibConstantsGPU.h`

and the corresponding .cc files to CondFormats/EcalObjects/

Update them to use

* `edm::propagate_const_array<cms::cuda::device::unique_ptr<float[]>>` instead of `float *`

* `cms::cuda::make_device_unique<float[]>` instead of `cudaMalloc`

* `cms::cuda::copyAsync` instead of `cudaMemcpyAsync`

Hi @fwyzard is the change of the types and to the cms::cuda functions supposed to work for all occurrences? For some (like the ones in EcalGainRatiosGPU.h) this compiles fine but for others (e.g. in EcalMultifitParametersGPU.h) I get compilation errors like this one:

CMSSW_11_3_0_pre3/src/CondFormats/EcalObjects/src/EcalMultifitParametersGPU.cc:27:83: error: cannot convert 'cms::cuda::device::impl::make_device_unique_selector<double []>::unbounded_array' {aka 'std::unique_ptr<double [], cms::cuda::device::impl::DeviceDeleter>'} to 'double*' in assignment
   27 |         product.amplitudeFitParametersEB = cms::cuda::make_device_unique<double[]>(amplitudeFitParametersEB_.size(), cudaStream);
      |                                            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |                                                                                   |
      |                                                                                   cms::cuda::device::impl::make_device_unique_selector<double []>::unbounded_array {aka std::unique_ptr<double [], cms::cuda::device::impl::DeviceDeleter>}

and also this:

CMSSW_11_3_0_pre3/src/CondFormats/EcalObjects/src/EcalMultifitParametersGPU.cc:33:101: error: no matching function for call to 'copyAsync(double*&, const std::vector<double, cms::cuda::HostAllocator<double> >&, CUstream_st*&)'
   33 |         cms::cuda::copyAsync(product.amplitudeFitParametersEE, amplitudeFitParametersEE_, cudaStream);

makortel · 2021-03-02T16:17:43Z

Is product.amplitudeFitParametersEB perhaps double*? In that case it should be changed to cms::cuda::device::unique_ptr<double[]>.

thomreis · 2021-03-02T16:20:44Z

No it is edm::propagate_const_array<cms::cuda::device::unique_ptr<double[]>> amplitudeFitParametersEB;. I have changed it from double*.

makortel · 2021-03-03T01:15:17Z

The second error also suggests product.amplitudeFitParametersEE to be a double *. Could you perhaps point to the code?

fwyzard · 2021-03-03T09:23:52Z

I think the current code is this:

  struct Product {
    ~Product();
    double *amplitudeFitParametersEB, *amplitudeFitParametersEE, *timeFitParametersEB, *timeFitParametersEE;
  };

and (just a fragment):

        // malloc
        cudaCheck(cudaMalloc((void**)&product.amplitudeFitParametersEB,
                             this->amplitudeFitParametersEB_.size() * sizeof(double)));

        // transfer
        cudaCheck(cudaMemcpyAsync(product.amplitudeFitParametersEB,
                                  this->amplitudeFitParametersEB_.data(),
                                  this->amplitudeFitParametersEB_.size() * sizeof(double),
                                  cudaMemcpyHostToDevice,
                                  cudaStream));

I think it should become something like

  struct Product {
    edm::propagate_const_array<cms::cuda::device::unique_ptr<double[]>> amplitudeFitParametersEB;
    edm::propagate_const_array<cms::cuda::device::unique_ptr<double[]>> amplitudeFitParametersEE;
    edm::propagate_const_array<cms::cuda::device::unique_ptr<double[]>> timeFitParametersEB;
    edm::propagate_const_array<cms::cuda::device::unique_ptr<double[]>> timeFitParametersEE;
  };

and:

        // malloc
        amplitudeFitParametersEB = cms::cuda::make_device_unique<double[]>(amplitudeFitParametersEB_.size(), stream);

        // transfer
        cms::cuda::copyAsync(product.amplitudeFitParametersEB, amplitudeFitParametersEB_, stream);

but I've been typing this in here without actually testing, so double check everything !

thomreis · 2021-03-03T09:42:35Z

My branch is here: https://github.com/thomreis/cmssw/tree/ecal-local-reco-gpu-fix-issues/CondFormats/EcalObjects
I did not commit the changes yet since they do not compile.
But the changes I made look like what you have posted above (apart from the stream variable being named cudaStream)

In EcalMultifitParametersGPU.h:

class EcalMultifitParametersGPU {
public:
  struct Product {
    edm::propagate_const_array<cms::cuda::device::unique_ptr<double[]>> amplitudeFitParametersEB;
    edm::propagate_const_array<cms::cuda::device::unique_ptr<double[]>> amplitudeFitParametersEE;
    edm::propagate_const_array<cms::cuda::device::unique_ptr<double[]>> timeFitParametersEB;
    edm::propagate_const_array<cms::cuda::device::unique_ptr<double[]>> timeFitParametersEE;
  };

In EcalMultifitParametersGPU.cc:

EcalMultifitParametersGPU::Product const& EcalMultifitParametersGPU::getProduct(cudaStream_t cudaStream) const {
  auto const& product = product_.dataForCurrentDeviceAsync(
      cudaStream, [this](EcalMultifitParametersGPU::Product& product, cudaStream_t cudaStream) {
        // allocate
        product.amplitudeFitParametersEB = cms::cuda::make_device_unique<double[]>(amplitudeFitParametersEB_.size(), cudaStream);
        product.amplitudeFitParametersEE = cms::cuda::make_device_unique<double[]>(amplitudeFitParametersEE_.size(), cudaStream);
        product.timeFitParametersEB = cms::cuda::make_device_unique<double[]>(timeFitParametersEB_.size(), cudaStream);
        product.timeFitParametersEE = cms::cuda::make_device_unique<double[]>(timeFitParametersEE_.size(), cudaStream);
        // transfer
        cms::cuda::copyAsync(product.amplitudeFitParametersEB, amplitudeFitParametersEB_, cudaStream);
        cms::cuda::copyAsync(product.amplitudeFitParametersEE, amplitudeFitParametersEE_, cudaStream);
        cms::cuda::copyAsync(product.timeFitParametersEB, timeFitParametersEB_, cudaStream);
        cms::cuda::copyAsync(product.timeFitParametersEE, timeFitParametersEE_, cudaStream);
      });
  return product;
}

fwyzard · 2021-03-03T10:38:14Z

I've tried making these changes on top of CMSSW_11_3_0_pre3:

diff --git a/RecoLocalCalo/EcalRecAlgos/interface/EcalMultifitParametersGPU.h b/RecoLocalCalo/EcalRecAlgos/interface/EcalMultifitParametersGPU.h
index 56aa0579ff77..a6c0b1c81aa2 100644
--- a/RecoLocalCalo/EcalRecAlgos/interface/EcalMultifitParametersGPU.h
+++ b/RecoLocalCalo/EcalRecAlgos/interface/EcalMultifitParametersGPU.h
@@ -4,6 +4,8 @@
 #include <array>
 
 #include "FWCore/ParameterSet/interface/ParameterSet.h"
+#include "FWCore/Utilities/interface/propagate_const_array.h"
+#include "HeterogeneousCore/CUDAUtilities/interface/device_unique_ptr.h"
 
 #ifndef __CUDACC__
 #include "HeterogeneousCore/CUDAUtilities/interface/HostAllocator.h"
@@ -13,8 +15,10 @@
 class EcalMultifitParametersGPU {
 public:
   struct Product {
-    ~Product();
-    double *amplitudeFitParametersEB, *amplitudeFitParametersEE, *timeFitParametersEB, *timeFitParametersEE;
+    edm::propagate_const_array<cms::cuda::device::unique_ptr<double[]>> amplitudeFitParametersEB;
+    edm::propagate_const_array<cms::cuda::device::unique_ptr<double[]>> amplitudeFitParametersEE;
+    edm::propagate_const_array<cms::cuda::device::unique_ptr<double[]>> timeFitParametersEB;
+    edm::propagate_const_array<cms::cuda::device::unique_ptr<double[]>> timeFitParametersEE;
   };
 
 #ifndef __CUDACC__
@@ -29,8 +33,10 @@ public:
   }
 
 private:
-  std::vector<double, cms::cuda::HostAllocator<double>> amplitudeFitParametersEB_, amplitudeFitParametersEE_,
-      timeFitParametersEB_, timeFitParametersEE_;
+  std::vector<double, cms::cuda::HostAllocator<double>> amplitudeFitParametersEB_;
+  std::vector<double, cms::cuda::HostAllocator<double>> amplitudeFitParametersEE_;
+  std::vector<double, cms::cuda::HostAllocator<double>> timeFitParametersEB_;
+  std::vector<double, cms::cuda::HostAllocator<double>> timeFitParametersEE_;
 
   cms::cuda::ESProduct<Product> product_;
 #endif  // __CUDACC__
diff --git a/RecoLocalCalo/EcalRecAlgos/src/EcalMultifitParametersGPU.cc b/RecoLocalCalo/EcalRecAlgos/src/EcalMultifitParametersGPU.cc
index 010da6444b61..149ba92ff170 100644
--- a/RecoLocalCalo/EcalRecAlgos/src/EcalMultifitParametersGPU.cc
+++ b/RecoLocalCalo/EcalRecAlgos/src/EcalMultifitParametersGPU.cc
@@ -1,7 +1,6 @@
-#include "RecoLocalCalo/EcalRecAlgos/interface/EcalMultifitParametersGPU.h"
-
 #include "FWCore/Utilities/interface/typelookup.h"
-#include "HeterogeneousCore/CUDAUtilities/interface/cudaCheck.h"
+#include "HeterogeneousCore/CUDAUtilities/interface/copyAsync.h"
+#include "RecoLocalCalo/EcalRecAlgos/interface/EcalMultifitParametersGPU.h"
 
 EcalMultifitParametersGPU::EcalMultifitParametersGPU(edm::ParameterSet const& ps) {
   auto const& amplitudeFitParametersEB = ps.getParameter<std::vector<double>>("EBamplitudeFitParameters");
@@ -20,45 +19,20 @@ EcalMultifitParametersGPU::EcalMultifitParametersGPU(edm::ParameterSet const& ps
   std::copy(timeFitParametersEE.begin(), timeFitParametersEE.end(), timeFitParametersEE_.begin());
 }
 
-EcalMultifitParametersGPU::Product::~Product() {
-  cudaCheck(cudaFree(amplitudeFitParametersEB));
-  cudaCheck(cudaFree(amplitudeFitParametersEE));
-  cudaCheck(cudaFree(timeFitParametersEB));
-  cudaCheck(cudaFree(timeFitParametersEE));
-}
-
 EcalMultifitParametersGPU::Product const& EcalMultifitParametersGPU::getProduct(cudaStream_t cudaStream) const {
   auto const& product = product_.dataForCurrentDeviceAsync(
       cudaStream, [this](EcalMultifitParametersGPU::Product& product, cudaStream_t cudaStream) {
-        // malloc
-        cudaCheck(cudaMalloc((void**)&product.amplitudeFitParametersEB,
-                             this->amplitudeFitParametersEB_.size() * sizeof(double)));
-        cudaCheck(cudaMalloc((void**)&product.amplitudeFitParametersEE,
-                             this->amplitudeFitParametersEE_.size() * sizeof(double)));
-        cudaCheck(cudaMalloc((void**)&product.timeFitParametersEB, this->timeFitParametersEB_.size() * sizeof(double)));
-        cudaCheck(cudaMalloc((void**)&product.timeFitParametersEE, this->timeFitParametersEE_.size() * sizeof(double)));
+        // allocate GPU memory
+        product.amplitudeFitParametersEB = cms::cuda::make_device_unique<double[]>(amplitudeFitParametersEB_.size(), cudaStream);
+        product.amplitudeFitParametersEE = cms::cuda::make_device_unique<double[]>(amplitudeFitParametersEE_.size(), cudaStream);
+        product.timeFitParametersEB = cms::cuda::make_device_unique<double[]>(timeFitParametersEB_.size(), cudaStream);
+        product.timeFitParametersEE = cms::cuda::make_device_unique<double[]>(timeFitParametersEE_.size(), cudaStream);
 
         // transfer
-        cudaCheck(cudaMemcpyAsync(product.amplitudeFitParametersEB,
-                                  this->amplitudeFitParametersEB_.data(),
-                                  this->amplitudeFitParametersEB_.size() * sizeof(double),
-                                  cudaMemcpyHostToDevice,
-                                  cudaStream));
-        cudaCheck(cudaMemcpyAsync(product.amplitudeFitParametersEE,
-                                  this->amplitudeFitParametersEE_.data(),
-                                  this->amplitudeFitParametersEE_.size() * sizeof(double),
-                                  cudaMemcpyHostToDevice,
-                                  cudaStream));
-        cudaCheck(cudaMemcpyAsync(product.timeFitParametersEB,
-                                  this->timeFitParametersEB_.data(),
-                                  this->timeFitParametersEB_.size() * sizeof(double),
-                                  cudaMemcpyHostToDevice,
-                                  cudaStream));
-        cudaCheck(cudaMemcpyAsync(product.timeFitParametersEE,
-                                  this->timeFitParametersEE_.data(),
-                                  this->timeFitParametersEE_.size() * sizeof(double),
-                                  cudaMemcpyHostToDevice,
-                                  cudaStream));
+        cms::cuda::copyAsync(product.amplitudeFitParametersEB, amplitudeFitParametersEB_, cudaStream);
+        cms::cuda::copyAsync(product.amplitudeFitParametersEE, amplitudeFitParametersEE_, cudaStream);
+        cms::cuda::copyAsync(product.timeFitParametersEB, timeFitParametersEB_, cudaStream);
+        cms::cuda::copyAsync(product.timeFitParametersEE, timeFitParametersEE_, cudaStream);
       });
   return product;
 }

and they seem to build fine.

thomreis · 2021-03-03T10:55:10Z

Thanks for confirming this. The only difference to my version that I can see is that I have moved the headers to CondFormats/EcalConds already. Maybe some BuildFile issue.

fwyzard · 2021-03-03T10:58:16Z

A silly thing: is it possible you are still including the old header from RecoLocalCalo/EcalRecAlgos/interface/EcalMultifitParametersGPU.h, which is then found in the release, and uses the old types ?

thomreis · 2021-03-03T11:20:21Z

Bingo! At least for two of the files with issues.

I still get some errors on other files. Does the recipe also work for classes that use std::vector<float> const& like, e.g. here: https://github.com/thomreis/cmssw/blob/ecal-local-reco-gpu-fix-issues/CondFormats/EcalObjects/interface/EcalTimeBiasCorrectionsGPU.h#L39-L42
The ones using std::vector<float, cms::cuda::HostAllocator<float>> all work now.

fwyzard · 2021-03-03T12:52:34Z

No, it only works for vectors that use the HostAllocator; a standard vector does not "pin" its memory, so the copies need an extra step to be asynchronous (which is done internally by CUDA, but requires a worker thread and an extra memory buffer).

Can you change the type to std::vector<float, cms::cuda::HostAllocator<float>> ? or is there any reason why not ?

thomreis · 2021-03-03T14:22:58Z

I do not know why it was implemented like that for those classes. Maybe @amassiro does?

If it can be changed to std::vector<float, cms::cuda::HostAllocator<float>> does the currently implemented strategy to have only one Product.values variable that can be allocated for valuesEB_ and with an offset also for valuesEE_ work? Like, e.g. in https://github.com/thomreis/cmssw/blob/ecal-local-reco-gpu-fix-issues/CondFormats/EcalObjects/src/EcalPulseCovariancesGPU.cc#L17-L36

amassiro · 2021-03-03T14:44:11Z

I do not know why it was implemented like that for those classes. Maybe @amassiro does?

I don't remember, it could be that at the beginning it was the only option.
Maybe @vkhristenko remembers the motivation

If it can be changed to std::vector<float, cms::cuda::HostAllocator<float>> does the currently implemented strategy to have only one Product.values variable that can be allocated for valuesEB_ and with an offset also for valuesEE_ work? Like, e.g. in https://github.com/thomreis/cmssw/blob/ecal-local-reco-gpu-fix-issues/CondFormats/EcalObjects/src/EcalPulseCovariancesGPU.cc#L17-L36

vkhristenko · 2021-03-03T14:57:17Z

I do not know why it was implemented like that for those classes. Maybe @amassiro does?

I don't remember, it could be that at the beginning it was the only option.
Maybe @vkhristenko remembers the motivation

If it can be changed to std::vector<float, cms::cuda::HostAllocator<float>> does the currently implemented strategy to have only one Product.values variable that can be allocated for valuesEB_ and with an offset also for valuesEE_ work? Like, e.g. in https://github.com/thomreis/cmssw/blob/ecal-local-reco-gpu-fix-issues/CondFormats/EcalObjects/src/EcalPulseCovariancesGPU.cc#L17-L36

I do not remember if there was any reason for this (if not documented, prolly no), but just change to use that allocator accordingly (have to make copies, etc...)

thomreis · 2021-03-04T17:26:03Z

The issue seems to lie here:

CMSSW_11_3_0_pre3/src/CondFormats/EcalObjects/src/EcalIntercalibConstantsGPU.cc:7:70: error: no matching function for call to 'std::vector<float, cms::cuda::HostAllocator<float> >::vector(<brace-enclosed initializer list>)'
    7 |     : valuesEB_{values.barrelItems()}, valuesEE_{values.endcapItems()} {}

vkhristenko · 2021-03-04T17:32:07Z

Types are different, w/ and w/o host allocator

…

On Thu, 4 Mar 2021 at 18:26, Thomas Reis ***@***.***> wrote: The issue seems to lie here: CMSSW_11_3_0_pre3/src/CondFormats/EcalObjects/src/EcalIntercalibConstantsGPU.cc:7:70: error: no matching function for call to 'std::vector<float, cms::cuda::HostAllocator<float> >::vector(<brace-enclosed initializer list>)' 7 | : valuesEB_{values.barrelItems()}, valuesEE_{values.endcapItems()} {} — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#32480 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ABSFUCO3TFSF7DI4GHUDCPLTB67DZANCNFSM4U3BKX2A> .

fwyzard · 2021-03-04T17:53:47Z

The issue seems to lie here:

CMSSW_11_3_0_pre3/src/CondFormats/EcalObjects/src/EcalIntercalibConstantsGPU.cc:7:70: error: no matching function for call to 'std::vector<float, cms::cuda::HostAllocator<float> >::vector(<brace-enclosed initializer list>)'
    7 |     : valuesEB_{values.barrelItems()}, valuesEE_{values.endcapItems()} {}

Yes - that is on purpose.
As I wrote earlier:

a standard vector does not "pin" its memory, so the copies need an extra step to be asynchronous (which is done internally by CUDA, but requires a worker thread and an extra memory buffer).

If there are no downsides, to me it seems better to change the vector to use the HostAllocator so the copy can be fully asynchronous.

@makortel what do you think ?

makortel · 2021-03-04T19:09:28Z

If there are no downsides, to me it seems better to change the vector to use the HostAllocator so the copy can be fully asynchronous.

I agree. The only downside I can think of is an additional copy of the data, but I think being able to copy asynchronously benefits more than that.

thomreis · 2021-03-05T11:16:25Z

Is there a way to make this compile:

        auto const offset = this->valuesEB_.size();
        // transfer
        cms::cuda::copyAsync(product.values, valuesEB_, cudaStream);
        cms::cuda::copyAsync(product.values + offset, valuesEE_, cudaStream);

with values being

  struct Product {
    edm::propagate_const_array<cms::cuda::device::unique_ptr<float[]>> values;
  };

and valuesEB_ an valuesEE_ being

  std::vector<float, cms::cuda::HostAllocator<float>> valuesEB_;
  std::vector<float, cms::cuda::HostAllocator<float>> valuesEE_;

The current compile error is

CMSSW_11_3_0_pre3/src/CondFormats/EcalObjects/src/EcalIntercalibConstantsGPU.cc:31:76: error: no matching function for call to 'copyAsync(edm::propagate_const_array<std::unique_ptr<float [], cms::cuda::device::impl::DeviceDeleter> >::element_type*, const std::vector<float, cms::cuda::HostAllocator<float> >&, CUstream_st*&)'
   31 |         cms::cuda::copyAsync(product.values + offset, valuesEE_, cudaStream);

Or is there need for separate edm::propagate_const_array<cms::cuda::device::unique_ptr<float[]>> valuesEB and edm::propagate_const_array<cms::cuda::device::unique_ptr<float[]>> valuesEE?

fwyzard · 2021-03-05T13:52:27Z

Is there a way to make this compile [...]

No.

Or is there need for separate edm::propagate_const_array<cms::cuda::device::unique_ptr<float[]>> valuesEB and edm::propagate_const_array<cms::cuda::device::unique_ptr<float[]>> valuesEE?

If you want to use a single vector on the GPU, you should use a single vector on the CPU as well: fill it with the EB values first and EE values after.

thomreis · 2021-03-05T14:27:08Z

Thanks. @amassiro @vkhristenko is this the preferred version? Having a single vector on the GPU?

amassiro · 2021-03-05T14:33:40Z

Thanks. @amassiro @vkhristenko is this the preferred version? Having a single vector on the GPU?

For me it's the same, let's decide one strategy and propagate the same format everywhere.
I just don't want one strategy for uncalibrehcit and another for rechit conditions.
The rechit were adapted to the same procedure used by unucalibrechit, it's ok to have one single vector, if it makes things faster, everywhere.

thomreis · 2021-03-05T15:04:47Z

I have seen both. E.g. EcalGainRatiosGPU (https://github.com/cms-sw/cmssw/blob/master/RecoLocalCalo/EcalRecAlgos/src/EcalGainRatiosGPU.cc#L8-L20) uses one vector for EB and EE while EcalMultifitParametersGPU (https://github.com/cms-sw/cmssw/blob/master/RecoLocalCalo/EcalRecAlgos/interface/EcalMultifitParametersGPU.h#L17) uses separate ones for EB and EE.

thomreis · 2021-03-24T09:14:24Z

Hi @fwyzard the first two items are done now with PR #33116 merged.

fwyzard · 2021-03-24T09:32:22Z

👍

cmsbuild added the pending-assignment label Dec 14, 2020

cmsbuild added heterogeneous-pending pending-signatures reconstruction-pending and removed pending-assignment labels Dec 14, 2020

fwyzard mentioned this issue Dec 14, 2020

Patatrack integration - ECAL local reconstruction (7/N) #31719

Merged

thomreis mentioned this issue Mar 9, 2021

Addressing ECAL local reconstruction on GPU issues #33116

Merged

Open issues regarding the ECAL local reconstruction on GPU #32480

Open issues regarding the ECAL local reconstruction on GPU #32480

Comments

fwyzard commented Dec 14, 2020 • edited Loading