Skip to content

Commit

Permalink
fix: introduce sorted index proxy to allow early break on neighbor se…
Browse files Browse the repository at this point in the history
…arch (#1581)

### Briefly, what does this PR introduce?
This PR originally intended to add hit sorting based on layer to achieve
early breaks in the neighbor searching (i.e. if neighbors have to be at
most two layers different in a ZDC-like detector, then in a sorted list
we can stop evaluating for neighbors as soon as we encounter the first
hit with more than two layers difference).

This PR also makes several other changes to the topological clustering.
- Hits are now referred to by an index proxy ordered set. This set (with
ordering by layer and then index in collection) is modified as hits are
assigned to clusters, such that only eligible hits remain in the set
(once ordered, erasure does of course not affect ordering). The need for
a separate vector which keeps track of already assigned hits is avoided.
As more hits are assigned, the remaining search space is reduced,
allowing for more efficient searching.
- Clusters (groups) are now stored in a vector ordered by insertion
sequence instead of a set (which is ordered but not in a useful way).
This allows us to avoid rechecking already added hits for neighbors, and
only check newly added hits for new neighbors to add.

When modifying containers that we are iterating over, we have to make
sure we do not invalidate iterators. Note
https://en.cppreference.com/w/cpp/container#Iterator_invalidation. A
vector container may change capacity, which invalidates all iterators.
Iterators on a set are only invalidated when the element pointed to is
erased. Erasure returns an iterator to the next element. Therefore, we
postpone erasure of the starting hit of each group until after the
entire group has been collected (and we have to make sure to avoid
picking it again as a hit). If we erased the starting hit right away,
the next iterator returned by the erase operation would likely become
invalid when the hit is (likely, due to ordering) used in the cluster.

### What kind of change does this PR introduce?
- [x] Bug fix (issue #1573)
- [ ] New feature (issue #__)
- [ ] Documentation update
- [ ] Other: __

### Please check if this PR fulfills the following:
- [ ] Tests for the changes have been added
- [ ] Documentation has been added / updated
- [ ] Changes have been communicated to collaborators

### Does this PR introduce breaking changes? What changes might users
need to make to their code?
No.

### Does this PR change default behavior?
No.
  • Loading branch information
wdconinc authored Aug 23, 2024
1 parent d1fcc45 commit 180fc07
Show file tree
Hide file tree
Showing 3 changed files with 103 additions and 51 deletions.
150 changes: 101 additions & 49 deletions src/algorithms/calorimetry/ImagingTopoCluster.h
Original file line number Diff line number Diff line change
@@ -1,17 +1,27 @@
// SPDX-License-Identifier: LGPL-3.0-or-later
// Copyright (C) 2022 Chao Peng, Sylvester Joosten, Whitney Armstrong
// Copyright (C) 2022 Chao Peng, Sylvester Joosten, Whitney Armstrong, Wouter Deconinck

/*
* Topological Cell Clustering Algorithm for Imaging Calorimetry
* 1. group all the adjacent pixels
*
* Author: Chao Peng (ANL), 06/02/2021
* References: https://arxiv.org/pdf/1603.02934.pdf
* Original reference: https://arxiv.org/pdf/1603.02934.pdf
*
* Modifications:
*
* Wouter Deconinck (Manitoba), 08/24/2024
* - converted hit storage model from std::vector to std::set sorted on layer
* where only hits remaining to be assigned to a group are in the set
* - erase hits that are too low in energy to be part of a cluster
* - converted group storage model frmo std::set to std::list to allow adding
* hits while keeping iterators valid
*
*/
#pragma once

#include <algorithm>
#include <numeric>

#include <algorithms/algorithm.h>
#include <DD4hep/BitFieldCoder.h>
Expand Down Expand Up @@ -40,16 +50,6 @@ namespace eicrecon {
>
>;

/** Topological Cell Clustering Algorithm.
*
* Topological Cell Clustering Algorithm for Imaging Calorimetry
* 1. group all the adjacent pixels
*
* Author: Chao Peng (ANL), 06/02/2021
* References: https://arxiv.org/pdf/1603.02934.pdf
*
* \ingroup reco
*/
class ImagingTopoCluster
: public ImagingTopoClusterAlgorithm,
public WithPodConfig<ImagingTopoClusterConfig> {
Expand All @@ -64,8 +64,13 @@ namespace eicrecon {
private:

// unitless counterparts of the input parameters
double localDistXY[2]{0, 0}, layerDistEtaPhi[2]{0, 0}, layerDistXY[2]{0, 0}, sectorDist{0};
double minClusterHitEdep{0}, minClusterCenterEdep{0}, minClusterEdep{0}, minClusterNhits{0};
std::array<double,2> localDistXY{0, 0};
std::array<double,2> layerDistEtaPhi{0, 0};
std::array<double,2> layerDistXY{0, 0};
double sectorDist{0};
double minClusterHitEdep{0};
double minClusterCenterEdep{0};
double minClusterEdep{0};

public:
void init() {
Expand All @@ -79,6 +84,10 @@ namespace eicrecon {
error( "Expected 2 values (eta_dist, phi_dist) for layerDistEtaPhi" );
return;
}
if (m_cfg.minClusterCenterEdep < m_cfg.minClusterHitEdep) {
error( "minClusterCenterEdep must be greater than or equal to minClusterHitEdep" );
return;
}

// using juggler internal units (GeV, dd4hep::mm, dd4hep::ns, dd4hep::rad)
localDistXY[0] = m_cfg.localDistXY[0] / dd4hep::mm;
Expand Down Expand Up @@ -124,21 +133,55 @@ namespace eicrecon {
const auto [hits] = input;
auto [proto] = output;

// group neighbouring hits
std::vector<bool> visits(hits->size(), false);
std::vector<std::set<std::size_t>> groups;
for (size_t i = 0; i < hits->size(); ++i) {
debug("hit {:d}: local position = ({}, {}, {}), global position = ({}, {}, {})", i + 1,
(*hits)[i].getLocal().x, (*hits)[i].getLocal().y, (*hits)[i].getPosition().z,
(*hits)[i].getPosition().x, (*hits)[i].getPosition().y, (*hits)[i].getPosition().z
// Sort hit indices (podio collections do not support std::sort)
auto compare = [&hits](const auto& a, const auto& b) {
// if !(a < b) and !(b < a), then a and b are equivalent
// and only one of them will be allowed in a set
if ((*hits)[a].getLayer() == (*hits)[b].getLayer()) {
return (*hits)[a].getObjectID().index < (*hits)[b].getObjectID().index;
}
return (*hits)[a].getLayer() < (*hits)[b].getLayer();
};
// indices contains the remaining hit indices that have not
// been assigned to a group yet
std::set<std::size_t, decltype(compare)> indices(compare);
// set does not have a size yet, so cannot fill with iota
for (std::size_t i = 0; i < hits->size(); ++i) {
indices.insert(i);
}
// ensure no hits were dropped due to equivalency in set
if (hits->size() != indices.size()) {
error("equivalent hits were dropped: #hits {:d}, #indices {:d}", hits->size(), indices.size());
}

// Group neighbouring hits
std::vector<std::list<std::size_t>> groups;
// because indices changes, the loop over indices requires some care:
// - we must use iterators instead of range-for
// - erase returns an incremented iterator and therefore acts as idx++
// - when the set becomes empty on erase, idx is invalid and idx++ will be too
// (also applies to loop in bfs_group below)
for (auto idx = indices.begin(); idx != indices.end();
indices.empty() ? idx = indices.end() : idx) {

debug("hit {:d}: local position = ({}, {}, {}), global position = ({}, {}, {}), energy = {}", *idx,
(*hits)[*idx].getLocal().x, (*hits)[*idx].getLocal().y, (*hits)[*idx].getPosition().z,
(*hits)[*idx].getPosition().x, (*hits)[*idx].getPosition().y, (*hits)[*idx].getPosition().z,
(*hits)[*idx].getEnergy()
);
// already in a group, or not energetic enough to form a cluster
if (visits[i] || (*hits)[i].getEnergy() < minClusterCenterEdep) {
continue;

// not energetic enough for cluster center, but could still be cluster hit
if ((*hits)[*idx].getEnergy() < minClusterCenterEdep) {
idx++;
continue;
}

// create a new group, and group all the neighbouring hits
groups.emplace_back();
bfs_group(*hits, groups.back(), i, visits);
groups.emplace_back(std::list{*idx});
bfs_group(*hits, indices, groups.back(), *idx);

// wait with erasing until after bfs_group to ensure iterator is not invalidated in bfs_group
idx = indices.erase(idx); // takes role of idx++
}
debug("found {} potential clusters (groups of hits)", groups.size());
for (size_t i = 0; i < groups.size(); ++i) {
Expand All @@ -147,7 +190,7 @@ namespace eicrecon {

// form clusters
for (const auto &group : groups) {
if (static_cast<int>(group.size()) < m_cfg.minClusterNhits) {
if (group.size() < m_cfg.minClusterNhits) {
continue;
}
double energy = 0.;
Expand Down Expand Up @@ -198,35 +241,44 @@ namespace eicrecon {
}

// grouping function with Breadth-First Search
void bfs_group(const edm4eic::CalorimeterHitCollection &hits, std::set<std::size_t> &group, std::size_t idx, std::vector<bool> &visits) const {
visits[idx] = true;
// note: template to allow Compare only known in local scope of caller
template<typename Compare>
void bfs_group(const edm4eic::CalorimeterHitCollection &hits, std::set<std::size_t,Compare>& indices, std::list<std::size_t> &group, const std::size_t idx) const {

// not a qualified hit to participate clustering, stop here
if (hits[idx].getEnergy() < m_cfg.minClusterHitEdep) {
return;
}
// loop over group as it grows, until the end is stable and we reach it
for (auto idx1 = group.begin(); idx1 != group.end(); ++idx1) {
// check neighbours (note comments on loop over set above)
for (auto idx2 = indices.begin(); idx2 != indices.end();
indices.empty() ? idx2 = indices.end() : idx2) {

group.insert(idx);
size_t prev_size = 0;
// skip idx1 and original idx
// (we cannot erase idx since it would invalidate iterator in calling scope)
if (*idx2 == *idx1 || *idx2 == idx) {
idx2++;
continue;
}

while (prev_size != group.size()) {
prev_size = group.size();
for (std::size_t idx1 : group) {
// check neighbours
for (std::size_t idx2 = 0; idx2 < hits.size(); ++idx2) {
// not a qualified hit to participate clustering, skip
if (hits[idx2].getEnergy() < m_cfg.minClusterHitEdep) {
continue;
}
if ((!visits[idx2])
&& is_neighbour(hits[idx1], hits[idx2])) {
group.insert(idx2);
visits[idx2] = true;
}
// skip rest of list of hits when we're past relevant layers
//if (hits[*idx2].getLayer() - hits[*idx1].getLayer() > m_cfg.neighbourLayersRange) {
// break;
//}

// not energetic enough for cluster hit
if (hits[*idx2].getEnergy() < m_cfg.minClusterHitEdep) {
idx2 = indices.erase(idx2);
continue;
}

if (is_neighbour(hits[*idx1], hits[*idx2])) {
group.push_back(*idx2);
idx2 = indices.erase(idx2); // takes role of idx2++
} else {
idx2++;
}
}
}
}

};

} // namespace eicrecon
2 changes: 1 addition & 1 deletion src/algorithms/calorimetry/ImagingTopoClusterConfig.h
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,7 @@ namespace eicrecon {
// minimum cluster energy (to save this cluster)
double minClusterEdep = 0.5 * dd4hep::MeV;
// minimum number of hits (to save this cluster)
int minClusterNhits = 10;
std::size_t minClusterNhits = 10;

};

Expand Down
2 changes: 1 addition & 1 deletion src/factories/calorimetry/ImagingTopoCluster_factory.h
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,7 @@ class ImagingTopoCluster_factory : public JOmniFactory<ImagingTopoCluster_factor
ParameterRef<double> m_mched {this, "minClusterHitEdep", config().minClusterHitEdep};
ParameterRef<double> m_mcced {this, "minClusterCenterEdep", config().minClusterCenterEdep};
ParameterRef<double> m_mced {this, "minClusterEdep", config().minClusterEdep};
ParameterRef<int> m_mcnh {this, "minClusterNhits", config().minClusterNhits};
ParameterRef<std::size_t> m_mcnh {this, "minClusterNhits", config().minClusterNhits};

Service<AlgorithmsInit_service> m_algorithmsInit {this};

Expand Down

0 comments on commit 180fc07

Please sign in to comment.