Stats calc tool #2628

TjarkMiener · 2024-10-28T09:01:47Z

This PR adds a generic stats-calculation tool utilizing the PixelStatisticsCalculator.

Related #2542

Since we should also support the processing of MCs, we might want to run the stats calc tool over multiple tels.

maxnoe · 2024-10-28T10:05:11Z

pyproject.toml

@@ -99,6 +99,7 @@ ctapipe-process = "ctapipe.tools.process:main"
 ctapipe-merge = "ctapipe.tools.merge:main"
 ctapipe-fileinfo = "ctapipe.tools.fileinfo:main"
 ctapipe-quickstart = "ctapipe.tools.quickstart:main"
+ctapipe-stats-calculation = "ctapipe.tools.stats_calculation:main"


I'd prefer a verb here, like the other tools. E.g. ctapipe-calculate-pixel-statistics

mexanick · 2024-10-28T10:00:22Z

src/ctapipe/resources/stats_calc_config.yaml

@@ -0,0 +1,37 @@
+StatisticsCalculatorTool:


Since we don't use yaml for anything apart from the configurations, I suggest renaming the configuration file to just tool_name.yaml, i.e. stripping the _config suffix.

mexanick · 2024-10-28T10:02:31Z

src/ctapipe/tools/stats_calculation.py

+        ),
+    ).tag(config=True)
+
+    dl1a_column_name = CaselessStrEnum(


Is DL1a/b "official"? Also, I'd perhaps use a generic input_column_name, similar to the output one.

No, in ctapipe we use DL1_IMAGES and DL1_PARAMETERS to distinguish between things that are per-pixel vs. single quantities per event.

https://ctapipe.readthedocs.io/en/latest/api/ctapipe.io.DataLevel.html

I'd also not make this an enum. In the generic tool, users should be able to choose any column that has a compatible shape. Just provide a clear error when the column is not found in the input file.

This could also be a list of columns, to compute on multiple at the same time.

Ok, I changed the column name and also polished the references to DL1a data by using "pixel-wise image data", which is more descriptive. ToolConfigurationError is raised if the column is not found. Having a list of columns seems like a bit of overkill here, which would just make the code more complex. Maybe the aggregation config could be shared between the columns, but especially the outlier detection will be different between the columns.

ToolConfigurationError is raised if the column is not found. Having a list of columns seems like a bit of overkill here, which would just make the code more complex. Maybe the aggregation config could be shared between the columns, but especially the outlier detection will be different between the columns.

I think the case where you only want to know about a single column is quite rare, you are usually interested in multiple. So having to read all data again to compute metrics on a new column seems very limiting and a loop over columns shouldn't make the code much more complex.
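A minimal sketch of the loop over columns suggested above, assuming the telescope table has already been read once into memory as a plain mapping from column name to per-event values. `ConfigError`, `chunk_means` and `aggregate_columns` are hypothetical stand-ins, not ctapipe API; in the tool the error would be ToolConfigurationError and the aggregation would be delegated to the configured aggregator.

```python
from statistics import fmean


class ConfigError(Exception):
    """Hypothetical stand-in for ctapipe's ToolConfigurationError."""


def chunk_means(values, chunk_size):
    """Mean of each complete chunk; a trailing partial chunk is dropped."""
    n_chunks = len(values) // chunk_size
    return [
        fmean(values[i * chunk_size : (i + 1) * chunk_size])
        for i in range(n_chunks)
    ]


def aggregate_columns(table, column_names, chunk_size):
    """Aggregate several columns of a table that was read only once."""
    missing = [name for name in column_names if name not in table]
    if missing:
        raise ConfigError(
            f"Column(s) {missing} not found in input table, "
            f"available columns: {sorted(table)}"
        )
    return {name: chunk_means(table[name], chunk_size) for name in column_names}


# Toy data: two columns, four events each (assumed values for illustration).
table = {"image": [1.0, 3.0, 5.0, 7.0], "peak_time": [10.0, 12.0, 14.0, 16.0]}
stats = aggregate_columns(table, ["image", "peak_time"], chunk_size=2)
```

The concern about per-column outlier detection could be handled by passing a mapping from column name to its own settings instead of a single shared `chunk_size`; that variant is not shown here.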

ctao-dpps-sonarqube · 2024-10-28T15:56:16Z

Analysis Details

0 Issues

0 Bugs
0 Vulnerabilities
0 Code Smells

Coverage and Duplications

88.70% Coverage (94.30% Estimated after merge)
0.00% Duplicated Code (0.70% Estimated after merge)

Project ID: cta-observatory_ctapipe_AY52EYhuvuGcMFidNyUs

kosack

Currently fails to read prod3 files (which have no EFFECTIVE focal length information). The tool currently fails with a focal_length_choice exception; however, it seems there is no way to set the focal length choice, since the TableLoader is not set up to be configurable.

kosack · 2024-11-06T10:47:10Z

src/ctapipe/tools/calculate_pixel_stats.py

+
+    def setup(self):
+        # Check that the input and output files are not the same
+        if self.input_url == self.output_path:


You will also need to change all instances of self.input_url to self.input_data.input_url.

Therefore I'd need to init the TableLoader before performing the check.
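A minimal sketch of such a check once the input path is taken from the loader. `same_file` is a hypothetical helper; comparing resolved paths rather than the raw values is an assumption that goes beyond the diff above, which compares the two attributes directly.

```python
from pathlib import Path


def same_file(input_url, output_path):
    """True if both paths point to the same location after normalisation."""
    return Path(input_url).resolve() == Path(output_path).resolve()


# "./events.dl1.h5" and "events.dl1.h5" name the same file.
clash = same_file("events.dl1.h5", "./events.dl1.h5")
no_clash = same_file("events.dl1.h5", "stats.h5")
```

In the tool, a positive result would be turned into a ToolConfigurationError in setup(), after the TableLoader has been created.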

maxnoe · 2024-11-06T11:29:06Z

Currently fails to read prod5 files (which have no EFFECTIVE focal length information).

Prod5 files should have effective focal length information.

kosack · 2024-11-06T13:38:50Z

src/ctapipe/tools/calculate_pixel_stats.py

+        # Iterate over the telescope ids and calculate the statistics
+        for tel_id in self.tel_ids:
+            # Read the whole dl1 images for one particular telescope
+            dl1_table = self.input_data.read_telescope_events_by_id(


I get a crash here:

% ctapipe-calculate-pixel-statistics -i events.dl1.h5 -o stats.h5
    dl1_table = self.input_data.read_telescope_events_by_id(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kkosack/Projects/CTA/Working/ctapipe/src/ctapipe/io/tableloader.py", line 1089, in read_telescope_events_by_id
    tel_ids = self.subarray.get_tel_ids(telescopes)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kkosack/Projects/CTA/Working/ctapipe/src/ctapipe/instrument/subarray.py", line 549, in get_tel_ids
    for telescope in telescopes:
TypeError: 'numpy.int16' object is not iterable

Seems to be due to passing an integer instead of a list, which is what is required by read_telescope_events_by_id

Suggested change

- dl1_table = self.input_data.read_telescope_events_by_id(
+ dl1_table = self.input_data.read_telescope_events_by_id(
+     telescopes = [tel_id,]

How does this work in the tests?

Good point! I do not know why the tests pass here. I will include the change.
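The failure mode in this thread can be reproduced without ctapipe. The sketch below uses a toy function that iterates over its argument, as get_tel_ids does in the traceback, and a hypothetical helper `as_id_list` that wraps a single id in a list. A plain int is used here; like numpy.int16, it is not iterable.

```python
from collections.abc import Iterable


def as_id_list(telescopes):
    """Wrap a single telescope id in a list; turn other iterables into a list."""
    if isinstance(telescopes, Iterable) and not isinstance(telescopes, (str, bytes)):
        return list(telescopes)
    return [telescopes]


def get_ids(telescopes):
    """Toy function that, like the real one, iterates over its argument."""
    return [int(t) for t in telescopes]


try:
    get_ids(5)  # a bare scalar cannot be iterated
    scalar_failed = False
except TypeError:
    scalar_failed = True

wrapped = get_ids(as_id_list(5))
```

The suggested change above takes the simpler route of wrapping the id at the call site.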

kosack · 2024-11-06T13:58:09Z

In the output, how can I tell which column was aggregated? It is always named "statistics", and there is no metadata in the group or tables that contains that information. Wouldn't it be better to name the group like monitoring/statistics/{input_column_name}, i.e. maybe set the default of output_column_name to the value of input_column_name? And also add the input column name to the output table's metadata (table.meta['input_column_name'] = input_column_name).
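To make the proposal concrete, here is a minimal sketch of the naming and metadata logic, with a plain dict standing in for the table metadata. `describe_output` is a hypothetical helper and the group layout is only the one proposed in this comment, not necessarily what the tool ends up writing.

```python
def describe_output(input_column_name, output_column_name=None):
    """Return the proposed group path and the metadata to attach to the table."""
    # Default the output name to the input column name, as proposed above.
    if output_column_name is None:
        output_column_name = input_column_name
    group = f"monitoring/statistics/{output_column_name}"
    meta = {"input_column_name": input_column_name}
    return group, meta


group, meta = describe_output("image")
```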

kosack

A more general comment: with very minor changes, this could be turned into ctapipe-calculate-stats, i.e. the ability to compute stats for any column, not just pixel-wise ones.

- Expose TableLoader as a configurable component (needed anyhow, see above).
- Minor modifications to drop the assumption on data shape in calculator.py.

I would expect e.g. to be able to do:

ctapipe-calculate-pixel-statistics -i events-prod5.DL1.h5 \
    --StatisticsAggregator.chunk_size=100 \
    --StatisticsCalculatorTool.input_column_name hillas_length \
    -o length.h5

and get the stats on the length parameter. This is perhaps outside the scope of this PR, but should be kept in mind. It also relates to @maxnoe's comment that we could change the API to accept a mapping of columns to Aggregators.

kosack

A common error is to have a chunk size larger than the number of events in the table, but this now results in a very ugly error with a full traceback and exception, along with an UnclosedFileWarning (bug?).

The former (unexpected exception) should be caught and re-raised as a ToolConfigurationError, so the user gets a nice message. And please explain in the message which parameter controls this, e.g. say "Change --StatisticsAggregator.chunk_size to decrease this."
The latter (unclosed file) seems to be a bug to fix.

2024-11-06 15:14:43,361 ERROR [ctapipe.StatisticsCalculatorTool] (tool.run): Caught unexpected exception: The length of the provided table (853) is insufficient to meet the required statistics for a single chunk of size (2500).
2024-11-06 15:14:43,361 ERROR [ctapipe.StatisticsCalculatorTool] (tool.run): Caught unexpected exception: The length of the provided table (853) is insufficient to meet the required statistics for a single chunk of size (2500).
Traceback (most recent call last):
  File "/Users/kkosack/Projects/CTA/Working/ctapipe/src/ctapipe/core/tool.py", line 431, in run
    self.start()
  File "/Users/kkosack/Projects/CTA/Working/ctapipe/src/ctapipe/tools/calculate_pixel_stats.py", line 134, in start
    aggregated_stats = self.stats_calculator.first_pass(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kkosack/Projects/CTA/Working/ctapipe/src/ctapipe/monitoring/calculator.py", line 169, in first_pass
    aggregated_stats = aggregator(
                       ^^^^^^^^^^^
  File "/Users/kkosack/Projects/CTA/Working/ctapipe/src/ctapipe/monitoring/aggregator.py", line 86, in __call__
    raise ValueError(
ValueError: The length of the provided table (853) is insufficient to meet the required statistics for a single chunk of size (2500).
2024-11-06 15:14:43,377 INFO [ctapipe.StatisticsCalculatorTool] (tool.write_provenance): Output:
/Users/kkosack/miniconda3/envs/ctapipe-0.21/lib/python3.12/site-packages/tables/file.py:113: UnclosedFileWarning: Closing remaining open file: /Users/kkosack/Projects/CTA/PipeWork/v0.21.3/events-prod5.DL1.h5
  warnings.warn(UnclosedFileWarning(msg))

TjarkMiener · 2024-11-14T12:56:40Z

The former (unexpected exception) should be caught and re-raised as a ToolConfigurationError, so the user gets a nice message. And please explain in the message which parameter controls this, e.g. say "Change --StatisticsAggregator.chunk_size to decrease this."

This is solved now. I also added a test for it.

I do not know why this UnclosedFileWarning is raised. Is this related to my changes or a more general warning?
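The error translation requested in this thread can be sketched as follows. This is only an illustration under assumptions: `ToolConfigError`, `aggregate` and `run` are hypothetical stand-ins, and the exact wording of the message in the PR may differ.

```python
class ToolConfigError(Exception):
    """Hypothetical stand-in for ctapipe's ToolConfigurationError."""


def aggregate(table, chunk_size):
    """Toy aggregator that rejects tables shorter than one chunk."""
    if len(table) < chunk_size:
        raise ValueError(
            f"The length of the provided table ({len(table)}) is insufficient "
            f"to meet the required statistics for a single chunk of size ({chunk_size})."
        )
    return sum(table) / len(table)


def run(table, chunk_size):
    """Translate the low-level error into a configuration error with a hint."""
    try:
        return aggregate(table, chunk_size)
    except ValueError as err:
        # Chain the original exception so the cause stays visible when debugging.
        raise ToolConfigError(
            f"{err} Change --StatisticsAggregator.chunk_size to decrease this."
        ) from err
```

Chaining with `from err` keeps the original ValueError available as `__cause__`, while the user only sees the short configuration message.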

TjarkMiener · 2024-11-14T13:25:56Z

In the output, how can I tell which column was aggregated? It is always named "statistics", and there is no metadata in the group or tables that contains that information. Wouldn't it be better to name the group like monitoring/statistics/{input_column_name}, i.e. maybe set the default of output_column_name to the value of input_column_name? And also add the input column name to the output table's metadata (table.meta['input_column_name'] = input_column_name).

This is a good point. I think we need metadata here. However, in the current base functionality the input_column_name is either image, peak_time, or variance, so the input_url is also needed in the metadata to understand which quantities were actually aggregated. I'm adding this information to the metadata. The output_column_name needs to be configurable in order to organize the different output tables. (Maybe we should rename this config parameter to output_table_name to be more correct.)
So in the end the user should put in the config e.g.:
TableLoader.input_url: "path/to/pedestal.dl1.h5"
input_column_name: "image"
output_table_name: "pedestal"

Would this sound good to you?

--
Another point you raised about the tree schema. I think it should be monitoring/telescope/... since it is telescope-wise monitoring data.

renamed output_column_name to output_table_name

mexanick · 2024-11-14T17:02:16Z

src/ctapipe/tools/tests/test_calculate_pixel_stats.py

@@ -100,3 +100,18 @@ def test_tool_config_error(tmp_path, dl1_image_file):
            cwd=tmp_path,
            raises=True,
        )
+    # Check if ToolConfigurationError is raised
+    # when the chunk size is larger than the number of events in the input file
+    with pytest.raises(ToolConfigurationError):


The tool will raise ToolConfigurationError in a few situations. Please test all of them and use regexp matching to check whether the correct error message is displayed, e.g.

Suggested change

- with pytest.raises(ToolConfigurationError):
+ with pytest.raises(ToolConfigurationError, match="Change --StatisticsAggregator.chunk_size"):

...
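One detail worth keeping in mind with `match`: pytest applies the pattern as a regular expression search on the string form of the exception, so the "." in the option name matches any character unless it is escaped. The sketch below shows the difference using only the `re` module; the message text is an assumed example.

```python
import re

message = "Change --StatisticsAggregator.chunk_size to decrease this."
typo = "Change --StatisticsAggregatorXchunk_size"

# Unescaped, "." matches any character, so the unintended text matches too.
loose = "Change --StatisticsAggregator.chunk_size"
loose_hits_typo = re.search(loose, typo) is not None

# re.escape makes the pattern match the literal text only.
strict = re.escape(loose)
strict_hits_message = re.search(strict, message) is not None
strict_hits_typo = re.search(strict, typo) is not None
```

Passing `match=re.escape("Change --StatisticsAggregator.chunk_size")` to pytest.raises would make the test check the literal option name.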

TjarkMiener added 4 commits October 25, 2024 17:52

added stats calc tool

8c95d3d

added example config for stats calc

69ae37f

allow to process multiple tels

6b8feff

Since we should also support the processing of MCs, we might want to run the stats calc tool over multiple tels.

added unit test for stats calc tool

35e5082

TjarkMiener added the calibration label Oct 28, 2024

TjarkMiener requested review from maxnoe, mexanick, kosack, FrancaCassol, Hckjs and ctoennis October 28, 2024 09:01

TjarkMiener self-assigned this Oct 28, 2024

add changelog

78e3fc5

polish docs

234382e

maxnoe reviewed Oct 28, 2024

mexanick reviewed Oct 28, 2024

maxnoe reviewed Oct 28, 2024

src/ctapipe/resources/stats_calc_config.yaml (outdated, resolved)

maxnoe reviewed Oct 28, 2024

src/ctapipe/tools/stats_calculation.py (outdated, resolved)

TjarkMiener added 2 commits October 28, 2024 14:44

include first round of comments

ec8785f

- rename the tool and file name
- only keep the dl1 table of the particular telescope in RAM
- added tests for tool config errors
- rename input col name
- adopt yaml syntax in example config for stats calculation

rename config file also in quickstart tool

13d725d

TjarkMiener requested review from mexanick and maxnoe October 28, 2024 14:26

remove redundant , in stats calc example config

1e73fe2

mexanick previously approved these changes Oct 28, 2024

TjarkMiener mentioned this pull request Oct 29, 2024

Camera and pointing calibration API #2542

Open

8 tasks

kosack requested changes Nov 6, 2024

src/ctapipe/tools/calculate_pixel_stats.py (outdated, resolved)

src/ctapipe/tools/calculate_pixel_stats.py (outdated, resolved)

src/ctapipe/tools/calculate_pixel_stats.py (outdated, resolved)

kosack reviewed Nov 6, 2024

src/ctapipe/tools/calculate_pixel_stats.py (outdated, resolved)

kosack reviewed Nov 6, 2024

kosack requested changes Nov 6, 2024

use TableLoader for input handling

606fec5

TjarkMiener dismissed mexanick’s stale review via 606fec5 November 14, 2024 12:11

TjarkMiener added 3 commits November 14, 2024 13:17

fix changelog filename

b1e0fa8

add changelog file

eaf4fe0

add proper ToolConfigurationError if chunk size is too large

d52ed8f

Hckjs reviewed Nov 14, 2024

src/ctapipe/tools/calculate_pixel_stats.py (outdated, resolved)

TjarkMiener added 2 commits November 14, 2024 14:27

added metadata

c49ce0f

renamed output_column_name to output_table_name

polish error message

07587ad

TjarkMiener requested review from mexanick, kosack and Hckjs November 14, 2024 13:59

mexanick reviewed Nov 14, 2024

mexanick added this to the 0.23.0 milestone Nov 14, 2024

maxnoe modified the milestones: 0.23.0, 0.24.0 Nov 18, 2024
