Merge pull request #67 from iris-hep/update-IRIS-HEP-projects-0131
Update iris hep projects 0131
Showing 20 changed files with 133 additions and 81 deletions.
@@ -25,8 +25,10 @@ program:
   - IRIS-HEP fellow
 shortdescription: Implement an analysis pipeline for the Analysis Grand Challenge (AGC) using the [JuliaHEP](https://github.com/JuliaHEP/) ecosystem.
 description: >
-  The project's main goal is to implement AGC pipeline using Julia to demonstrate usability and as a test of performance. New utility packages can be expected especially for systematics handling and out-of-core orchestration. (built on existing packages such as `FHist.jl` and `Dagger.jl`)
-  At the same time, the project can explore using `RNTuple` instead of `TTree` for AGC data storage. As the interface is exactly transparent, this goal mainly requires data conversion unless performance bugs are spotted. This will be help inform transition at LHC experiments in near future (Run 4).
+  The project's main goal is to implement the AGC pipeline in Julia, both to demonstrate usability and to test performance. New utility packages, built on
+  existing packages such as `FHist.jl` and `Dagger.jl`, can be expected, especially for systematics handling and out-of-core orchestration. At the same time,
+  the project can explore using `RNTuple` instead of `TTree` for AGC data storage. As the interface is transparent, this goal mainly requires data conversion,
+  unless performance bugs are spotted. This will help inform the transition at the LHC experiments in the near future (Run 4).
 contacts:
   - name: Jerry Ling
     email: [email protected]

@@ -22,13 +22,13 @@ program:
   - IRIS-HEP fellow
 shortdescription: Create an Analysis Grand Challenge implementation using ATLAS PHYSLITE data
 description: >
-  The IRIS-HEP Analysis Grand Challenge (AGC) is a realistic environment for investigating how high energy physics data analysis workflows scale to the demands of the High-Luminosity LHC (HL-LHC).
-  It captures relevant workflow aspects from data delivery to statistical inference.
-  The AGC has so far been based on publicly available Open Data from the CMS experiment.
-  The ATLAS collaboration aims to use a data format called PHYSLITE at the HL-LHC, which slightly differs from the data formats used so far within the AGC.
-  This project involves implementing the capability to analyze PHYSLITE ATLAS data within the AGC workflow and optimizing the related performance under large volumes of data.
-  In addition to this, the evaluation of systematic uncertainties for ATLAS with PHYSLITE is expected to differ in some aspects from what the AGC has considered thus far.
-  This project will also investigate workflows to integrate the evaluation of such sources of uncertainty within a Python-based implementation of an AGC analysis task.
+  The IRIS-HEP Analysis Grand Challenge (AGC) is a realistic environment for investigating how high energy physics data analysis workflows scale to the demands
+  of the High-Luminosity LHC (HL-LHC). It captures relevant workflow aspects from data delivery to statistical inference. The AGC has so far been based on
+  publicly available Open Data from the CMS experiment. The ATLAS collaboration aims to use a data format called PHYSLITE at the HL-LHC, which slightly differs
+  from the data formats used so far within the AGC. This project involves implementing the capability to analyze PHYSLITE ATLAS data within the AGC workflow and
+  optimizing the related performance under large volumes of data. In addition to this, the evaluation of systematic uncertainties for ATLAS with PHYSLITE is
+  expected to differ in some aspects from what the AGC has considered thus far. This project will also investigate workflows to integrate the evaluation of such
+  sources of uncertainty within a Python-based implementation of an AGC analysis task.
 contacts:
   - name: Matthew Feickert
     email: [email protected]
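
For context (not part of this commit): PHYSLITE files expose a columnar `CollectionTree` whose branches can be read directly with uproot. A minimal sketch, with a hypothetical file name and an illustrative pair of electron branches:

```python
# Minimal sketch: columnar access to a PHYSLITE file with uproot.
# "physlite.root" is a hypothetical file name; CollectionTree and the
# AnalysisElectronsAuxDyn.* branch names follow the PHYSLITE convention.
import uproot

tree = uproot.open("physlite.root")["CollectionTree"]
electrons = tree.arrays(
    ["AnalysisElectronsAuxDyn.pt", "AnalysisElectronsAuxDyn.eta"],
    entry_stop=1000,  # read only a small slice while prototyping
)
# Jagged array: one variable-length list of electrons per event
print(electrons["AnalysisElectronsAuxDyn.pt"])
```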

@@ -26,10 +26,12 @@ program:
   - IRIS-HEP fellow
 shortdescription: Develop and test an analysis pipeline using ROOT's RDataFrame for the next iteration of the Analysis Grand Challenge
 description: >
-  The IRIS-HEP Analysis Grand Challenge (AGC) aims to develop examples of realistic, end-to-end high-energy physics analyses, as well as demonstrate the advantages of modern tools and technologies when applied to such tasks.
-  The next iteration of the AGC (v2) will put the capabilities of modern analysis interfaces such as Coffea and ROOT's RDataFrame under further test, for example by including more complex systematic variations and sophisticated machine learning techniques.
-  The project consists in the investigation and implementation of such new developments in the context of RDataFrame as well as their benchmarking on state-of-the-art analysis facilities.
-  The goal is to gain insights useful to guide the future design of both the analysis facilities and the applications that will be deployed on them.
+  The IRIS-HEP Analysis Grand Challenge (AGC) aims to develop examples of realistic, end-to-end high-energy physics analyses, as well as demonstrate the
+  advantages of modern tools and technologies when applied to such tasks. The next iteration of the AGC (v2) will put the capabilities of modern analysis
+  interfaces such as Coffea and ROOT's RDataFrame under further test, for example by including more complex systematic variations and sophisticated machine
+  learning techniques. The project consists of the investigation and implementation of such new developments in the context of RDataFrame, as well as their
+  benchmarking on state-of-the-art analysis facilities. The goal is to gain insights useful to guide the future design of both the analysis facilities and the
+  applications that will be deployed on them.
 contacts:
   - name: Enrico Guiraud
     email: [email protected]
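
For background on the systematic variations mentioned above: RDataFrame provides an experimental `Vary`/`VariationsFor` interface (ROOT ≥ 6.26) that computes all variations in a single event loop. A minimal sketch, assuming a hypothetical tree with a scalar double `jet_pt` column and made-up ±5% scale factors:

```python
# Minimal sketch of RDataFrame's experimental systematic-variation API
# (ROOT >= 6.26). Tree name, file name, and the +/-5% scale factors are
# hypothetical; only the Vary/VariationsFor calls are the real interface.
import ROOT

df = ROOT.RDataFrame("events", "input.root")
# Register "down" and "up" variations of jet_pt (assumed a scalar double)
df = df.Vary("jet_pt", "ROOT::RVecD{jet_pt*0.95, jet_pt*1.05}", ["down", "up"])
nominal_hist = df.Histo1D("jet_pt")
# One histogram per variation, all filled in the same event loop
varied = ROOT.RDF.Experimental.VariationsFor(nominal_hist)
for key in varied.GetKeys():  # "nominal", "jet_pt:down", "jet_pt:up"
    print(key, varied[key].GetMean())
```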

@@ -22,17 +22,11 @@ commitment:
   - Full time
 shortdescription: Implement the CMS open data AGC analysis with RECAST and REANA
 description: >
-  [RECAST](https://iris-hep.org/projects/recast.html) is a platform for systematic
-  interpretation of LHC searches.
-  It reuses preserved analysis workflows from the LHC experiments, which is now
-  possible with containerization and tools such as [REANA](http://reanahub.io).
-  A yet unrealized component of the IRIS-HEP [Analysis Grand Challenge](https://agc.readthedocs.io/)
-  (AGC) is reuse and reinterpretation of the analysis.
-  This project would aim to preserve the AGC CMS open data analysis and the
-  accompanying distributed infrastructure and implement a RECAST workflow allowing
-  REANA integration with the AGC.
-  A key challenge of the project is creating a preservation scheme for the associated
-  Kubernetes distributed infrastructure.
+  [RECAST](https://iris-hep.org/projects/recast.html) is a platform for systematic interpretation of LHC searches. It reuses preserved analysis workflows from
+  the LHC experiments, which is now possible with containerization and tools such as [REANA](http://reanahub.io). A yet-unrealized component of the IRIS-HEP
+  [Analysis Grand Challenge](https://agc.readthedocs.io/) (AGC) is reuse and reinterpretation of the analysis. This project aims to preserve the AGC CMS open
+  data analysis and the accompanying distributed infrastructure, and to implement a RECAST workflow allowing REANA integration with the AGC. A key challenge of
+  the project is creating a preservation scheme for the associated distributed Kubernetes infrastructure.
 contacts:
   - name: Kyle Cranmer
     email: [email protected]

@@ -23,7 +23,10 @@ commitment:
   - Full time
 shortdescription: Predict data popularity to improve its availability for physics analysis
 description: >
-  The CMS data management team is responsible for distributing data among computing centers worldwide. Given the limited disk space at these sites, the team must dynamically manage the available data on disk. Whenever users attempt to access unavailable data, they are required to wait for the data to be retrieved from permanent tape storage. This delay impedes data analysis and hinders the scientific productivity of the collaboration. The objective of this project is to create a tool that utilizes machine learning algorithms to predict which data should be retained, based on current usage patterns.
+  The CMS data management team is responsible for distributing data among computing centers worldwide. Given the limited disk space at these sites, the team
+  must dynamically manage the available data on disk. Whenever users attempt to access unavailable data, they are required to wait for the data to be retrieved
+  from permanent tape storage. This delay impedes data analysis and hinders the scientific productivity of the collaboration. The objective of this project is
+  to create a tool that utilizes machine learning algorithms to predict which data should be retained, based on current usage patterns.
 contacts:
   - name: Dmytro Kovalskyi
     email: [email protected]
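
To make the prediction task concrete, here is a hedged sketch with entirely synthetic features (recent monthly access counts and dataset size) and an off-the-shelf classifier; the project's actual features and model remain to be designed:

```python
# Hypothetical sketch of the data-popularity idea: predict from recent
# access counts whether a dataset will be accessed next month, then keep
# the likely-popular datasets on disk. Features and model are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Toy features: accesses in each of the last 3 months, dataset size (TB)
X = rng.poisson(lam=[5, 4, 3, 2], size=(1000, 4)).astype(float)
# Toy label: "accessed next month" correlates with recent activity
y = (X[:, :3].sum(axis=1) + rng.normal(0, 2, 1000) > 10).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
keep_probability = model.predict_proba(X[:5])[:, 1]  # P(popular) per dataset
print(keep_probability)
```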

@@ -26,7 +26,12 @@ program:
   - IRIS-HEP fellow
 shortdescription: Develop a microservice architecture for CMS HTCondor Job Monitoring
 description: >
-  Current implementation of HTCondor Job Monitoring, internally known as Spider service, is a monolithic application which query HTCondor Schedds periodically. This implementation does not allow deployment in modern Kubernetes infrastructures with advantages like auto-scaling, resilience, self-healing, and so on. However, it can be separated into microservices responsible for “ClassAds calculation and conversion to JSON documents”, “transmitting results to ActiveMQ and OpenSearch without any duplicates” and “highly durable query management”. Such a microservice architecture will allow the use of appropriate languages like GoLang when it has advantages over Python. Moreover, intermediate monitoring pipelines can be integrated into this microservice architecture and it will drop the work-power needed for the services that produce monitoring outcomes using HTCondor Job Monitoring data
+  The current implementation of HTCondor Job Monitoring, internally known as the Spider service, is a monolithic application that queries HTCondor schedds
+  periodically. This implementation does not allow deployment on modern Kubernetes infrastructure, forgoing advantages like auto-scaling, resilience, and
+  self-healing. However, it can be separated into microservices responsible for “ClassAds calculation and conversion to JSON documents”, “transmitting results
+  to ActiveMQ and OpenSearch without any duplicates” and “highly durable query management”. Such a microservice architecture will allow the use of appropriate
+  languages, such as Go where it has advantages over Python. Moreover, intermediate monitoring pipelines can be integrated into this microservice architecture,
+  reducing the effort needed for the services that produce monitoring outcomes from HTCondor Job Monitoring data.
 contacts:
   - name: Brij Kishor Jashal
     email: [email protected]
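
A minimal sketch of the first of those microservices, using the htcondor Python bindings (the projection is an illustrative subset of job attributes; a Go implementation would follow the same shape):

```python
# Illustrative sketch of the "ClassAds calculation and conversion to JSON
# documents" microservice using the htcondor Python bindings. A real
# service would add retries, deduplication, and a message-queue producer.
import json
import htcondor

collector = htcondor.Collector()
for schedd_ad in collector.locateAll(htcondor.DaemonTypes.Schedd):
    schedd = htcondor.Schedd(schedd_ad)
    # Hypothetical projection: fetch only a few attributes per job
    for job_ad in schedd.query(projection=["ClusterId", "ProcId", "JobStatus", "Owner"]):
        # ClassAd -> plain dict -> JSON document for ActiveMQ/OpenSearch
        doc = {key: str(value) for key, value in job_ad.items()}
        print(json.dumps(doc))
```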

@@ -24,7 +24,10 @@ commitment:
   - Full time
 shortdescription: Improve functional testing before deployment of critical changes for CMS Tier-0
 description: >
-  The CMS Tier-0 service is responsible for the prompt processing and distribution of the data collected by the CMS Experiment. Thorough testing of any code or configuration changes for the service is critical for timely data processing. The existing system has a Jenkins pipeline to execute a large-scale "replay" of the data processing using old data for the final functional testing before deployment of critical changes. The project is focusing on integration of unit tests and smaller functional tests in the integration pipeline to speed up testing and reduce resource utilization.
+  The CMS Tier-0 service is responsible for the prompt processing and distribution of the data collected by the CMS Experiment. Thorough testing of any code or
+  configuration changes for the service is critical for timely data processing. The existing system has a Jenkins pipeline to execute a large-scale "replay" of
+  the data processing using old data for the final functional testing before deployment of critical changes. The project focuses on integrating unit tests and
+  smaller functional tests into the integration pipeline to speed up testing and reduce resource utilization.
 contacts:
   - name: Dmytro Kovalskyi
     email: [email protected]

@@ -34,4 +37,4 @@ contacts:
     email: [email protected]
 mentees:
   - name: Mycola Kolomiiets
-    link: https://iris-hep.org/fellows/MycolaKolomiiets.html
\ No newline at end of file
+    link: https://iris-hep.org/fellows/MycolaKolomiiets.html
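
One possible shape for that test split (illustrative only, not the actual Tier-0 codebase) is pytest markers, letting the pipeline run cheap unit tests on every change and reserve heavier functional tests, and ultimately the full replay, for later stages:

```python
# Illustrative sketch: marking tests so a CI pipeline can run fast unit
# tests on every commit (pytest -m unit) and heavier functional tests less
# often (pytest -m functional). Names and the toy bodies are hypothetical;
# the markers would be registered in pytest.ini.
import pytest

@pytest.mark.unit
def test_config_parses():
    config = {"stream": "PhysicsA", "threads": 4}  # stand-in for a Tier-0 config
    assert config["threads"] > 0

@pytest.mark.functional
def test_small_replay():
    # Stand-in for a reduced "replay" over a tiny slice of old data
    processed = sum(1 for _ in range(100))
    assert processed == 100
```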

@@ -24,9 +24,9 @@ program:
   - IRIS-HEP fellow
 shortdescription: Develop an automatic differentiation and initial-parameter optimisation pipeline for a particle shower model
 description: >
-  The goal of this project is to develop a differentiable simulation and optimization pipeline for Geant4. The narrow task of this
-  Fellowship project is to develop a trial automatic differentiation and backpropagation pipeline for the Markov-like stochastic
-  branching process that is modeling a particle shower spreading inside a detector material in three spatial dimensions.
+  The goal of this project is to develop a differentiable simulation and optimization pipeline for Geant4. The narrow task of this Fellowship project is to
+  develop a trial automatic differentiation and backpropagation pipeline for the Markov-like stochastic branching process that models a particle shower
+  spreading inside a detector material in three spatial dimensions.
 contacts:
   - name: Lukas Heinrich
     email: [email protected]
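
Gradients cannot flow through discrete random branching directly, so one standard workaround (a hypothetical illustration, not necessarily the method this project will adopt) is the score-function estimator: d/dθ E[f(X)] = E[f(X) · d log p(X; θ)/dθ]. A toy one-dimensional version:

```python
# Hypothetical illustration: score-function (REINFORCE) gradient of a toy
# 1D branching process. theta is the branching probability; each particle
# either splits into two half-energy particles or deposits its energy.
import numpy as np

rng = np.random.default_rng(1)

def shower_depth_and_score(theta, energy=1.0, depth=0, min_energy=0.1):
    """Return (max depth reached, d log p / d theta) for one sampled shower."""
    if energy < min_energy:
        return depth, 0.0  # deterministic stop: no score contribution
    if rng.random() < theta:  # branch: log-prob term log(theta)
        d1, s1 = shower_depth_and_score(theta, energy / 2, depth + 1, min_energy)
        d2, s2 = shower_depth_and_score(theta, energy / 2, depth + 1, min_energy)
        return max(d1, d2), 1.0 / theta + s1 + s2
    return depth, -1.0 / (1.0 - theta)  # deposit: log-prob term log(1-theta)

theta = 0.7
samples = [shower_depth_and_score(theta) for _ in range(20000)]
depths = np.array([d for d, _ in samples], dtype=float)
scores = np.array([s for _, s in samples])
# Monte Carlo estimate of d E[depth] / d theta via E[f(X) * score]
grad = np.mean(depths * scores)
print(f"E[depth] ~ {depths.mean():.3f}, dE[depth]/dtheta ~ {grad:.3f}")
```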

@@ -22,19 +22,16 @@ commitment:
   - Full time
 shortdescription: Implementing energy consumption benchmarks on different analysis platforms and facilities
 description: >
-  Benchmarks for software energy consumption are starting to appear
-  (see e.g. the [SCI score](https://github.com/Green-Software-Foundation/software_carbon_intensity/blob/main/Software_Carbon_Intensity/Software_Carbon_Intensity_Specification.md#quantification-method))
-  alongside more common performance benchmarks.
-  In this project, we will pilot the implementation of selected software energy consumption benchmarks
-  on two different facilities for user analysis:
+  Benchmarks for software energy consumption are starting to appear (see e.g. the
+  [SCI score](https://github.com/Green-Software-Foundation/software_carbon_intensity/blob/main/Software_Carbon_Intensity/Software_Carbon_Intensity_Specification.md#quantification-method))
+  alongside more common performance benchmarks. In this project, we will pilot the implementation of selected software energy consumption benchmarks on two
+  different facilities for user analysis:
   * the [Virtual Research Environment](https://indico.jlab.org/event/459/contributions/11671/),
     a prototype analysis platform for the European Open Science Cloud.
   * [Coffea-casa](https://coffea-casa.readthedocs.io/), a prototype Analysis
     Facility (AF), which provides services for "low-latency columnar analysis."
-  We will then test them with simple user software pipelines.
-  The candidate will work in collaboration with another IRIS-HEP fellow
-  investigating energy consumption benchmarks for ML algorithms,
-  and alongside a team of students and interns working on the selection and implementation of the benchmarks.
+  We will then test them with simple user software pipelines. The candidate will work in collaboration with another IRIS-HEP fellow investigating energy
+  consumption benchmarks for ML algorithms, and alongside a team of students and interns working on the selection and implementation of the benchmarks.
 contacts:
   - name: Caterina Doglioni
     email: [email protected]
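
For reference, the SCI score linked above is defined as SCI = ((E × I) + M) per R: operational energy E times grid carbon intensity I, plus embodied emissions M, normalized by a functional unit R (e.g. per analysis run). A minimal sketch with made-up numbers:

```python
# Minimal sketch of the SCI (Software Carbon Intensity) quantification:
# SCI = ((E * I) + M) per R. All numbers below are made up for illustration.
def sci_score(energy_kwh: float, intensity_gco2_per_kwh: float,
              embodied_gco2: float, functional_units: float) -> float:
    """Carbon per functional unit, e.g. gCO2e per analysis run."""
    return (energy_kwh * intensity_gco2_per_kwh + embodied_gco2) / functional_units

# e.g. 2.5 kWh measured for a batch of 100 analysis runs on a grid at
# 300 gCO2e/kWh, with 50 gCO2e of amortized embodied hardware emissions
print(sci_score(2.5, 300.0, 50.0, 100.0))  # -> 8.0 gCO2e per run
```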