diff --git a/projects/agc-julia-rntuple.yml b/projects/agc-julia-rntuple.yml index fc433a1..7c582fb 100644 --- a/projects/agc-julia-rntuple.yml +++ b/projects/agc-julia-rntuple.yml @@ -25,8 +25,10 @@ program: - IRIS-HEP fellow shortdescription: Implement an analysis pipeline for the Analysis Grand Challenge (AGC) using [JuliaHEP](https://github.com/JuliaHEP/) ecosystem. description: > - The project's main goal is to implement AGC pipeline using Julia to demonstrate usability and as a test of performance. New utility packages can be expected especially for systematics handling and out-of-core orchestration. (built on existing packages such as `FHist.jl` and `Dagger.jl`) - At the same time, the project can explore using `RNTuple` instead of `TTree` for AGC data storage. As the interface is exactly transparent, this goal mainly requires data conversion unless performance bugs are spotted. This will be help inform transition at LHC experiments in near future (Run 4). + The project's main goal is to implement the AGC pipeline using Julia to demonstrate usability and as a test of performance. New utility packages can be expected, + especially for systematics handling and out-of-core orchestration (built on existing packages such as `FHist.jl` and `Dagger.jl`). At the same time, the + project can explore using `RNTuple` instead of `TTree` for AGC data storage. As the interface is fully transparent, this goal mainly requires data + conversion unless performance bugs are spotted. This will help inform the transition at the LHC experiments in the near future (Run 4). contacts: - name: Jerry Ling email: jling@g.harvard.edu diff --git a/projects/agc-physlite.yml b/projects/agc-physlite.yml index ea44b28..b75c642 100644 --- a/projects/agc-physlite.yml +++ b/projects/agc-physlite.yml @@ -22,13 +22,13 @@ program: - IRIS-HEP fellow shortdescription: Create an Analysis Grand Challenge implementation using ATLAS PHYSLITE data description: > - The IRIS-HEP Analysis Grand Challenge (AGC) is a realistic environment for investigating how high energy physics data analysis workflows scale to the demands of the High-Luminosity LHC (HL-LHC). - It captures relevant workflow aspects from data delivery to statistical inference. - The AGC has so far been based on publicly available Open Data from the CMS experiment. - The ATLAS collaboration aims to use a data format called PHYSLITE at the HL-LHC, which slightly differs from the data formats used so far within the AGC. - This project involves implementing the capability to analyze PHYSLITE ATLAS data within the AGC workflow and optimizing the related performance under large volumes of data. - In addition to this, the evaluation of systematic uncertainties for ATLAS with PHYSLITE is expected to differ in some aspects from what the AGC has considered thus far. - This project will also investigate workflows to integrate the evaluation of such sources of uncertainty within a Python-based implementation of an AGC analysis task. + The IRIS-HEP Analysis Grand Challenge (AGC) is a realistic environment for investigating how high energy physics data analysis workflows scale to the demands + of the High-Luminosity LHC (HL-LHC). It captures relevant workflow aspects from data delivery to statistical inference. The AGC has so far been based on + publicly available Open Data from the CMS experiment. The ATLAS collaboration aims to use a data format called PHYSLITE at the HL-LHC, which slightly differs + from the data formats used so far within the AGC.
This project involves implementing the capability to analyze PHYSLITE ATLAS data within the AGC workflow and + optimizing the related performance under large volumes of data. In addition to this, the evaluation of systematic uncertainties for ATLAS with PHYSLITE is + expected to differ in some aspects from what the AGC has considered thus far. This project will also investigate workflows to integrate the evaluation of such + sources of uncertainty within a Python-based implementation of an AGC analysis task. contacts: - name: Matthew Feickert email: matthew.feickert@cern.ch diff --git a/projects/agc-rdf.yml b/projects/agc-rdf.yml index 5362ec6..611dbd7 100644 --- a/projects/agc-rdf.yml +++ b/projects/agc-rdf.yml @@ -26,10 +26,12 @@ program: - IRIS-HEP fellow shortdescription: Develop and test an analysis pipeline using ROOT's RDataFrame for the next iteration of the Analysis Grand Challenge description: > - The IRIS-HEP Analysis Grand Challenge (AGC) aims to develop examples of realistic, end-to-end high-energy physics analyses, as well as demonstrate the advantages of modern tools and technologies when applied to such tasks. - The next iteration of the AGC (v2) will put the capabilities of modern analysis interfaces such as Coffea and ROOT's RDataFrame under further test, for example by including more complex systematic variations and sophisticated machine learning techniques. - The project consists in the investigation and implementation of such new developments in the context of RDataFrame as well as their benchmarking on state-of-the-art analysis facilities. - The goal is to gain insights useful to guide the future design of both the analysis facilities and the applications that will be deployed on them. + The IRIS-HEP Analysis Grand Challenge (AGC) aims to develop examples of realistic, end-to-end high-energy physics analyses, as well as demonstrate the + advantages of modern tools and technologies when applied to such tasks. The next iteration of the AGC (v2) will put the capabilities of modern analysis + interfaces such as Coffea and ROOT's RDataFrame under further test, for example by including more complex systematic variations and sophisticated machine + learning techniques. The project consists in the investigation and implementation of such new developments in the context of RDataFrame as well as their + benchmarking on state-of-the-art analysis facilities. The goal is to gain insights useful to guide the future design of both the analysis facilities and the + applications that will be deployed on them. contacts: - name: Enrico Guiraud email: enrico.guiraud@cern.ch diff --git a/projects/agc-recast.yml b/projects/agc-recast.yml index 5fcc85d..b10414a 100644 --- a/projects/agc-recast.yml +++ b/projects/agc-recast.yml @@ -22,17 +22,11 @@ commitment: - Full time shortdescription: Implement the CMS open data AGC analysis with RECAST and REANA description: > - [RECAST](https://iris-hep.org/projects/recast.html) is a platform for systematic - interpretation of LHC searches. - It reuses preserved analysis workflows from the LHC experiments, which is now - possible with containerization and tools such as [REANA](http://reanahub.io). - A yet unrealized component of the IRIS-HEP [Analysis Grand Challenge](https://agc.readthedocs.io/) - (AGC) is reuse and reinterpretation of the analysis. - This project would aim to preserve the AGC CMS open data analysis and the - accompanying distributed infrastructure and implement a RECAST workflow allowing - REANA integration with the AGC. 
- A key challenge of the project is creating a preservation scheme for the associated - Kubernetes distributed infrastructure. + [RECAST](https://iris-hep.org/projects/recast.html) is a platform for systematic interpretation of LHC searches. It reuses preserved analysis workflows from + the LHC experiments, which is now possible with containerization and tools such as [REANA](http://reanahub.io). A yet unrealized component of the IRIS-HEP + [Analysis Grand Challenge](https://agc.readthedocs.io/) (AGC) is reuse and reinterpretation of the analysis. This project would aim to preserve the AGC CMS + open data analysis and the accompanying distributed infrastructure and implement a RECAST workflow allowing REANA integration with the AGC. A key challenge of + the project is creating a preservation scheme for the associated Kubernetes distributed infrastructure. contacts: - name: Kyle Cranmer email: kyle.cranmer@wisc.edu diff --git a/projects/cms-data-pop.yml b/projects/cms-data-pop.yml index 3966195..ffc5485 100644 --- a/projects/cms-data-pop.yml +++ b/projects/cms-data-pop.yml @@ -23,7 +23,10 @@ commitment: - Full time shortdescription: Predict data popularity to improve its availability for physics analysis description: > - The CMS data management team is responsible for distributing data among computing centers worldwide. Given the limited disk space at these sites, the team must dynamically manage the available data on disk. Whenever users attempt to access unavailable data, they are required to wait for the data to be retrieved from permanent tape storage. This delay impedes data analysis and hinders the scientific productivity of the collaboration. The objective of this project is to create a tool that utilizes machine learning algorithms to predict which data should be retained, based on current usage patterns. + The CMS data management team is responsible for distributing data among computing centers worldwide. Given the limited disk space at these sites, the team + must dynamically manage the available data on disk. Whenever users attempt to access unavailable data, they are required to wait for the data to be retrieved + from permanent tape storage. This delay impedes data analysis and hinders the scientific productivity of the collaboration. The objective of this project is + to create a tool that utilizes machine learning algorithms to predict which data should be retained, based on current usage patterns. contacts: - name: Dmytro Kovalskyi email: Dmytro.Kovalskyi@cern.ch diff --git a/projects/cms-monit-micro-services.yml b/projects/cms-monit-micro-services.yml index ea81ce9..af9cb37 100644 --- a/projects/cms-monit-micro-services.yml +++ b/projects/cms-monit-micro-services.yml @@ -26,7 +26,12 @@ program: - IRIS-HEP fellow shortdescription: To develop microservice architecture for CMS HTCondor Job Monitoring description: > - Current implementation of HTCondor Job Monitoring, internally known as Spider service, is a monolithic application which query HTCondor Schedds periodically. This implementation does not allow deployment in modern Kubernetes infrastructures with advantages like auto-scaling, resilience, self-healing, and so on. However, it can be separated into microservices responsible for “ClassAds calculation and conversion to JSON documents”, “transmitting results to ActiveMQ and OpenSearch without any duplicates” and “highly durable query management”. Such a microservice architecture will allow the use of appropriate languages like GoLang when it has advantages over Python. 
Moreover, intermediate monitoring pipelines can be integrated into this microservice architecture and it will drop the work-power needed for the services that produce monitoring outcomes using HTCondor Job Monitoring data + The current implementation of HTCondor Job Monitoring, internally known as the Spider service, is a monolithic application which queries HTCondor Schedds periodically. + This implementation does not allow deployment in modern Kubernetes infrastructures with advantages like auto-scaling, resilience, self-healing, and so on. + However, it can be separated into microservices responsible for “ClassAds calculation and conversion to JSON documents”, “transmitting results to ActiveMQ and + OpenSearch without any duplicates” and “highly durable query management”. Such a microservice architecture will allow the use of appropriate languages like + GoLang when it has advantages over Python. Moreover, intermediate monitoring pipelines can be integrated into this microservice architecture and it will reduce + the effort needed for the services that produce monitoring outcomes using HTCondor Job Monitoring data. contacts: - name: Brij Kishor Jashal email: brij@cern.ch diff --git a/projects/cms-t0-test.yml b/projects/cms-t0-test.yml index b6893a8..2f67071 100644 --- a/projects/cms-t0-test.yml +++ b/projects/cms-t0-test.yml @@ -24,7 +24,10 @@ commitment: - Full time shortdescription: Improve functional testing before deployment of critical changes for CMS Tier-0 description: > - The CMS Tier-0 service is responsible for the prompt processing and distribution of the data collected by the CMS Experiment. Thorough testing of any code or configuration changes for the service is critical for timely data processing. The existing system has a Jenkins pipeline to execute a large-scale "replay" of the data processing using old data for the final functional testing before deployment of critical changes. The project is focusing on integration of unit tests and smaller functional tests in the integration pipeline to speed up testing and reduce resource utilization. + The CMS Tier-0 service is responsible for the prompt processing and distribution of the data collected by the CMS Experiment. Thorough testing of any code or + configuration changes for the service is critical for timely data processing. The existing system has a Jenkins pipeline to execute a large-scale "replay" of + the data processing using old data for the final functional testing before deployment of critical changes. The project focuses on integrating unit + tests and smaller functional tests into the integration pipeline to speed up testing and reduce resource utilization. contacts: - name: Dmytro Kovalskyi email: Dmytro.Kovalskyi@cern.ch @@ -34,4 +37,4 @@ contacts: email: jan.eysermans@cern.ch mentees: - name: Mycola Kolomiiets - link: https://iris-hep.org/fellows/MycolaKolomiiets.html \ No newline at end of file + link: https://iris-hep.org/fellows/MycolaKolomiiets.html diff --git a/projects/diff-geant.yml b/projects/diff-geant.yml index fb765d5..28c64e5 100644 --- a/projects/diff-geant.yml +++ b/projects/diff-geant.yml @@ -24,9 +24,9 @@ program: - IRIS-HEP fellow shortdescription: Developing an automatic differentiation and initial parameters optimisation pipeline for the particle shower model. description: > - The goal of this project is to develop a differentiable simulation and optimization pipeline for Geant4.
The narrow task of this - Fellowship project is to develop a trial automatic differentiation and backpropagation pipeline for the Markov-like stochastic - branching process that is modeling a particle shower spreading inside a detector material in three spatial dimensions. + The goal of this project is to develop a differentiable simulation and optimization pipeline for Geant4. The narrow task of this Fellowship project is to + develop a trial automatic differentiation and backpropagation pipeline for the Markov-like stochastic branching process that is modeling a particle shower + spreading inside a detector material in three spatial dimensions. contacts: - name: Lukas Heinrich email: Lukas.Heinrich@cern.ch diff --git a/projects/energy-cost-vre-coffea-casa.yml b/projects/energy-cost-vre-coffea-casa.yml index f3daabe..a34741d 100644 --- a/projects/energy-cost-vre-coffea-casa.yml +++ b/projects/energy-cost-vre-coffea-casa.yml @@ -22,19 +22,16 @@ commitment: - Full time shortdescription: Implementing energy consumption benchmarks on different analysis platforms and facilities description: > - Benchmarks for software energy consumption are starting to appear - (see e.g. the [SCI score](https://github.com/Green-Software-Foundation/software_carbon_intensity/blob/main/Software_Carbon_Intensity/Software_Carbon_Intensity_Specification.md#quantification-method)) - alongside more common performance benchmarks. - In this project, we will pilot the implementation of selected software energy consumption benchmarks - on two different facilities for user analysis: + Benchmarks for software energy consumption are starting to appear (see e.g. the [SCI + score](https://github.com/Green-Software-Foundation/software_carbon_intensity/blob/main/Software_Carbon_Intensity/Software_Carbon_Intensity_Specification.md#quantification-method)) + alongside more common performance benchmarks. In this project, we will pilot the implementation of selected software energy consumption benchmarks on two + different facilities for user analysis: * the [Virtual Research Environment](https://indico.jlab.org/event/459/contributions/11671/), a prototype analysis platform for the European Open Science Cloud. * [Coffea-casa](https://coffea-casa.readthedocs.io/), a prototype Analysis Facility (AF), which provides services for "low-latency columnar analysis." - We will then test them with simple user software pipelines. - The candidate will work in collaboration with another IRIS-HEP fellow - investigating energy consumption benchmarks for ML algorithms, - and alongside a team of students and interns working on the selection and implementation of the benchmarks. + We will then test them with simple user software pipelines. The candidate will work in collaboration with another IRIS-HEP fellow investigating energy + consumption benchmarks for ML algorithms, and alongside a team of students and interns working on the selection and implementation of the benchmarks. contacts: - name: Caterina Doglioni email: caterina.doglioni@cern.ch diff --git a/projects/gnn-tracking.yml b/projects/gnn-tracking.yml index bf2e6a2..f027b90 100644 --- a/projects/gnn-tracking.yml +++ b/projects/gnn-tracking.yml @@ -25,23 +25,25 @@ program: shortdescription: Reconstruct the trajectories of particle with graph neural networks description: | - In the GNN tracking project, we use [graph neural networks][gnn-wiki] (GNNs) to reconstruct trajectories ("tracks") of elementary particles traveling through a detector. 
+ In the GNN tracking project, we use [graph neural networks][gnn-wiki] (GNNs) to reconstruct trajectories ("tracks") of elementary particles traveling through + a detector. This task is called ["tracking"][tracking-wiki] and is different from many other problems that involve trajectories: * there are several thousand particles that need to be tracked at once, * there is no time information (the particles travel too fast), * we do not observe a continuous trajectory but instead only around five points ("hits") along the way in different detector layers. - The task can be described as a combinatorically very challenging "connect-the-dots" problem, essentially turning a cloud of points (hits) in 3D space into a set of O(1000) trajectories. - Expressed differently, each hit (containing not much more than the x/y/z coordinate) must be assigned to the particle/track it belongs to. + The task can be described as a combinatorically very challenging "connect-the-dots" problem, essentially turning a cloud of points (hits) in 3D space into a + set of O(1000) trajectories. Expressed differently, each hit (containing not much more than the x/y/z coordinate) must be assigned to the particle/track it + belongs to. - A conceptually simple way to turn this problem into a machine learning task is to create a fully connected graph of all points and then train an edge classifier to reject any edge that doesn't connect points that belong to the same particle. - In this way, only the individual trajectories remain as components of the initial fully connected graph. - However, this strategy does not seem to lead to perfect results in practice. - The approach of this project uses this strategy only as the first step to arrive at "small" graphs. - It then projects all hits into a learned latent space with the model learning to place hits of the same particle close to each other, such that the hits belonging to the same particle form clusters. + A conceptually simple way to turn this problem into a machine learning task is to create a fully connected graph of all points and then train an edge + classifier to reject any edge that doesn't connect points that belong to the same particle. In this way, only the individual trajectories remain as components + of the initial fully connected graph. However, this strategy does not seem to lead to perfect results in practice. The approach of this project uses this + strategy only as the first step to arrive at "small" graphs. It then projects all hits into a learned latent space with the model learning to place hits of + the same particle close to each other, such that the hits belonging to the same particle form clusters. - The project code together with documentation and a reading list is available on [github][ghorganization] and uses [pytorch geometric][pyg]. - See also [our GSoC proposal for the same project][gsoc-proposal], which lists prerequisites and possible tasks. + The project code together with documentation and a reading list is available on [github][ghorganization] and uses [pytorch geometric][pyg]. See also [our GSoC + proposal for the same project][gsoc-proposal], which lists prerequisites and possible tasks. 
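For illustration, the edge-classification step described above could be sketched roughly as follows. This is not project code: it uses plain PyTorch rather than the PyTorch Geometric stack the repository is built on, and the hit counts, features, and labels are toy placeholders.

```python
# Toy sketch of the edge-classification idea: score every edge of a fully
# connected hit graph with an MLP, where a score near 1 means "both hits
# belong to the same particle". All data here is random stand-in data.
import torch
import torch.nn as nn

n_hits = 200
hits = torch.randn(n_hits, 3)  # toy x/y/z coordinates of the hits

# Build all directed edges i -> j with i != j (the fully connected graph).
src, dst = torch.meshgrid(torch.arange(n_hits), torch.arange(n_hits), indexing="ij")
keep = src != dst
edge_index = torch.stack([src[keep], dst[keep]])  # shape (2, n_edges)

# Edge features: concatenated coordinates of the two endpoint hits.
edge_features = torch.cat([hits[edge_index[0]], hits[edge_index[1]]], dim=1)

classifier = nn.Sequential(
    nn.Linear(6, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1), nn.Sigmoid(),
)

# Random stand-in labels; in practice they come from simulation truth matching.
labels = torch.randint(0, 2, (edge_features.shape[0], 1)).float()

loss = nn.BCELoss()(classifier(edge_features), labels)
loss.backward()
print(f"toy edge-classification loss: {loss.item():.3f}")
```

Rejecting the low-scoring edges leaves the individual trajectories as connected components, which is the intermediate "small graph" stage that the latent-space clustering described above then refines.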
[ghorganization]: https://github.com/gnn-tracking diff --git a/projects/intelligent-conditions-cache.yml b/projects/intelligent-conditions-cache.yml index 580d4f8..15ec013 100644 --- a/projects/intelligent-conditions-cache.yml +++ b/projects/intelligent-conditions-cache.yml @@ -12,9 +12,12 @@ program: ["IRIS-HEP fellow"] location: ["In person"] commitment: ["Full time"] description: > - Conditions data refers to additional information collected in particle physics experiments beyond what is recorded directly by the detector. This data plays a critical role in many experiments, providing crucial context and calibration information for the recorded measurements. However, managing conditions data poses unique challenges, particularly due to the high access rates involved. - The High-Energy Physics Software Foundation (HSF) has developed an experiment-agnostic approach to address these challenges, which has already been successfully deployed for the sPHENIX experiment at the Brookhaven National Laboratory (BNL). - The project focuses on investigating the access patterns for this conditions database to gather insights that will enable the development of an optimized caching system in the future. Machine Learning may be used for pattern recognition. + Conditions data refers to additional information collected in particle physics experiments beyond what is recorded directly by the detector. This data plays a + critical role in many experiments, providing crucial context and calibration information for the recorded measurements. However, managing conditions data + poses unique challenges, particularly due to the high access rates involved. The High-Energy Physics Software Foundation (HSF) has developed an + experiment-agnostic approach to address these challenges, which has already been successfully deployed for the sPHENIX experiment at the Brookhaven National + Laboratory (BNL). The project focuses on investigating the access patterns for this conditions database to gather insights that will enable the development of + an optimized caching system in the future. Machine Learning may be used for pattern recognition. contacts: - name: Lino Gerlach email: lino.oscar.gerlach@cern.ch diff --git a/projects/lst-gnn-tracking.yml b/projects/lst-gnn-tracking.yml index dc7b9d2..56e4ddb 100644 --- a/projects/lst-gnn-tracking.yml +++ b/projects/lst-gnn-tracking.yml @@ -1,6 +1,6 @@ -#remove commented text (after "#") in your project yml, including this line.. -#See the project_metadata.yml file in this repository for expected responses to each attribute. If you need -#to add additional responses, please modify project_metadata.yml accordingly +# remove commented text (after "#") in your project yml, including this line.. +# See the project_metadata.yml file in this repository for expected responses to each attribute. If you need +# to add additional responses, please modify project_metadata.yml accordingly --- name: Augmenting Line-Segment Tracking with Graph Neural Network postdate: 2023-02-18 @@ -67,9 +67,11 @@ description: > Week 3: Creating example of running the Line Segment classification inference on C++ environment with TorchScript Week 4/5: Integrating the inference with LST’s CUDA code to run the inference on GNN Week 5: Validating the implementation in the LST framework - Week 6/7: Performing optimization of utilizing the GNN inferences to measure performance gain in the efficiency metric of LST framework (i.e. 
efficiency, fake rate, and duplicate rate) + Week 6/7: Performing optimization of utilizing the GNN inferences to measure performance gain in the efficiency metric of LST framework (i.e. efficiency, fake + rate, and duplicate rate) Week 8/9: Perform large scale hyperparameter optimization to find best resulting model architectecture - Week 10/11: Perform research and development of extending the ability to classify Triplets, and beyond, with the Line Graph transformation approach, which would enable “one-shot” inference + Week 10/11: Perform research and development of extending the ability to classify Triplets, and beyond, with the Line Graph transformation approach, which + would enable “one-shot” inference Week 12: Wrap up the project, document and summarize the findings to allow for next steps contacts: diff --git a/projects/net-ai-ml.yml b/projects/net-ai-ml.yml index 31bbf0a..b7c1534 100644 --- a/projects/net-ai-ml.yml +++ b/projects/net-ai-ml.yml @@ -24,13 +24,21 @@ program: - IRIS-HEP fellow shortdescription: Machine learning for network problem identification. description: > - Research and Education networks are critical for modern, distributed scientific infrastructures. Networks enable data and services to operate across data centers and across the world. + Research and Education networks are critical for modern, distributed scientific infrastructures. Networks enable data and services to operate across data + centers and across the world. - The IRIS-HEP/OSG-LHC team has members working on network measurement, analytics and pre-emptive problem identification and localization and would like to involve a student or students interested in data science, machine learning or analytics to participate in our work. The team has assembled a rich, unique dataset, consisting of network-specific metrics, statistics and other measurements which are collected by various tools and systems. In addition, we have developed simple functions to create alarms identifying some types of problems. + The IRIS-HEP/OSG-LHC team has members working on network measurement, analytics and pre-emptive problem identification and localization and would like to + involve a student or students interested in data science, machine learning or analytics to participate in our work. The team has assembled a rich, unique + dataset, consisting of network-specific metrics, statistics and other measurements which are collected by various tools and systems. In addition, we have + developed simple functions to create alarms identifying some types of problems. - Interested students would work with pre-prepared datasets, annotated via our existing alarms, to train one or more machine learning algorithms and then use the trained algorithms to process another dataset, comparing results with the sample alarm method. The task is to provide a more effective method of identifying certain types of network issues using machine learning so that such problems can be quickly resolved before they impact scientists who rely on these networks. + Interested students would work with pre-prepared datasets, annotated via our existing alarms, to train one or more machine learning algorithms and then use + the trained algorithms to process another dataset, comparing results with the sample alarm method. The task is to provide a more effective method of + identifying certain types of network issues using machine learning so that such problems can be quickly resolved before they impact scientists who rely on + these networks. 
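As a rough illustration of that workflow, a scikit-learn sketch might look like the following. This is not team code: the column names, the alarm rule, and the synthetic data are invented placeholders rather than the real measurement schema.

```python
# Toy sketch: train a classifier on one alarm-annotated dataset and compare
# its predictions with the alarm labels of a second dataset. All columns and
# data below are synthetic stand-ins, not real network measurements.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)

def make_toy_measurements(n):
    """Stand-in for a pre-prepared, alarm-annotated measurement dataset."""
    df = pd.DataFrame({
        "packet_loss": rng.beta(0.5, 20.0, n),
        "one_way_delay_ms": rng.gamma(2.0, 10.0, n),
        "throughput_mbps": rng.gamma(5.0, 100.0, n),
        "path_changed": rng.integers(0, 2, n),
    })
    # Toy "alarm" label mimicking a simple threshold-based alarm function.
    df["alarm"] = ((df["packet_loss"] > 0.05) | (df["one_way_delay_ms"] > 60)).astype(int)
    return df

train_df = make_toy_measurements(5_000)  # annotated dataset used for training
eval_df = make_toy_measurements(2_000)   # second dataset, used for the comparison

features = ["packet_loss", "one_way_delay_ms", "throughput_mbps", "path_changed"]
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(train_df[features], train_df["alarm"])

# Compare model predictions against the existing alarm labels.
print(classification_report(eval_df["alarm"], model.predict(eval_df[features])))
```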
- The student will be expected to participate in a weekly group meeting focused on network measurement, analytics, monitoring and alarming, which will provide a venue to discuss and learn about concepts, tools and methodologies relevant to the project. + The student will be expected to participate in a weekly group meeting focused on network measurement, analytics, monitoring and alarming, which will provide a + venue to discuss and learn about concepts, tools and methodologies relevant to the project. The project goal is to create improved user-facing alerting and alarming related to the research and education networks used by HEP, WLCG and OSG communities. diff --git a/projects/net-alarms.yml b/projects/net-alarms.yml index 6ec7970..359b0b7 100644 --- a/projects/net-alarms.yml +++ b/projects/net-alarms.yml @@ -24,15 +24,26 @@ program: - IRIS-HEP fellow shortdescription: Enabling advanced network problem detection for the science community. description: > - Research and Education networks are critical for modern, distributed scientific infrastructures. Networks enable data and services to operate across data centers and across the world. + Research and Education networks are critical for modern, distributed scientific infrastructures. Networks enable data and services to operate across data + centers and across the world. - The IRIS-HEP/OSG-LHC team has members working on network measurement, analytics and pre-emptive problem identification and localization and would like to involve a student or students interested in data science, monitoring or analytics to participate in our work. The team has assembled a rich, unique dataset, consisting of network-specific metrics, statistics and other measurements which are collected by various tools and systems. In addition, we have developed simple functions to create alarms identifying some types of problems. + The IRIS-HEP/OSG-LHC team has members working on network measurement, analytics and pre-emptive problem identification and localization and would like to + involve a student or students interested in data science, monitoring or analytics to participate in our work. The team has assembled a rich, unique dataset, + consisting of network-specific metrics, statistics and other measurements which are collected by various tools and systems. In addition, we have developed + simple functions to create alarms identifying some types of problems. - This project is intended to expand and augment the existing simple alarms with new alarms based upon the extensive data we are collecting. As we examine the data in depth, we realize there are important indicators of problems both in our networks as well as in our network monitoring infrastructure. Interested students would work with our data using tools like Elasticsearch, Kibana and Jupyter Notebooks to first understand the types of data being collected and then use that knowledge to create increasingly powerful alarms which clearly identify specific problems. The task is to maximize the diagnostic range and capability of our alarms to proactively identify problems before they impact scientists who rely on these networks or impact our network measurement infrastructure’s ability to gather data in the first place. + This project is intended to expand and augment the existing simple alarms with new alarms based upon the extensive data we are collecting. 
As we examine the + data in depth, we realize there are important indicators of problems both in our networks as well as in our network monitoring infrastructure. Interested + students would work with our data using tools like Elasticsearch, Kibana and Jupyter Notebooks to first understand the types of data being collected and then + use that knowledge to create increasingly powerful alarms which clearly identify specific problems. The task is to maximize the diagnostic range and + capability of our alarms to proactively identify problems before they impact scientists who rely on these networks or impact our network measurement + infrastructure’s ability to gather data in the first place. - The student will be expected to participate in a weekly group meeting focused on network measurement, analytics, monitoring and alarming, which will provide a venue to discuss and learn about concepts, tools and methodologies relevant to the project. + The student will be expected to participate in a weekly group meeting focused on network measurement, analytics, monitoring and alarming, which will provide a + venue to discuss and learn about concepts, tools and methodologies relevant to the project. - The project goal is to create improved alerting and alarming related to both the research and education networks used by HEP, WLCG and OSG communities and the infrastructure we have created to measure and monitor it. + The project goal is to create improved alerting and alarming related to both the research and education networks used by HEP, WLCG and OSG communities and the + infrastructure we have created to measure and monitor it. contacts: - name: Shawn McKee diff --git a/projects/pnr-cicd-automation.yml b/projects/pnr-cicd-automation.yml index dc6902d..c1869b8 100644 --- a/projects/pnr-cicd-automation.yml +++ b/projects/pnr-cicd-automation.yml @@ -22,7 +22,14 @@ program: - IRIS-HEP fellow shortdescription: Automate manual operations and implement CI/CD for CMS Production & Reprocessing group. description: > - The Production & Reprocessing (P&R) group is responsible for maintaining and operating the CMS central workload management system, which processes hundreds of physics workflows daily which produce the data that physicists use in their analyses. The requests which have a similar physics goal are grouped as ‘campaigns’. P&R manages the lifecycle of hundreds of campaigns, each with its unique needs. The objective of this project is to automate the checks that are performed manually on these campaigns. This involves creating a system to automatically set up, verify, and activate new campaigns, along with managing data storage and allocation. The second part of the project focuses on implementing a Continuous Integration and Continuous Deployment (CI/CD) pipeline for efficiently deploying and maintaining software services. This will include converting manual testing procedures into automated ones, improving overall efficiency and reducing errors. Tools such as Gitlab Pipelines for CI/CD, Python for scripting, and various automated testing frameworks will be employed. This initiative is designed to streamline operations, making them more efficient and effective. + The Production & Reprocessing (P&R) group is responsible for maintaining and operating the CMS central workload management system, which processes hundreds of + physics workflows daily which produce the data that physicists use in their analyses. The requests which have a similar physics goal are grouped as + ‘campaigns’. 
P&R manages the lifecycle of hundreds of campaigns, each with its unique needs. The objective of this project is to automate the checks that are + performed manually on these campaigns. This involves creating a system to automatically set up, verify, and activate new campaigns, along with managing data + storage and allocation. The second part of the project focuses on implementing a Continuous Integration and Continuous Deployment (CI/CD) pipeline for + efficiently deploying and maintaining software services. This will include converting manual testing procedures into automated ones, improving overall + efficiency and reducing errors. Tools such as GitLab Pipelines for CI/CD, Python for scripting, and various automated testing frameworks will be employed. + This initiative is designed to streamline operations, making them more efficient and effective. contacts: - name: Hassan Ahmed email: m.hassan@cern.ch diff --git a/projects/podio-julia.yml b/projects/podio-julia.yml index 46f8713..79859aa 100644 --- a/projects/podio-julia.yml +++ b/projects/podio-julia.yml @@ -22,9 +22,9 @@ program: - IRIS-HEP fellow shortdescription: Add Julia Interface to the [PODIO](https://github.com/aidasoft/podio) library. description: > - The project's main goal is to add a proper Julia back-end to [PODIO](https://github.com/aidasoft/podio). - A previous Google Summer of Code [project](https://hepsoftwarefoundation.org/gsoc/blogs/2022/blog_PODIO_SoumilBaldota.html) worked on an early prototype, which showed the feasibility. - The aim is to complete the feature set, and do thorough (performance) testing afterwards. + The project's main goal is to add a proper Julia back-end to [PODIO](https://github.com/aidasoft/podio). A previous Google Summer of Code + [project](https://hepsoftwarefoundation.org/gsoc/blogs/2022/blog_PODIO_SoumilBaldota.html) worked on an early prototype, which showed the feasibility. The aim + is to complete the feature set, and do thorough (performance) testing afterwards. contacts: - name: Benedikt Hegner email: benedikt.hegner@cern.ch diff --git a/projects/podio-rust.yml b/projects/podio-rust.yml index 59c90a8..3ad1be2 100644 --- a/projects/podio-rust.yml +++ b/projects/podio-rust.yml @@ -23,9 +23,10 @@ program: - IRIS-HEP fellow shortdescription: Explore the addition of a Rust interface to the [PODIO](https://github.com/aidasoft/podio) library. description: > - The project's main goal is to prototype a Rust interface to the PODIO library. Data models in PODIO are declared with a simple programming-language agnostic syntax. In this project - we will explore how the PODIO data model concepts can be mapped best onto Rust. At the same time, we will investigate how much Rust's macro system can support the function of PODIO. - For the other supported languages Python, C++ and (experimentally) Julia PODIO generates all of the source code. With proper usage of Rust macros this code generation could be kept to a minimum. + The project's main goal is to prototype a Rust interface to the PODIO library. Data models in PODIO are declared with a simple programming-language-agnostic + syntax. In this project we will explore how the PODIO data model concepts can be mapped best onto Rust. At the same time, we will investigate how much Rust's + macro system can support the function of PODIO. For the other supported languages, Python, C++, and (experimentally) Julia, PODIO generates all of the source + code. With proper usage of Rust macros, this code generation could be kept to a minimum.
contacts: - name: Benedikt Hegner email: benedikt.hegner@cern.ch diff --git a/projects/rucio-software-development.yml b/projects/rucio-software-development.yml index bcc2185..63e0bb9 100644 --- a/projects/rucio-software-development.yml +++ b/projects/rucio-software-development.yml @@ -24,11 +24,18 @@ program: - IRIS-HEP fellow shortdescription: Rucio core developments for large-scale data management description: > - The Rucio system is an open and community-driven data management system for data organisation, management, and access of scientific data. Several communities and experiments have adopted Rucio as a common solution, therefore we seek a dedicated software engineer to help implement much wished for features and extensions in the Rucio core. + The Rucio system is an open and community-driven data management system for data organisation, management, and access of scientific data. Several communities + and experiments have adopted Rucio as a common solution; we therefore seek a dedicated software engineer to help implement much-wished-for features and + extensions in the Rucio core. - The selected candidate will focus on producing software for several Rucio components. There is a multitude of potential topics, also based on the candidate's interests, that can be tackled. Examples include, but are not limited to (1) integrate static type checking capabilities into the framework as well as improve its runtime efficiency, (2) continue the documentation work for automatically generated API and REST interface documentation, (3) evolve the Rucio Upload and Download clients to new complex workflows suitable to modern analysis, (4) continue the development work on the new Rucio Web User Interface, and many more. + The selected candidate will focus on producing software for several Rucio components. There is a multitude of potential topics, also based on the candidate's + interests, that can be tackled. Examples include, but are not limited to: (1) integrating static type checking capabilities into the framework as well as improving + its runtime efficiency, (2) continuing the documentation work for automatically generated API and REST interface documentation, (3) evolving the Rucio Upload and + Download clients to new complex workflows suitable for modern analysis, (4) continuing the development work on the new Rucio Web User Interface, and many more. - The selected candidate will participate in a large distributed team using best industry practices such code review, continuous integration, test-driven development, and blue-green deployments. It is important to us that the candidate bring their creativity to the team, therefore we encourage them to also help with developing and evaluating new ideas and designs. + The selected candidate will participate in a large distributed team using industry best practices such as code review, continuous integration, test-driven + development, and blue-green deployments. It is important to us that the candidate bring their creativity to the team; we therefore encourage them to also help + with developing and evaluating new ideas and designs.
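As a small, hypothetical illustration of topic (1), adding type annotations to a function lets a static checker such as mypy catch interface misuse before runtime; the function below is a made-up stand-in, not Rucio's actual API.

```python
# Hypothetical, simplified example of the annotation work topic (1) refers to;
# the function and its data are invented and do not reflect Rucio's real code.
from typing import Optional


def list_replicas(scope: str, name: str, rse_expression: Optional[str] = None) -> list[dict[str, str]]:
    """Return replica records for a dataset identifier (toy stand-in)."""
    replicas = [{"rse": "SITE_A_DATADISK", "pfn": f"root://site-a//{scope}/{name}"}]
    if rse_expression is not None:
        replicas = [r for r in replicas if rse_expression in r["rse"]]
    return replicas


# With the annotations in place, a checker such as mypy flags a call like
# list_replicas("user.jdoe", 42) because the second argument must be a str.
```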
contacts: - name: Mario Lassnig diff --git a/projects/snakemake-recast.yml b/projects/snakemake-recast.yml index 1e12f49..39a6def 100644 --- a/projects/snakemake-recast.yml +++ b/projects/snakemake-recast.yml @@ -42,4 +42,4 @@ contacts: email: lukas.heinrich@cern.ch mentees: - name: Andrii Povsten - link: https://iris-hep.org/fellows/AndriiPovsten.html \ No newline at end of file + link: https://iris-hep.org/fellows/AndriiPovsten.html diff --git a/projects/trigger-lines-for-llp.yml b/projects/trigger-lines-for-llp.yml index 91cf39a..d8c2963 100644 --- a/projects/trigger-lines-for-llp.yml +++ b/projects/trigger-lines-for-llp.yml @@ -25,10 +25,15 @@ program: - IRIS-HEP fellow shortdescription: Development of trigger lines to detect long-lived particles at LHCb description: > - The project focuses on developing trigger lines to detect long-lived particles by utilizing recently developed reconstruction algorithms within the Allen framework. These lines will employ the topologies of SciFi seeds originating from standard model long-lived particles, such as the strange Λ0. Multiple studies based on Monte Carlo (MC) simulations will be conducted to understand the variables of interest that can be used to select events. - The development of new trigger lines faces limitations due to output trigger rate throughput constraints of the entire HLT1 system. + The project focuses on developing trigger lines to detect long-lived particles by utilizing recently developed reconstruction algorithms within the Allen + framework. These lines will employ the topologies of SciFi seeds originating from standard model long-lived particles, such as the strange Λ0. Multiple + studies based on Monte Carlo (MC) simulations will be conducted to understand the variables of interest that can be used to select events. The development of + new trigger lines faces limitations due to output trigger rate throughput constraints of the entire HLT1 system. - An essential aspect of the project is the physics performance and the capability of the trigger lines for long-lived particle detection. The student will develop and employ ML/AI techniques that could significantly boost performance while maintaining control over execution time. The physics performance of the new trigger lines will be tested using specific decay channels from both Standard Model (SM) transitions and novel processes, such as dark bosons. Additionally, 2023 data from collisions collected during Run 3 will be used to commission these lines. + An essential aspect of the project is the physics performance and the capability of the trigger lines for long-lived particle detection. The student will + develop and employ ML/AI techniques that could significantly boost performance while maintaining control over execution time. The physics performance of the + new trigger lines will be tested using specific decay channels from both Standard Model (SM) transitions and novel processes, such as dark bosons. + Additionally, 2023 data from collisions collected during Run 3 will be used to commission these lines. contacts: - name: Arantza Oyanguren email: oyanguren@ific.uv.es