diff --git a/.gitignore b/.gitignore index bd162bd4f..260d72d9f 100644 --- a/.gitignore +++ b/.gitignore @@ -63,4 +63,5 @@ cmake-build*/ .cache compile_commands.json - +# Drawio backups +*.bkp diff --git a/docs/_media/algo_flow.drawio b/docs/_media/algo_flow.drawio new file mode 100644 index 000000000..b025fbca6 --- /dev/null +++ b/docs/_media/algo_flow.drawio @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/docs/_media/algo_flow.png b/docs/_media/algo_flow.png new file mode 100644 index 000000000..518e0f2ce Binary files /dev/null and b/docs/_media/algo_flow.png differ diff --git a/docs/_media/algo_flow2.png b/docs/_media/algo_flow2.png new file mode 100644 index 000000000..518e0f2ce Binary files /dev/null and b/docs/_media/algo_flow2.png differ diff --git a/docs/_media/algo_flow_01.drawio b/docs/_media/algo_flow_01.drawio new file mode 100644 index 000000000..0340474ab --- /dev/null +++ b/docs/_media/algo_flow_01.drawio @@ -0,0 +1,85 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/docs/_media/algo_flow_01.png b/docs/_media/algo_flow_01.png new file mode 100644 index 000000000..503dc6214 Binary files /dev/null and b/docs/_media/algo_flow_01.png differ diff --git a/docs/_media/algo_flow_01.svg b/docs/_media/algo_flow_01.svg new file mode 100644 index 000000000..8f9a69403 --- /dev/null +++ b/docs/_media/algo_flow_01.svg @@ -0,0 +1,4 @@ + + + +
[Figure text, algo_flow_01.svg: RawHits, Hit Processing Algorithm, HitClusters, Track Finding Algorithm, Candidates, Track Fitting Algorithm, Tracks]
\ No newline at end of file diff --git a/docs/_media/algo_flow_02.drawio b/docs/_media/algo_flow_02.drawio new file mode 100644 index 000000000..8fb54ded6 --- /dev/null +++ b/docs/_media/algo_flow_02.drawio @@ -0,0 +1,177 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/docs/_media/algo_flow_02.svg b/docs/_media/algo_flow_02.svg new file mode 100644 index 000000000..f7337427f --- /dev/null +++ b/docs/_media/algo_flow_02.svg @@ -0,0 +1,4 @@ + + + +
[Figure text, algo_flow_02.svg: RawHits, Hit Processing Algorithm, HitClusters, Track Finding Algorithm, Candidates, Track Fitting Algorithm, Tracks, Event Source, Output, Calibrations, Geometry, Magnetic Field; arrows legend: Dependency, Input Output, Per event Data Flow]
\ No newline at end of file diff --git a/docs/_media/algo_flow_03.drawio b/docs/_media/algo_flow_03.drawio new file mode 100644 index 000000000..174539d6f --- /dev/null +++ b/docs/_media/algo_flow_03.drawio @@ -0,0 +1,146 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/docs/_media/algo_flow_03.svg b/docs/_media/algo_flow_03.svg new file mode 100644 index 000000000..acc5fcc0c --- /dev/null +++ b/docs/_media/algo_flow_03.svg @@ -0,0 +1,4 @@ + + + +
[Figure text, algo_flow_03.svg: JFactory (x3), JEventSource, JEventProcessor, JService (x3)]
\ No newline at end of file diff --git a/docs/_media/arrows-queue.drawio b/docs/_media/arrows-queue.drawio new file mode 100644 index 000000000..e69de29bb diff --git a/docs/_media/arrows-queue.svg b/docs/_media/arrows-queue.svg new file mode 100644 index 000000000..72d1bc3af --- /dev/null +++ b/docs/_media/arrows-queue.svg @@ -0,0 +1,4 @@ + + + +
[Figure text, arrows-queue.svg: queue A, queue B, sequential arrow, parallel arrow, sequential arrow]
\ No newline at end of file diff --git a/docs/_media/concepts-factory-diagram.png b/docs/_media/concepts-factory-diagram.png new file mode 100644 index 000000000..f6cc287dd Binary files /dev/null and b/docs/_media/concepts-factory-diagram.png differ diff --git a/docs/_media/data-identification.drawio b/docs/_media/data-identification.drawio new file mode 100644 index 000000000..e69de29bb diff --git a/docs/_media/jana-flow.drawio b/docs/_media/jana-flow.drawio new file mode 100644 index 000000000..e69de29bb diff --git a/docs/_media/jana-flow.svg b/docs/_media/jana-flow.svg new file mode 100644 index 000000000..20993ad13 --- /dev/null +++ b/docs/_media/jana-flow.svg @@ -0,0 +1,4 @@ + + + +
[Figure text, jana-flow.svg: ALGORITHMS, from DAQ stream, DAQ, from files, C++ objects (low level), C++ objects (refined), ALGORITHMS, output files, save to, JANA2, files]
\ No newline at end of file diff --git a/docs/_media/jana2-diagram.pdn b/docs/_media/jana2-diagram.pdn new file mode 100644 index 000000000..f481b7e77 Binary files /dev/null and b/docs/_media/jana2-diagram.pdn differ diff --git a/docs/_media/jana2-diagram.png b/docs/_media/jana2-diagram.png new file mode 100644 index 000000000..7b5fd5c4a Binary files /dev/null and b/docs/_media/jana2-diagram.png differ diff --git a/docs/_media/old-schema.png b/docs/_media/old-schema.png index 526cb7db3..744ca5091 100644 Binary files a/docs/_media/old-schema.png and b/docs/_media/old-schema.png differ diff --git a/docs/_media/threading-schema.png b/docs/_media/threading-schema.png new file mode 100644 index 000000000..68a2b3590 Binary files /dev/null and b/docs/_media/threading-schema.png differ diff --git a/docs/concepts.md b/docs/concepts.md index f56992b69..5b461c6d6 100644 --- a/docs/concepts.md +++ b/docs/concepts.md @@ -1,133 +1,489 @@ -JANA2 Concepts -============== +# JANA2 Concepts -This section provides higher-level background and context for JANA, and discusses JANA's design philosophy and the -associated tradeoffs. -## JANA concepts +## Core Architecture -- JObjects are data containers for specific resuts, e.g. clusters or tracks. They may be plain-old structs or they may -optionally inherit from (e.g.) ROOT or NumPy datatypes. +![JANA diagram](_media/jana-flow.svg) -- JEventSources take a file or messaging producer which provides raw event data, and exposes it to JANA as a stream. -- JFactories calculate a specific result on an event-by-event basis. Their inputs may come from an EventSource or may -be computed via other JFactories. All results are evaluated lazily and cached until the entire event is finished processing. -in order to do so. Importantly, JFactories are decoupled from one another via the JEvent interface. It should make no -difference to the JFactory where its input data came from, as long as it has the correct type and tag. 
While the [Factory -Pattern](https://en.wikipedia.org/wiki/Factory_method_pattern) usually abstracts away the _subtype_ of the class being -created, in our case it abstracts away the _number of instances_ created instead. For instance, a ClusterFactory may -take m Hit objects and produce n Cluster objects, where m and n vary per event and won't be known until that - event gets processed. +At its core, JANA2 views data processing as a chain of transformations, +where algorithms are applied to data to produce more refined data. +This process is organized into two main layers: -- JEventProcessors run desired JFactories over the event stream and write the results to an output file or messaging -consumer. JFactories form a lazy directed acyclic graph, whereas JEventProcessors trigger their actual evaluation. +1. **Queue-Arrow Mechanism:** JANA2 utilizes the [arrow model](https://en.wikipedia.org/wiki/Arrow_\(computer_science\)), + where data starts in a queue. An "arrow" pulls data from the queue, processes it with algorithms, + and places the processed data into another queue. The simplest setup involves input and output queues + with a single arrow handling all necessary algorithms. But JANA2 supports more complex configurations + with multiple queues and arrows chained together, operating sequentially or in parallel as needed. -## Object lifecycles + ![Queue-Arrow mechanism](_media/arrows-queue.svg) -It is important to understand who owns each JObject and when it is destroyed. +2. **Algorithm Management within Arrows:** Within each arrow, JANA2 organizes and manages algorithms along with their + inputs and outputs, allowing flexibility in data processing. Arrows can be configured to distribute the processing + load across various algorithms. By assigning threads to arrows, JANA2 leverages modern hardware to process data + concurrently across multiple cores and processors, enhancing scalability and efficiency. 
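
The queue-arrow idea described above can be sketched with a toy model. This is only an illustration under stated assumptions, not the JANA2 API: `ToyArrow`, `RunToyChain`, the gain of 0.5 and the threshold of 1.0 are all hypothetical names and values, and everything runs on a single thread, whereas real arrows are assigned worker threads by the framework.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Toy stand-in for an "arrow" (NOT the JANA2 API): it pulls items from an
// input queue, applies one algorithm, and pushes the results to an output queue.
template <typename In, typename Out>
class ToyArrow {
public:
    ToyArrow(std::queue<In>& input, std::queue<Out>& output,
             std::function<Out(const In&)> algorithm)
        : input_(input), output_(output), algorithm_(std::move(algorithm)) {
        assert(algorithm_ && "an arrow needs an algorithm");
    }

    // Process everything currently waiting in the input queue.
    void Drain() {
        while (!input_.empty()) {
            output_.push(algorithm_(input_.front()));
            input_.pop();
        }
    }

private:
    std::queue<In>& input_;
    std::queue<Out>& output_;
    std::function<Out(const In&)> algorithm_;
};

// Chain two arrows through an intermediate queue: A -> B -> C.
inline std::vector<double> RunToyChain(const std::vector<int>& raw_adc) {
    std::queue<int> queue_a;
    std::queue<double> queue_b;
    std::queue<double> queue_c;
    for (int adc : raw_adc) queue_a.push(adc);

    // First arrow: convert ADC counts to energy (hypothetical gain of 0.5).
    ToyArrow<int, double> calibrate(queue_a, queue_b,
        [](const int& adc) { return adc * 0.5; });
    // Second arrow: zero out energies below a hypothetical threshold of 1.0.
    ToyArrow<double, double> threshold(queue_b, queue_c,
        [](const double& e) { return e < 1.0 ? 0.0 : e; });

    calibrate.Drain();
    threshold.Drain();

    std::vector<double> out;
    while (!queue_c.empty()) {
        out.push_back(queue_c.front());
        queue_c.pop();
    }
    return out;
}
```

Calling `RunToyChain({1, 2, 4})` pushes three values through both arrows in order. Keeping the arrows decoupled through queues is what would let a framework run each stage on a different number of threads.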
-By default, a JFactory owns all of the JObjects that it created during `Process()`. Once all event processors have -finished processing a `JEvent`, all `JFactories` associated with that `JEvent` will clears and delete their `JObjects`. -However, you can change this behavior by setting one of the factory flags: +In organizing, managing, and building the codebase, JANA2 provides: -* `PERSISTENT`: Objects are neither cleared nor deleted. This is usually used for calibrations and translation tables. - Note that if an object is persistent, `JFactory::Process` will _not_ be re-run on the next `JEvent`. The user - may still update the objects manually, via `JFactory::BeginRun`, and must delete the objects manually via - `JFactory::EndRun` or `JFactory::Finish`. - -* `NOT_OBJECT_OWNER`: Objects are cleared from the `JFactory` but _not_ deleted. This is useful for "proxy" factories - (which reorganize objects that are owned by a different factory) and for `JEventGroups`. `JFactory::Process` _will_ be - re-run for each `JEvent`. As long as the objects are owned by a different `JFactory`, the user doesn't have to do any - cleanup. - -The lifetime of a `JFactory` spans the time that a `JEvent` is in-flight. No other guarantees are made: `JFactories` might -be re-used for multiple `JEvents` for the sake of efficiency, but the implementation is free to _not_ do so. In particular, -the user must never assume that one `JFactory` will see the entire `JEvent` stream. +- **Algorithm Building Blocks:** Essential components like Factories, Processors, Services and others, + help write, organize and manage algorithms. These modular units can be configured and combined to construct + the desired data processing pipelines, promoting flexibility and scalability. -The lifetime of a `JEventSource` spans the time that all of its emitted `JEvents` are in-flight. +- **Plugin Mechanism:** Orthogonal to the above, JANA2 offers a plugin mechanism to enhance modularity and flexibility. 
+ Plugins are dynamic libraries with a specialized interface, enabling them to register components with the main application. + This allows for dynamic runtime configuration, selecting or replacing algorithms and components without recompilation, + and better code organization and reuse. Large applications are typically built from multiple plugins, + each responsible for specific processing aspects. Alternatively, monolithic applications without plugins + can be created for simpler, smaller applications. -The lifetime of a `JEventProcessor` spans the time that any `JEventSources` are active. -The lifetime of a `JService` not only spans the time that any `JEventProcessors` are active, but also the lifetime of -`JApplication` itself. Furthermore, because JServices use `shared_ptr`, they are allowed to live even longer than -`JApplication`, which is helpful for things like writing test cases. +## Building blocks + +The data analysis application flow can be viewed as a chain of algorithms that transform input data into the +desired output. A simplified example of such a chain is shown in the diagram below: + +![Simple Algorithms Flow](_media/algo_flow_01.svg) + +In this example, for each event, raw ADC values of hits are processed: +first combined into clusters, then passed into track-finding and fitting algorithms, +with the resulting tracks as the chain's output. In real-world scenarios, +the actual graph is significantly more complex and requires additional components such as Geometry, +magnetic field maps, calibrations, alignments, etc. +Additionally, some algorithms are responsible not only for processing objects in memory +but also for tasks such as reading data from disk or DAQ streams +and writing reconstructed data to a destination. 
+A more realistic and complex flow can be represented as follows: + + +![Simple Algorithms Flow](_media/algo_flow_02.svg) + +Here is a very brief overview of the algorithm building blocks and of how this flow is organized in JANA2: + +- **JFactory** - This is the primary component for implementing algorithms (depicted as orange boxes). + JFactories compute specific results on an event-by-event basis. + Their inputs may come from an EventSource or other JFactories. + Algorithms in JFactories can be implemented using either Declarative or Imperative approaches + (described later in the documentation). + +- **JEventSource** - A special type of algorithm responsible for acquiring raw event data + and exposing it to JANA for subsequent processing, for example by reading events from a file or listening + to a DAQ messaging producer that provides raw event data. + +- **JEventProcessor** - Positioned at the top of the calculation chain, JEventProcessor is designed + to collect data from JFactories and handle end-point processing tasks, such as writing results to + an output file or messaging consumer. However, JEventProcessor is not limited to I/O operations; + it can also perform tasks like histogram plotting, data quality monitoring, and other forms of analysis. + + To clarify the distinction: JFactories form a lazy directed acyclic graph (DAG), + where each factory defines a specific step in the data processing chain. + In contrast, the JEventProcessor algorithm is executed for each event. + When the JEventProcessor collects data, it triggers the lazy evaluation of the required factories, + initiating the corresponding steps in the data processing chain. + +- **JService** - Used to store resources that remain constant across events, such as Geometry descriptions, + Magnetic Field Maps, and other shared data. Services are accessible by both algorithms and other services. 
+ + +We may now redraw the above diagram in terms of JANA2 building blocks: + +![Simple Algorithms Flow](_media/algo_flow_03.svg) + + +## Data model + +JANA2 allows users to define and select their own event models, +providing the flexibility to tailor data structures to specific experimental needs. Taking the above +diagram as an example, classes such as `RawHits`, `HitClusters`, ... `Tracks` might be just user-defined classes. +The data structures can be as simple as: +```cpp +struct GenericHit { +double x,y,z, edep; +}; +``` -## Design philosophy +A key feature of JANA2 is that it doesn't require data being passed around +to inherit from any specific base class, such as JObject (used in JANA1) or ROOT's TObject. +While your data classes can inherit from other classes if your data model requires it, +JANA2 remains agnostic about this. -JANA's design philosophy can be boiled down to five values, ordered by importance: +While JANA2 offers extended support for PODIO (Plain Old Data Input/Output) to facilitate standardized data handling, +it does not mandate the use of PODIO or even ROOT. This ensures that users can choose the most suitable data management +tools for their projects without being constrained by the framework. -### Simple to use +### Data Identification in JANA2 -JANA chooses its battles carefully. First and foremost, JANA is about parallelizing computations over data organized -into events. From a 30000-foot view, it should look more like OpenMP or Thread Building Blocks or RaftLib than like ROOT. -Unlike the aforementioned, JANA's vocabulary of abstractions is designed around the needs of physicists rather than -general programmers. However, JANA does not attempt to meet _all_ of the needs of physicists. +![Simple Algorithms Flow](_media/data-identification.svg) -JANA recognizes when coinciding concerns ought to be handled orthogonally. A good example is persistence. 
JANA does not -seek to provide its own persistence layer, nor does it require the user to commit to a specific dependency such as ROOT -or Numpy or Apache Arrow. Instead, JANA tries to make it feasible for the user to choose their persistence layer independently. -This way, if a collaboration decides they wish to (for instance) migrate from ROOT to Arrow, they have a well-defined migration -path which keeps the core analysis code largely intact. +An important aspect is how data is identified within JANA2. JANA2 supports two identifiers: -In particular, this means minimizing the complexity of the build system and minimizing orchestration. Building code -against JANA should require nothing more than implementing certain key interfaces, adding a single path to includes, -and linking against a single library. +1. **Data Type**: The C++ type of the data, e.g., `GenericHit` from the above example. +2. **Tags**: A string identifier in addition to type. -### Well-organized +The concept of tags is useful in several scenarios. For instance: +- When multiple factories can produce the same type of data e.g. utilizing different underlying algorithms. + By specifying the tag name, you can select which algorithm's output you want. +- To reuse the same type. E.g. You might have `GenericHit` data with tags + `"VertexTracker"` and `"BarrelTracker"` to distinguish between hits from different detectors. Or + type `Particle` with tags `"TrueMcParticles"` and `"ReconstructedParticles"` -While JANA's primary goal is running code in parallel, its secondary goal is imposing an organizing principle on -the users' codebase. This can be invaluable in a large collaboration where members vary in programming skill. Specifically, -JANA organizes processing logic into decoupled units. JFactories are agnostic of how and when their prerequisites are -computed, are only run when actually needed, and cache their results for reuse. Different analyses can coexist in separate -JEventProcessors. 
Components can be compiled into independent plugins, to be mixed and matched at runtime. All together, -JANA enforces an organizing principle that enables groups to develop and test their code with both freedom and discipline. +Depending on your data model and the types of factories used (described below), +you can choose different strategies for data identification: +- **Type-Based Identification**: Fully identify data only by its type name, keeping the tag empty most of the time. + Use tags only to identify alternative algorithms. This approach is used by GlueX. +- **Tag-Based Identification**: Use tags as the main data identifier and deduce types automatically whenever possible. + This approach is used in the PODIO data model and the EIC reconstruction software. -### Safe +## JApplication -JANA recognizes that not all of its users are proficient parallel programmers, and it steers users towards patterns which -mitigate some of the pitfalls. Specifically, it provides: +The [JApplication](https://jeffersonlab.github.io/JANA2/refcpp/class_j_application.html) +class is the central hub of the JANA2 framework, orchestrating all aspects of a JANA2-based +application. It manages the initialization, configuration, and execution of the data processing workflow, +serving as the entry point for interacting with the core components of the system. +By providing access to key managers, services, and runtime controls, +JApplication ensures that the application operates smoothly from setup to shutdown. +To illustrate this, here is the code of a typical standalone JANA2 application: -- **Modern C++ features** such as smart pointers and judicious templating, to discourage common classes of bugs. JANA seeks to -make its memory ownership semantics explicit in the type system as much as possible. +```cpp +int main(int argc, char* argv[]) { -- **Internally managed locks** to reduce the learning curve and discourage tricky parallelism bugs. + auto params = new JParameterManager(); + // ... 
usually some processing of argv here adding them to JParameterManager -- **A stable API** with an effort towards backwards-compatibility, so that everybody can benefit from new features -and performance/stability improvements. + // Instantiate the JApplication with the parameter manager + JApplication app(params); + // Add predefined plugins + app.AddPlugin("my_plugin"); + + // Register services: + app.ProvideService(std::make_shared()); + app.ProvideService(std::make_shared()); -### Fast + // Register components + app.Add(new JFactoryGeneratorT); + app.Add(new JFactoryGeneratorT); + app.Add(new JEventSourceGeneratorT); + app.Add(new MyEventProcessor()); -JANA uses low-level optimizations wherever it can in order to boost performance. + // Initialize and run the application + app.Initialize(); + app.Run(); -### Flexible + // Print the final performance report + app.PrintFinalReport(); -The simplest use case for JANA is to read a file of batched events, process each event independently, and aggregate -the results into a new file. However, it can be used in more sophisticated ways. + // Retrieve and return the exit code + return app.GetExitCode(); +} +``` -- Disentangling: Input data is bundled into blocks (each containing an array of entangled events) and we want to -parse each block in order to emit a stream of events (_flatmap_) +## Factories -- Software triggers: With streaming data readout, we may want to accept a stream of raw hit data and let JANA -determine the event boundaries. Arbitrary triggers can be created using existing JFactories. (_windowed join_) +We start with how algorithms are implemented in JANA2, what data flows +between them, and how they may be wired together. -- Subevent-level parallelism: This is necessary if individual events are very large. 
It may also play a role in -effectively utilizing a GPU, particularly as machine learning is adopted in reconstruction (_flatmap+merge_) +JANA implements a **factory model**, where data objects are the products, and the algorithms that generate them are the +factories. While there are various types of factories in JANA2 (covered later in this documentation), +they all follow the same fundamental concept: + +![JANA2 Factory diagram](_media/concepts-factory-diagram.png) + +This diagram illustrates the analogy to industry. When a specific data object is requested for the current event in JANA, +the framework identifies the corresponding algorithm (factory) capable of producing it. +The framework then checks if the factory has already produced this data for the current event +(i.e., if the product is "in stock"). + +- If the data **is already available**, it is retrieved and returned to the user. +- **If not**, the factory is invoked to produce the required data, and the newly generated data is returned to the user. + +To create the requested data, factories may need lower-level objects, +triggering requests to the corresponding factories. This continues until all required factories have been +invoked and the entire chain of dependent objects has been produced. + +In other words, JANA2 factories form a lazily evaluated directed acyclic graph +\([DAG](https://en.wikipedia.org/wiki/Directed_acyclic_graph)\) of data creation, +where all the produced data is cached until the entire event is finished processing. +Thus a factory produces its objects only once for a given event, which keeps things efficient when the +same data is required by multiple algorithms. + + +### Multithreading and factories + +In the context of factories, it is important to at least briefly mention how they behave +under multithreading (more details follow later in the documentation). + +In JANA2, each thread has its own complete and independent set of factories capable of +fully reconstructing an event within that thread. 
This minimizes the use of locks which would be required +to coordinate between threads and subsequently degrade performance. Factory sets are maintained in a pool and +are (optionally) assigned affinity to a specific NUMA group. + +![JANA2 Factory diagram](_media/threading-schema.png) + +With some level of simplification, this diagram shows how sets of factories are created for each thread in the +working pool. Limited by IO operations, events usually must be read in from the source sequentially (orange) +and similarly written sequentially to the output (violet). + +### Imperative vs Declarative factories + +What does the simplest factory look like in code? Probably the simplest is a `JFactory`: + +```cpp +// MyCluster - is what this factory outputs +class ExampleFactory : public JFactoryT<MyCluster> { +public: + void Init() override { /* ... initialize what is needed */ } + + void Process(const std::shared_ptr<const JEvent> &event) override + { + auto hits = event->Get<MyHit>(); // Request data of type MyHit from JANA + std::vector<MyCluster*> clusters; + for(auto hit: hits) { /* ... produce clusters from hits ... */ } + Set(clusters); // Set the output data + } +}; +``` + +The above code gives a glimpse into how such an algorithm or factory might look. +In later sections, we will explore the methods, their details, and other components that can be utilized. + +What’s important to note in this example is that `JFactory` follows the ***Imperative Approach***. +In this approach, the factory is provided with the `JEvent` interface, which it uses to dynamically request +the data required by the algorithm as needed. + +JANA2 supports two distinct approaches for defining algorithms: + +- **Imperative Approach**: The algorithm determines dynamically what data it needs and requests + it through the JEvent interface. + +- **Declarative Approach**: The algorithm explicitly declares its required inputs and outputs upfront + in the class definition. 
+ +For instance, the declarative approach can be implemented using `JOmniFactory`. +Here's how the same factory might look when following the declarative approach: + +```cpp +class ExampleFactory : public JOmniFactory<ExampleFactory> { +public: + + Input<MyHit> hits {this}; // Declare inputs + Output<MyCluster> clusters {this}; // Declare what the factory produces + + void Configure() override { /* ... same as Init() in JFactory */ } + + void Execute(int32_t run_number, int32_t event_number) override + { + // It is ensured that all inputs are ready when Execute is called. + std::vector<MyCluster*> result; + for(auto hit: hits()) { /* ... produce clusters from hits ... */ } + + clusters() = std::move(result); // Set the output data + } +}; +``` + +Declarative factories excel in terms of code management and clarity. +The declarative approach makes it immediately clear what an algorithm's inputs are and what it produces. +While this advantage may not be obvious in the above simple example, it becomes particularly evident when dealing +with complex algorithms that have numerous inputs, outputs, and configuration parameters. +For instance, consider a generic clustering algorithm that could later be adapted for various calorimeter detectors. + +In general, it is recommended to follow the declarative approach unless the dynamic flexibility +of imperative factories is explicitly required. + +A good example of a scenario where the imperative approach is preferred is software Level-3 (L3) triggers. +The imperative approach allows for highly efficient implementations of L3 (i.e., high-level) triggers. +A decision-making algorithm could be designed to request low-level objects first +to quickly determine whether to accept or reject an event. If the decision cannot be made using the low-level objects, +the algorithm can request higher-level objects for further evaluation. +This ability to dynamically activate factories on an event-by-event basis optimizes the L3 system’s throughput, +reducing the computational resources required to implement it. 
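
The trigger scenario above relies on two properties: a factory runs only when its product is requested, and it runs at most once per event. The toy model below sketches both under stated assumptions. It is not the JANA2 API: `ToyEvent`, `ToyTrigger`, `MakeToyEvent`, the string product names and the "at least two hits" rule are all hypothetical, chosen only to make the control flow visible.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy event (NOT the JANA2 API): products are looked up by name, produced
// lazily on first request, and cached for the lifetime of the event.
class ToyEvent {
public:
    using Product = std::vector<double>;
    using Factory = std::function<Product(ToyEvent&)>;

    void Register(const std::string& name, Factory factory) {
        assert(factory && "a product needs a factory");
        factories_[name] = std::move(factory);
    }

    // Return the cached product, running the factory only on the first request.
    const Product& Get(const std::string& name) {
        auto cached = cache_.find(name);
        if (cached != cache_.end()) return cached->second;
        ++calls_[name];
        Product product = factories_.at(name)(*this);  // may request other products
        return cache_.emplace(name, std::move(product)).first->second;
    }

    // How many times the factory for `name` actually ran.
    int Calls(const std::string& name) const {
        auto it = calls_.find(name);
        return it == calls_.end() ? 0 : it->second;
    }

private:
    std::map<std::string, Factory> factories_;
    std::map<std::string, Product> cache_;
    std::map<std::string, int> calls_;
};

// Toy L3-style decision: inspect cheap hits first and request the more
// expensive tracks only if the event has not already been rejected.
inline bool ToyTrigger(ToyEvent& event) {
    const auto& hits = event.Get("hits");
    if (hits.size() < 2) return false;  // rejected early; "tracks" never runs
    const auto& tracks = event.Get("tracks");
    return !tracks.empty();
}

// Build an event whose "tracks" factory depends on its "hits" factory.
inline ToyEvent MakeToyEvent(std::vector<double> raw_hits) {
    ToyEvent event;
    event.Register("hits", [raw_hits](ToyEvent&) { return raw_hits; });
    event.Register("tracks", [](ToyEvent& e) {
        const auto& hits = e.Get("hits");  // served from the cache
        return ToyEvent::Product{hits.front() + hits.back()};
    });
    return event;
}
```

For an event with a single hit, `ToyTrigger` returns `false` and the `"tracks"` factory is never invoked. For an event with several hits, `"hits"` is requested twice (by the trigger and by the `"tracks"` factory) but produced only once.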
+ +### Factory types + +Main factory types in JANA2 are: + +- `JFactory` - imperative factory with a single output type +- `JMultifactory` - imperative factory that can produce several types at once +- `JOmniFactory` - declarative factory with multiple outputs. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| | `JOmniFactory` (declarative) | `JFactory` (imperative) | `JMultifactory` (imperative) |
|---|---|---|---|
| Inputs | Fixed number of input types | Any number of input types | Any number of input types |
| Input requests | Declared upfront in class definition | Requested dynamically through JEvent interface | Requested dynamically through JEvent interface |
| Outputs | Multiple types/outputs | Single type | Multiple types |
| Outputs declaration | Declared upfront in class definition | Declared in class definition | Must be declared in constructor |
+ + +### Declarative Factories + +```cpp + +/// A factory should inherit from JOmniFactory<T>, +/// where T is the factory class itself (CRTP) +struct HitRecoFactory : public JOmniFactory<HitRecoFactory> { + + /// Outputs are the data the factory produces. + Output m_clusters{this}; + + /// Inputs are the data the factory uses to produce its result + Input m_mcHits{this}; + + /// Additional service needed to produce data + Service m_calibration{this}; + + /// Parameters are values that can be changed from the command line + Parameter m_cfg_use_true_pos{this, "hits:min_edep_cut", 100, "Flag description"}; + + /// Configure is called once, to configure the algorithm + void Configure() { /* ... */ } + + /// Called when the run number being processed changes + void ChangeRun(int32_t run_number) { /* ... get calibrations for run ... */ } + + /// Called for each event + void Execute(int32_t /*run_nr*/, uint64_t event_index) + { + auto result = std::vector(); + for(auto hit: m_mcHits()) { // get input data from event source or other factories + // ... produce clusters from hits + } + + // Set the output data + m_clusters() = std::move(result); + } +}; + +``` + +### Factory generators + +Since every worker thread creates its own set of factories, providing the factory code is not enough: one also has to +tell JANA2 how to create a factory, i.e. provide a factory generator class. Fortunately, JANA2 provides a templated, +generic `JFactoryGeneratorT` that works for the majority of cases: + +```cpp +// For JFactories -JANA is also flexible enough to be compiled and run different ways. Users may compile their code into a standalone -executable, into one or more plugins which can be run by a generic executable, or run from a Jupyter notebook. +// For JOmniFactories +``` -## Comparison to other frameworks -Many different event reconstruction frameworks exist. 
The following are frequently compared and contrasted with JANA: +## Plugins + +In JANA2, plugins are dynamic libraries that extend the functionality of the main application by registering +additional components such as event sources, factories, event processors, and services. +Plugins are a powerful mechanism that allows developers to modularize their code, promote code reuse, +and configure applications dynamically at runtime without the need for recompilation. + +For a library to be recognized as a plugin, it must implement a specific initialization function called +`InitPlugin()` with C linkage. The function is called by JANA when plugins are loaded and should be used +for registering the plugin's components with the JApplication instance. + +```cpp +extern "C" { + void InitPlugin(JApplication* app) { + InitJANAPlugin(app); + // Register components: + app->Add(/** ... */); // add components from this plugin + app->Add(/** ... */); + // ... + } +} +``` + +### How Plugins Are Found and Loaded + +When a JANA2 application starts, it searches for plugins in specific directories. +The framework maintains a list of plugin search paths where it looks for plugin libraries. +By default, this includes directories such as: + +- The current working directory. +- Directories specified by the `JANA_PLUGIN_PATH` environment variable. +- Directories added programmatically via the `AddPluginPath()` method of `JApplication`. + +Plugins are loaded in two main ways: + +- **Automatic Loading**: The application can be configured to load plugins specified by + command-line arguments or configuration parameters via `-Pplugins` flag. + + ```bash + ./my_jana_application -Pplugins=MyPlugin1,AnotherPlugin + ``` + +- **Programmatic Loading**: Plugins can be loaded explicitly in the application code + by calling the `AddPlugin()` method of `JApplication`. 
+ +### Plugin debugging + +Setting the parameter `jana:debug_plugin_loading=1` makes JANA2 print +detailed information about the plugin loading process. + + +## Object lifecycles + +It is important to understand who owns each JObject and when it is destroyed. + +By default, a JFactory owns all of the JObjects that it created during `Process()`. Once all event processors have +finished processing a `JEvent`, all `JFactories` associated with that `JEvent` will clear and delete their `JObjects`. +However, you can change this behavior by setting one of the factory flags: + +* `PERSISTENT`: Objects are neither cleared nor deleted. This is usually used for calibrations and translation tables. + Note that if an object is persistent, `JFactory::Process` will _not_ be re-run on the next `JEvent`. The user + may still update the objects manually, via `JFactory::BeginRun`, and must delete the objects manually via + `JFactory::EndRun` or `JFactory::Finish`. + +* `NOT_OBJECT_OWNER`: Objects are cleared from the `JFactory` but _not_ deleted. This is useful for "proxy" factories + (which reorganize objects that are owned by a different factory) and for `JEventGroups`. `JFactory::Process` _will_ be + re-run for each `JEvent`. As long as the objects are owned by a different `JFactory`, the user doesn't have to do any + cleanup. + +The lifetime of a `JFactory` spans the time that a `JEvent` is in-flight. No other guarantees are made: `JFactories` might +be re-used for multiple `JEvents` for the sake of efficiency, but the implementation is free to _not_ do so. In particular, +the user must never assume that one `JFactory` will see the entire `JEvent` stream. + +The lifetime of a `JEventSource` spans the time that all of its emitted `JEvents` are in-flight. + +The lifetime of a `JEventProcessor` spans the time that any `JEventSources` are active. 
+ +The lifetime of a `JService` not only spans the time that any `JEventProcessors` are active, but also the lifetime of +`JApplication` itself. Furthermore, because JServices use `shared_ptr`, they are allowed to live even longer than +`JApplication`, which is helpful for things like writing test cases. + -- [Clara](https://claraweb.jlab.org/clara/) While JANA specializes in thread-level parallelism, Clara - uses node-level parallelism via a message-passing interface. This higher level of abstraction comes with some performance - overhead and significant orchestration requirements. On the other hand, it can scale to larger problem sizes and - support more general stream topologies. JANA is to OpenMP as Clara is to MPI. diff --git a/docs/index.md b/docs/index.md index f5fe1f1f9..a40c8bf5e 100644 --- a/docs/index.md +++ b/docs/index.md @@ -18,13 +18,94 @@ sites like [NERSC](http://www.nersc.gov/ ":target=_blank"). The project is [hosted on GitHub](https://github.com/JeffersonLab/JANA2) ```cpp -auto tracks = jevent->Get(); +auto tracks = event->Get(); for(auto t : tracks){ // ... do something with a track } ``` + +## Design philosophy + +JANA2's design philosophy can be boiled down to five values, ordered by importance: + +### Simple to use + +JANA2 focuses on making parallel computations over event-based\* data simple. +JANA2's vocabulary of abstractions is designed around the needs of physicists rather than +general programmers. However, JANA2 does not attempt to meet _all_ of the needs of physicists. + +JANA2 recognizes that some tasks, like data persistence, should be handled separately. +For example, instead of providing its own persistence layer or requiring specific dependencies like ROOT, NumPy, or Apache Arrow, +JANA2 allows users to choose their preferred tools. +This flexibility ensures that if a team wants to switch from one tool to another (e.g., from ROOT to Arrow), +the core analysis code remains largely unaffected. 
+ +To keep things simple, JANA minimizes the complexity of its build system and orchestration. +Using JANA should be straightforward: implement a few key interfaces, add an include path, and link against a single library. + +?> **Tip** The term `event-based` in JANA2 doesn't strictly refer to _physics_ or _trigger_ events. +In JANA2, `event` is used in a broader computer science context, aligning with the streaming readout paradigm +and supporting concepts like event nesting and sub-event parallelization. + + +### Well-organized + +While JANA's primary goal is running code in parallel, its secondary goal is imposing an organizing principle on the users' codebase. +This can be invaluable in a large collaboration where members vary in programming skill. Specifically, +JANA organizes processing logic into decoupled units. JFactories are agnostic of how and when their prerequisites are +computed, are only run when actually needed, and cache their results for reuse. Different analyses can coexist in separate +JEventProcessors. Components can be compiled into independent plugins, to be mixed and matched at runtime. Altogether, +JANA enforces an organizing principle that enables groups to develop and test their code with both freedom and discipline. + + +### Safe + +JANA recognizes that not all of its users are proficient parallel programmers, and it steers users towards patterns which +mitigate some of the pitfalls. Specifically, it provides: + +- **Modern C++ features** such as smart pointers and judicious templating, to discourage common classes of bugs. JANA seeks to +make its memory ownership semantics explicit in the type system as much as possible. + +- **Internally managed locks** to reduce the learning curve and discourage tricky parallelism bugs. + +- **A stable API** with an effort towards backwards-compatibility, so that everybody can benefit from new features +and performance/stability improvements. 
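Three ideas from the last two sections (factories that run only when needed and cache their results, ownership expressed through smart pointers, and locking handled internally rather than by the caller) can be sketched together in plain C++. This is an illustration only and does not use the JANA2 API: `Hit` and `LazyFactory` are hypothetical names, and the single mutex is a deliberately simple stand-in for whatever synchronization the framework actually performs.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical data object; stands in for a JObject.
struct Hit {
    double energy;
};

// Hypothetical stand-in for a factory; NOT the JANA2 JFactory API.
class LazyFactory {
public:
    using Producer = std::function<std::vector<std::unique_ptr<Hit>>()>;

    explicit LazyFactory(Producer producer) : m_producer(std::move(producer)) {
        assert(m_producer && "a producer callback is required");
    }

    // Returns non-owning pointers; the factory keeps ownership via unique_ptr.
    // The producer runs on the first call only; later calls reuse the cache.
    std::vector<const Hit*> Get() {
        std::lock_guard<std::mutex> lock(m_mutex);  // taken here, not by the caller
        if (!m_ready) {
            m_owned = m_producer();
            m_ready = true;
            ++m_runs;
        }
        std::vector<const Hit*> view;
        view.reserve(m_owned.size());
        for (const auto& hit : m_owned) {
            view.push_back(hit.get());
        }
        return view;
    }

    // Number of times the producer has actually run.
    int Runs() const {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_runs;
    }

private:
    Producer m_producer;
    mutable std::mutex m_mutex;
    std::vector<std::unique_ptr<Hit>> m_owned;
    bool m_ready = false;
    int m_runs = 0;
};
```

Because callers only ever receive `const Hit*`, the type system itself signals that they must not delete the objects, and repeated calls to `Get()` never trigger a second run of the producer.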
+ + +### Fast + +JANA uses low-level optimizations wherever it can in order to boost performance. + +### Flexible + +The simplest use case for JANA is to read a file of batched events, process each event independently, and aggregate +the results into a new file. However, it can be used in more sophisticated ways. + +- Disentangling: Input data is bundled into blocks (each containing an array of entangled events) and we want to +parse each block in order to emit a stream of events (_flatmap_) + +- Software triggers: With streaming data readout, we may want to accept a stream of raw hit data and let JANA +determine the event boundaries. Arbitrary triggers can be created using existing JFactories. (_windowed join_) + +- Subevent-level parallelism: This is necessary if individual events are very large. It may also play a role in +effectively utilizing a GPU, particularly as machine learning is adopted in reconstruction (_flatmap+merge_) + +JANA is also flexible enough to be compiled and run in different ways. Users may compile their code into a standalone +executable, into one or more plugins which can be run by a generic executable, or run from a Jupyter notebook. + + +## Comparison to other frameworks + +Many different event reconstruction frameworks exist. The following are frequently compared and contrasted with JANA: + +- [Clara](https://claraweb.jlab.org/clara/): While JANA specializes in thread-level parallelism, Clara + uses node-level parallelism via a message-passing interface. This higher level of abstraction comes with some performance + overhead and significant orchestration requirements. On the other hand, it can scale to larger problem sizes and + support more general stream topologies. JANA is to OpenMP as Clara is to MPI. + + ## History [JANA](https://halldweb.jlab.org/DocDB/0011/001133/002/Multithreading_lawrence.pdf) (**J**Lab **ANA**lysis framework)