From 587720b6609fd80ac19368afc32f56a893e45c6e Mon Sep 17 00:00:00 2001 From: Marie Backman Date: Mon, 16 Sep 2024 12:29:11 -0400 Subject: [PATCH] Developer documentation for communication flow (#174) * developer docs: high-level architecture * developer docs: communication flow --- conda_environment.yml | 1 + docs/conf.py | 1 + .../architecture/communication_flows.rst | 215 ++++++++++++++++++ docs/developer/architecture/index.rst | 9 + docs/developer/architecture/overview.rst | 100 ++++++++ docs/developer/index.rst | 12 +- docs/developer/instruction/autoreduction.rst | 2 + 7 files changed, 337 insertions(+), 3 deletions(-) create mode 100644 docs/developer/architecture/communication_flows.rst create mode 100644 docs/developer/architecture/index.rst create mode 100644 docs/developer/architecture/overview.rst diff --git a/conda_environment.yml b/conda_environment.yml index 790ca87f..7ef9e0d0 100644 --- a/conda_environment.yml +++ b/conda_environment.yml @@ -20,6 +20,7 @@ dependencies: - versioningit~=1.1 - pyoncat - sphinx_rtd_theme=1.2.* # readthedocs use this env file, and we need to install this theme here + - sphinxcontrib-mermaid - pip - pip: - django-auth-ldap==4.1.0 diff --git a/docs/conf.py b/docs/conf.py index 63d45f9f..deba9873 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -55,6 +55,7 @@ "sphinx.ext.ifconfig", "sphinx.ext.viewcode", "sphinx.ext.githubpages", + "sphinxcontrib.mermaid", ] # Add any paths that contain templates here, relative to this directory. diff --git a/docs/developer/architecture/communication_flows.rst b/docs/developer/architecture/communication_flows.rst new file mode 100644 index 00000000..6fad9165 --- /dev/null +++ b/docs/developer/architecture/communication_flows.rst @@ -0,0 +1,215 @@ +.. _communication_flows: + +Communication Flows +=================== + +This section presents communication sequences organized by WebMon functionality. + +.. 
contents:: :local: + +Experiment monitoring +--------------------- + +Instrument status and PV updates +................................ + +DASMON, short for Data Acquisition (DAQ) System Monitor, provides instrument status and process variable +(PV) updates from the beamlines to WebMon. DASMON connects to the WebMon message broker to pass +status information, for example the current run number and count rate, to the Dasmon listener. Due to +the high volume of PV updates, DASMON writes PVs directly to the PostgreSQL database. + +.. mermaid:: + + sequenceDiagram + participant DASMON + participant Dasmon listener + participant Workflow DB + par + DASMON->>Workflow DB: PV update + and + DASMON->>Dasmon listener: Instrument status + Dasmon listener->>Workflow DB: Instrument status + end + +Run status updates +.................. + +The Stream Management Service (SMS) posts messages on the queue ``APP.SMS`` at run start, at run stop, +and when the Streaming Translation Client (STC) completes translation to NeXus. + +.. mermaid:: + + sequenceDiagram + participant SMS + participant Dasmon listener + participant Workflow DB + SMS->>Dasmon listener: Run started + Dasmon listener->>Workflow DB: Add new data run + Dasmon listener->>Workflow DB: Run status + SMS->>Dasmon listener: Run stopped + Dasmon listener->>Workflow DB: Run status + SMS->>Dasmon listener: Translation succeeded + Dasmon listener->>Workflow DB: Run status + + +Experiment data post-processing +------------------------------- + +Autoreduction and cataloging +............................ + +The sequence diagram below describes the communication flow as a run gets post-processed. +The post-processing workflow is triggered when the Streaming Translation Client (STC) has finished +translating the data stream to NeXus and sends a message on the queue ``POSTPROCESS.DATA_READY`` +specifying the instrument, IPTS, run number and location of the NeXus file. 
+ +The post-processing workflow for the instrument is configurable in the database table +``report_task``. +The diagram shows the three post-processing steps that are available: autoreduction, cataloging of +raw data in `ONCat `_ and cataloging of reduced data in +`ONCat `_. +Note that the sequence in the diagram is one possible workflow, but there are variations in the +configured sequence and the steps included depending on the instrument. + +.. mermaid:: + + sequenceDiagram + participant STC + participant Workflow Manager + participant Autoreducer + participant ONCat + participant HFIR/SNS File Archive + + STC->>Workflow Manager: POSTPROCESS.DATA_READY + Workflow Manager->>Autoreducer: CATALOG.ONCAT.DATA_READY + Autoreducer->>Workflow Manager: CATALOG.ONCAT.STARTED + Autoreducer->>ONCat: pyoncat + Autoreducer->>Workflow Manager: CATALOG.ONCAT.COMPLETE + Workflow Manager->>Autoreducer: REDUCTION.DATA_READY + Autoreducer->>Workflow Manager: REDUCTION.STARTED + Autoreducer->>HFIR/SNS File Archive: reduced data, reduction log + Autoreducer->>Workflow Manager: REDUCTION.COMPLETE + Workflow Manager->>Autoreducer: REDUCTION_CATALOG.DATA_READY + Autoreducer->>Workflow Manager: REDUCTION_CATALOG.STARTED + Autoreducer->>ONCat: pyoncat + Autoreducer->>Workflow Manager: REDUCTION_CATALOG.COMPLETE + +Configuring the autoreduction +............................. + +In addition to run post-processing, the autoreducers handle updating instrument reduction script +parameters for instruments that have implemented +:doc:`autoreduction parameter configuration<../instruction/autoreduction>` at +`monitor.sns.gov/reduction// `_. + +.. 
mermaid:: + + sequenceDiagram + actor Instrument Scientist + participant WebMon + participant Autoreducer + participant HFIR/SNS File archive + + Instrument Scientist->>WebMon: Submit form with parameter values + WebMon->>Autoreducer: REDUCTION.CREATE_SCRIPT + Autoreducer->>HFIR/SNS File archive: Update instrument reduction script + +Live data visualization +----------------------- + +Live Data Server (https://github.com/neutrons/live_data_server) is a service that serves plots to +the WebMon frontend. It provides a REST API with endpoints to create, update and retrieve plots +in the Live Data Server database. + +Publish to Live Data Server from live data stream +................................................. + +Livereduce (https://github.com/mantidproject/livereduce/) allows scientists to reduce +data from an ongoing experiment, i.e. before translation to NeXus, by connecting to the live data +stream from the Stream Management Service (SMS). The instrument-specific livereduce processing +script can make the results available in WebMon by publishing plots to Live Data Server. + +.. mermaid:: + + sequenceDiagram + participant SMS + participant Livereduce + participant Live Data Server + + SMS->>Livereduce: data stream + loop Every N minutes + Livereduce->>Livereduce: run processing script + Livereduce->>Live Data Server: HTTP POST + end + +Publish to Live Data Server from autoreduction script +..................................................... + +The instrument-specific autoreduction script can include a step to publish plots (in either JSON +format or HTML div) to Live Data Server. The Post-processing Agent repository includes some +convenience functions for generating and publishing plots in `publish_plot.py +`_. + +.. 
mermaid:: + + sequenceDiagram + participant Workflow Manager + participant Autoreducer + participant Live Data Server + + Workflow Manager->>Autoreducer: REDUCTION.DATA_READY + opt Publish plot + Autoreducer->>Live Data Server: HTTP POST + end + +Display plot from Live Data Server +.................................. + +Run overview pages (``monitor.sns.gov/report///``) query the Live +Data Server for a plot for that instrument and run number and display it if one is available. + +.. mermaid:: + + sequenceDiagram + participant WebMon + participant Live Data Server + + WebMon->>Live Data Server: HTTP GET + loop Every 60 s + WebMon->>Live Data Server: HTTP GET + end + +System diagnostics +------------------ + +WebMon displays system diagnostics information at https://monitor.sns.gov/dasmon/common/diagnostics/ +and diagnostics for DASMON and PVSD at the beamline at +`https://monitor.sns.gov/dasmon//diagnostics/ +`_. +Diagnostics information is primarily collected by the Dasmon listener. + +Heartbeats from services +........................ + +The Dasmon listener subscribes to heartbeats from the other services. There is a mechanism for alerting +admins by email when a service has missed heartbeats (whether this still works needs to be verified). + +.. 
mermaid:: + + flowchart LR + SMS["SMS (per beamline)"] + PVSD["PVSD (per beamline)"] + DASMON["DASMON (per beamline)"] + STC + Autoreducers + DasmonListener + WorkflowDB[(DB)] + SMS-->|heartbeat|DasmonListener + PVSD-->|heartbeat|DasmonListener + DASMON-->|heartbeat|DasmonListener + STC-->|heartbeat|DasmonListener + Autoreducers-->|heartbeat|DasmonListener + WorkflowManager-->|heartbeat|DasmonListener + DasmonListener-->|heartbeat|DasmonListener + DasmonListener-->WorkflowDB + DasmonListener-.->|if missed 3 heartbeats|InstrumentScientist diff --git a/docs/developer/architecture/index.rst b/docs/developer/architecture/index.rst new file mode 100644 index 00000000..0fedd477 --- /dev/null +++ b/docs/developer/architecture/index.rst @@ -0,0 +1,9 @@ +Architecture +============ + +.. toctree:: + :maxdepth: 1 + :caption: Index + + overview + communication_flows diff --git a/docs/developer/architecture/overview.rst b/docs/developer/architecture/overview.rst new file mode 100644 index 00000000..64e07d2d --- /dev/null +++ b/docs/developer/architecture/overview.rst @@ -0,0 +1,100 @@ +Overview +======== + +High-level architecture +----------------------- + +The diagram below describes the high-level architecture of WebMon, including both internal resources +that are considered part of WebMon and external systems that WebMon interacts with. +The arrows represent relationships between these services and resources. + +.. 
mermaid:: + + flowchart LR + FileArchive[("`SNS/HFIR + File archive`")] + subgraph DAS + DASMON + TranslationService["`Streaming + Translation + Client + (STC)`"] + SMS["`Stream + Management + Service + (SMS)`"] + end + WorkflowManager[Workflow Manager] + DasmonListener[Dasmon listener] + Database[(Workflow DB)] + Autoreducers-->ONCat + LiveDataServer-->WebMon + LiveDataServer<-->LiveDataDB[(LiveData DB)] + LiveReduce + WebMon["`WebMon + frontend`"] + SMS-->LiveReduce + TranslationService-->WorkflowManager + DASMON-->DasmonListener + DASMON-->Database + WorkflowManager-->Database + WorkflowManager<-->Autoreducers + Autoreducers-->LiveDataServer + Database-->WebMon + ONCat-->WebMon + LiveReduce-->LiveDataServer + DasmonListener-->Database + TranslationService-->FileArchive + FileArchive<-->Autoreducers + style DAS fill:#D3D3D3, stroke-dasharray: 5 5 + classDef externalStyle fill:#FAEFDE, stroke:#B08D55 + class DASMON,TranslationService,SMS,FileArchive,ONCat externalStyle + + subgraph Legend + direction LR + Internal["Internal resource"] + External["External resource"] + Internal ~~~ External + end + LiveReduce ~~~ Internal + style Legend fill:#FFFFFF,stroke:#000000 + class External externalStyle + +The gray box labeled "DAS" contains services managed by the Data Acquisition System team that pass +information to WebMon. The autoreducers interact with the HFIR/SNS file archive to access +instrument-specific reduction scripts and experiment data files. The autoreducers also write reduced +data files and reduction log files back to the file archive. + +Another external component is the experiment data catalog, `ONCat `_, where +the autoreducers catalog experiment metadata. The WebMon frontend retrieves and displays this +metadata from ONCat. + +The section :ref:`communication_flows` includes sequence diagrams that show how the services +interact. + +Message broker +-------------- + +WebMon uses an `ActiveMQ `_ message broker for communication between +services. 
The message broker also serves as a load balancer by distributing post-processing jobs to +the available autoreducers in a round-robin fashion. + +.. mermaid:: + + flowchart TB + TranslationService["`Streaming + Translation + Client + (STC)`"] + SMS["`Stream + Management + Service + (SMS)`"] + Broker[ActiveMQ broker] + Broker<-->Autoreducers + Broker<-->WorkflowManager[Workflow Manager] + Broker<-->DasmonListener[Dasmon listener] + Broker<-->DASMON + Broker<-->PVSD + Broker<-->TranslationService + Broker<-->SMS diff --git a/docs/developer/index.rst b/docs/developer/index.rst index b75af0cd..b64996db 100644 --- a/docs/developer/index.rst +++ b/docs/developer/index.rst @@ -1,6 +1,15 @@ Developer documentation ======================= +Architecture +------------ + +.. toctree:: + :maxdepth: 1 + + architecture/overview + architecture/communication_flows + Developer Guide --------------- @@ -25,15 +34,12 @@ The web-monitor contains three independent Django applications * :py:mod:`webmon `: user facing web interface, visit the production version at `monitor.sns.gov`_. * :py:mod:`workflow`: backend manager. -and a mocked catalog services. - .. toctree:: :maxdepth: 1 dasmon/modules webmon/modules workflow/modules - catalog/modules .. _monitor.sns.gov: https://monitor.sns.gov/ diff --git a/docs/developer/instruction/autoreduction.rst b/docs/developer/instruction/autoreduction.rst index ff0e874c..2ed61ecc 100644 --- a/docs/developer/instruction/autoreduction.rst +++ b/docs/developer/instruction/autoreduction.rst @@ -1,3 +1,5 @@ +.. autoreduction-parameter-configuration + How to Modify an Instrument Autoreduction Configuration Page ============================================================