
Feature - stdout live reporting #16975

Merged
merged 36 commits into galaxyproject:dev from feature_stdout_live_reporting
Nov 18, 2024

Conversation

gecage952
Contributor

@gecage952 gecage952 commented Nov 3, 2023

Hi,
As part of our work at Oak Ridge National Lab, we've been using Galaxy for quite a while (some of us have attended GCC as well). We've also been doing some internal development on features that users have requested here. One of the most common requests was the ability to see live console output as jobs are running. This issue has been brought up before, for example in #2332, but given that our users wanted it now, I took a stab at an implementation. My main goal was to minimize impact to the way Galaxy works today, so that there aren't any compatibility issues. I'll try to provide details about each part that was touched.

Overview

The overall idea here was to add logic to allow the job manager to read the tool_stdout and tool_stderr files that are saved in the job directory, and return them as part of a status. The reason I put it into the status is that the UI already polls the status regularly from the JobInformation page, so I wouldn't have to make a new thread or anything. It also just made sense to me that you might want it as part of the status of the job. To facilitate this, the API endpoint for getting job status was adjusted to accept parameters selecting which part of the stdout/stderr you want (both work the same, so I'll just refer to stdout from here). There's stdout_position, which is the starting index in the stdout file, and stdout_length, which is how much of the stdout you want (in characters). Because stdout could potentially be a relatively large file, I didn't want to force people to read the whole file every time status is called.
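
To make the polling concrete, here is a rough sketch of how a client might request an incremental chunk of stdout. The parameter names follow the description above and the review diff (stdout_position/stdout_length); the endpoint path, header, and response field are assumptions, since the API shape changed during review.

```python
import requests

GALAXY_URL = "https://galaxy.example.org"   # hypothetical server
API_KEY = "your-api-key"                    # hypothetical key
JOB_ID = "encoded-job-id"                   # hypothetical encoded job id

# Ask for up to 1000 characters of tool stdout, starting where the last poll stopped.
params = {"stdout_position": 0, "stdout_length": 1000}
response = requests.get(
    f"{GALAXY_URL}/api/jobs/{JOB_ID}",      # job status endpoint described above
    params=params,
    headers={"x-api-key": API_KEY},
)
response.raise_for_status()
chunk = response.json().get("tool_stdout", "")
# On the next poll, advance stdout_position by len(chunk) so only new output is read.
```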

I then adjusted the UI for the job information view in a few different ways. First, I made the code blocks scrollable and set a max height for them. Then I moved the expand-on-click functionality to only the expand icons, rather than the whole table row (previously, if users tried to highlight a part of the stdout or clicked the scroll bar to scroll, the view would collapse). Lastly, I added an autoscroll feature that automatically scrolls the code blocks while the user is at the bottom of the stdout. If the user scrolls up, this is disabled. If they scroll back to the bottom, it starts again.

The last thing I want to note is that, as far as compatibility with job runners goes, we almost exclusively use Pulsar for running jobs. As is, this PR will only work for job runners that save their stdout to the job directory inside of Galaxy. Internally, we've added functionality for Pulsar to do this (which is the purpose of the lib/galaxy/webapps/galaxy/api/job_files.py changes). I did not include those changes here, because that would require an additional PR to the Pulsar repository. Let me know if there's interest in seeing that, however. It would also be nice to get some feedback on testing this.

I understand this is a pretty big change, and I imagine there are a lot of areas for improvement. Please let me know if this is something people are interested in helping with, or if this is a terrible way to try to do this, or whatever. It's worked for us so far internally, but I would love to have some feedback from here.

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.
  • Instructions for manual testing are as follows:
    1. The best way to test is to start a tool that will run for some time and produce output on stdout.
    2. Start the tool and go to the JobInformation page for the job.
    3. Expand the Tool stdout by clicking on the expand icon to the right.

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

@dannon
Member

dannon commented Nov 6, 2023

@gecage952 Very cool! I'll defer to others for feedback on the API design, but I went ahead and pushed a minor change fixing up the linting and client test failures, and updating the client API schema.

@dannon dannon force-pushed the feature_stdout_live_reporting branch from e5b887e to 201f556 on November 6, 2023 14:03
@dannon dannon force-pushed the feature_stdout_live_reporting branch from 201f556 to 24e7d15 on November 6, 2023 14:05
@bgruening
Member

That is very cool and a feature we also get asked about every now and then. As a Pulsar deployer, I'm also interested in the Pulsar part and how this works in practice. Are you using the MQ deployment of Pulsar?

Comment on lines 192 to 195
- stdout_position: The index of the character to begin reading stdout from
- stdout_length: How many characters of stdout to read
- stderr_position: The index of the character to begin reading stderr from
- stderr_length: How many characters of stderr to read
Member


Can you add two separate endpoints for the job's stdout and stderr?

@gecage952
Contributor Author

That is very cool and a feature we also get asked every now and then. As a Pulsar deployer, I'm also interested in the Pulsar part and how this works in practice. Are you using the MQ deployment of Pulsar?

Yeah, rabbitmq.
The general idea of the pulsar piece is to send the stdout/stderr files to the job_files api endpoint periodically. To make sure we aren't sending the entire file every time, we keep track of the last position in a map. Then when it's time to send the next chunk, we seek to that position in the files, post it to the job_files endpoint which appends it to the file in the Galaxy job directory. We also considered using the message queue for this instead of the api, but ended up not going that direction.
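
As a rough illustration of that loop (function and parameter names here are hypothetical; the real implementation lives in galaxyproject/pulsar#345), the Pulsar side might look something like this:

```python
import os
import requests

# Last byte offset already posted to Galaxy, per (job_id, stream) pair.
_sent_positions: dict = {}

def post_new_output(job_id: str, stream: str, local_path: str, job_files_url: str, job_key: str) -> None:
    """Send any stdout/stderr produced since the last call to Galaxy's job_files endpoint."""
    last_pos = _sent_positions.get((job_id, stream), 0)
    if not os.path.exists(local_path):
        return
    with open(local_path, "rb") as fh:
        fh.seek(last_pos)                     # skip the part that was already sent
        chunk = fh.read()
    if not chunk:
        return
    # Galaxy's job_files endpoint appends this chunk to the matching file
    # in the job directory (the parameters shown here are illustrative).
    requests.post(
        job_files_url,
        params={"job_key": job_key, "path": stream},
        files={"file": chunk},
    )
    _sent_positions[(job_id, stream)] = last_pos + len(chunk)
```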

@mvdbeek
Member

mvdbeek commented Nov 7, 2023

This is very cool, thanks a lot!

Let me know if there's interest in seeing that however.

that'd be great!

It would also be nice to get some feedback on testing this.

The integration tests are going to run with the local job runner by default (as opposed to the API tests, which could run against external Galaxy servers where the stdio streams may not be available). What you can do is submit a tool job against such an instance that prints something to stdout, then sleeps and prints something at the end; in your test you can then assert that you saw the first message but not the second. Take a look at https://github.com/galaxyproject/galaxy/blob/dev/test/integration/test_job_recovery.py#L30-L36 for running tools in integration tests. Let me know if you need any help with this.
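
For reference, a hedged sketch of such a test (the tool id and the two underscore-prefixed helper methods are hypothetical placeholders, not existing Galaxy test utilities):

```python
import time

def test_live_stdout_reporting(self):
    history_id = self.dataset_populator.new_history()
    # Hypothetical tool that echoes "first message", sleeps, then echoes "second message".
    run = self._submit_tool_without_waiting("live_output_tool", history_id)
    job_id = run["jobs"][0]["id"]
    time.sleep(5)  # give the job time to start and print its first line
    stdout = self._fetch_job_stdout(job_id)  # hypothetical helper hitting the new endpoint
    assert "first message" in stdout
    assert "second message" not in stdout  # the job is still sleeping
```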

@gecage952
Contributor Author

gecage952 commented Nov 17, 2023

So, I opened a PR in the Pulsar repo with that code: galaxyproject/pulsar#345
I'll try to get to the suggestions here in the coming weeks (busy time of year).

@martenson martenson marked this pull request as draft November 17, 2023 20:02
@gecage952
Contributor Author

Ok, I added a new endpoint for getting stdout and stderr, and then updated everything to use the new endpoint. I see it was mentioned to have them separate, and I can still do that if necessary; I just combined them here for now for my own testing purposes.

@gecage952
Contributor Author

I'll also take a look at the merge conflicts.

@gecage952
Contributor Author

Updated so that it will check the job destination params to see if the assigned destination has the live_tool_output_reporting param set to true. One thing I've noticed is that the way the page refreshes can be a little jarring. I know in previous versions of Galaxy it was different.
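
Roughly, the kind of check being described might look like the sketch below (attribute and key names are assumptions for illustration, not necessarily the code in this PR):

```python
def live_output_enabled(job_wrapper) -> bool:
    # Read the destination the job was mapped to and look for the opt-in flag.
    params = job_wrapper.job_destination.params or {}
    return str(params.get("live_tool_output_reporting", False)).lower() == "true"
```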

@gecage952
Contributor Author

Fixed the current merge conflicts.

@bgruening
Member

Can you please run make update-client-api-schema?

@gecage952
Contributor Author

Noticed the api schema needed to be updated again, so I went ahead and did that. Should be good to review if tests pass.

@nsoranzo
Member

Looks like you need to run 'make update-client-api-schema' and commit results.

@gecage952
Contributor Author

Gotcha thanks, just did it.

@mvdbeek mvdbeek requested a review from jmchilton September 24, 2024 13:11
@jmchilton
Member

Pushed a merge commit to resolve conflicts. This looks really great, nice work and I'm so sorry for the delay. I think we will get this into the forthcoming 24.2 release.

@gecage952
Contributor Author

Awesome! No worries on the delay. I totally get it.

@bernt-matthias
Contributor

Was just wondering what happens in real user setups, where a chown happens before the job runs (i.e. the Galaxy user won't be able to access the files while the job runs).

@jmchilton
Member

Was just wondering what happens in real user setups, where a chown happens before the job runs (i.e. the Galaxy user won't be able to access the files while the job runs).

I imagine it won't work - it is off by default in my testing though, so I think it isn't a blocker. I don't have a setup for testing that - but it might be feasible to fix, if it doesn't work, by ensuring the relevant files are readable by the Galaxy user (maybe a job destination option for setting group- or world-readable permissions).

@bernt-matthias
Contributor

I think it isn't a blocker.

Me too.

Wondering if we really need the chgrp here:

If we can drop this, the Galaxy user would still have read access.

I could test this (but unlikely before the release).

@gecage952
Contributor Author

In my testing, those types of errors get caught, and it just appears the same as the default behavior to users. It does log the error though.

@jmchilton
Member

jmchilton commented Nov 14, 2024

I've rerun the failing integration tests and I think

test/integration/test_pulsar_embedded_mq.py::TestEmbeddedMessageQueuePulsarPurge::test_purge_while_job_running

is a valid failure. I'm not getting much from digging through the debug logging, and this is kind of a pain to test locally because of the MQ. I'll try to keep digging; we're trying to branch very soon and I really want to get this in before then.

From the logs:

So the error is that we're waiting on a history that we expect to end up fine, but there are datasets in the "failed_metadata" state.

The job logs are verbose with all the file transfers up and down... but toward the end... they mostly look fine and have no indications about this dataset as far as I can tell.

pulsar.client.staging.down INFO 2024-11-14 15:17:00,865 [pN:main,p:9546,tN:PulsarJobRunner.work_thread-1] collecting output outputs_new/implicit_dataset_conversions.txt with action FileAction[path=/tmp/tmpztun6q_f/tmplbeuc894/tmp9x0bam0_/database/job_working_directory_py8fi90v/000/1/metadata/outputs_new/implicit_dataset_conversions.txt,action_type=remote_transfer,url=http://localhost:8199/api/jobs/adb5f5c93f827949/files?job_key=56a9d5119a62bf92&path=%2Ftmp%2Ftmpztun6q_f%2Ftmplbeuc894%2Ftmp9x0bam0_%2Fdatabase%2Fjob_working_directory_py8fi90v%2F000%2F1%2Fmetadata%2Foutputs_new%2Fimplicit_dataset_conversions.txt&file_type=output_metadata]
pulsar.client.staging.down DEBUG 2024-11-14 15:17:00,865 [pN:main,p:9546,tN:PulsarJobRunner.work_thread-1] Cleaning up job (failed [False], cleanup_job [onsuccess])
galaxy.tool_util.provided_metadata DEBUG 2024-11-14 15:17:00,887 [pN:main,p:9546,tN:PulsarJobRunner.work_thread-1] unnamed outputs [{'output_tool_supplied_metadata': {'name': 'my dynamic name', 'ext': 'txt', 'info': 'my dynamic info'}}]
galaxy.model.store.discover DEBUG 2024-11-14 15:17:00,890 [pN:main,p:9546,tN:PulsarJobRunner.work_thread-1] (1) Created dynamic collection dataset for path [/tmp/tmpztun6q_f/tmplbeuc894/tmp9x0bam0_/database/job_working_directory_py8fi90v/000/1/working/output.txt] with element identifier [output] for output [discovered_list] (0.756 ms)
galaxy.model.store.discover DEBUG 2024-11-14 15:17:00,893 [pN:main,p:9546,tN:PulsarJobRunner.work_thread-1] (1) Add dynamic collection datasets to history for output [discovered_list] (2.630 ms)
galaxy.jobs INFO 2024-11-14 15:17:00,953 [pN:main,p:9546,tN:PulsarJobRunner.work_thread-1] Collecting metrics for Job 1 in /tmp/tmpztun6q_f/tmplbeuc894/tmp9x0bam0_/database/job_working_directory_py8fi90v/000/1/metadata
galaxy.jobs DEBUG 2024-11-14 15:17:00,964 [pN:main,p:9546,tN:PulsarJobRunner.work_thread-1] job_wrapper.finish for job 1 executed (98.738 ms)
INFO:     127.0.0.1:46454 - "GET /api/histories/adb5f5c93f827949 HTTP/1.1" 200 OK
Problem in history with id adb5f5c93f827949 - summary of history's datasets and jobs below.
INFO:     127.0.0.1:46470 - "GET /api/histories/adb5f5c93f827949/contents HTTP/1.1" 200 OK
--------------------------------------

The history in console output logging does show the failed_metadata dataset:

--------------------------------------
| 6 - all_output_types (HID - NAME) 
INFO:     127.0.0.1:46556 - "GET /api/histories/adb5f5c93f827949/contents/f3f73e481f432006 HTTP/1.1" 200 OK
| Dataset State:
|  failed_metadata
| Dataset Blurb:
|  1 line
| Dataset Info:
|  *Dataset info is empty.*
| Peek:
|  <table cellspacing="0" cellpadding="3"><tr><td>hi</td></tr></table>
INFO:     127.0.0.1:46560 - "GET /api/histories/adb5f5c93f827949/contents/f3f73e481f432006/provenance HTTP/1.1" 200 OK
| Dataset Job Standard Output:
|  *Standard output was empty.*
| Dataset Job Standard Error:
|  *Standard error was empty.*
|
--------------------------------------

Most of the outputs are fine. They are defined here: https://github.com/galaxyproject/galaxy/blob/dev/test/functional/tools/all_output_types.xml#L21. I guess one of the more esoteric ways to catch datasets is failing here - likely discover_datasets, but I cannot tell from the logs exactly.

Got permission from Marius in exchange for pledge to fix it before the release.
So let's restrict this new append behavior in the job files API to just the tool_stdout and tool_stderr files.
@jmchilton jmchilton merged commit 8c30a87 into galaxyproject:dev Nov 18, 2024
55 of 56 checks passed
@jdavcs jdavcs added the highlight Included in user-facing release notes at the top label Nov 20, 2024
@martenson
Member

This is awesome, thanks @gecage952 ! 🎉

Labels
area/API, area/jobs, highlight, kind/feature