Feature - stdout live reporting #16975
@gecage952 Very cool! I'll defer to others for feedback on the API design, but I went ahead and pushed a minor change fixing up the linting and client test failures, and updating the client API schema.
That is very cool and a feature we also get asked about every now and then. As a Pulsar deployer, I'm also interested in the Pulsar part and how this works in practice. Are you using the MQ deployment of Pulsar?
- stdout_position: The index of the character to begin reading stdout from
- stdout_length: How many characters of stdout to read
- stderr_position: The index of the character to begin reading stderr from
- stderr_length: How many characters of stderr to read
Can you add two separate endpoints for the job's stdout and stderr?
Yeah, RabbitMQ.
This is very cool, thanks a lot!
that'd be great!
The integration tests are going to run with the local job runner by default (as opposed to the API tests that could run against external Galaxy servers where the stdio streams may not be available). What you can do is submit a tool job against such an instance that prints something to stdout, then sleeps and prints something at the end. In your test you can then assert that you saw the first message but not the second. Take a look at https://github.com/galaxyproject/galaxy/blob/dev/test/integration/test_job_recovery.py#L30-L36 for running tools in integration tests. Let me know if you need any help with this.
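The print, sleep, print idea suggested above can be tried outside Galaxy first. The following is only a rough sketch of the assertion pattern, not Galaxy's integration test framework: a plain Python subprocess stands in for the tool job, and the file name `tool_stdout`, the helper `wait_for_text`, and the messages are illustrative assumptions.

```python
import os
import subprocess
import sys
import tempfile
import time

# Stand-in for a tool job: print, sleep, then print again at the end.
CHILD = (
    "import time\n"
    "print('first message', flush=True)\n"
    "time.sleep(2)\n"
    "print('second message', flush=True)\n"
)


def wait_for_text(path, needle, timeout=10.0, interval=0.05):
    """Poll `path` until `needle` appears; return the contents seen at that moment."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with open(path, encoding="utf-8") as handle:
            contents = handle.read()
        if needle in contents:
            return contents
        time.sleep(interval)
    raise TimeoutError(f"{needle!r} not seen in {path} within {timeout}s")


with tempfile.TemporaryDirectory() as job_dir:
    stdout_path = os.path.join(job_dir, "tool_stdout")
    with open(stdout_path, "w", encoding="utf-8") as sink:
        # The child writes straight into the file while we read it from here.
        process = subprocess.Popen([sys.executable, "-c", CHILD], stdout=sink)
        seen_while_running = wait_for_text(stdout_path, "first message")
        process.wait()
    with open(stdout_path, encoding="utf-8") as handle:
        seen_after_exit = handle.read()
```

In a real integration test the subprocess would be replaced by a tool run against the Galaxy instance, and the file reads by calls to whichever endpoint exposes the live output.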
So, I opened a PR in the Pulsar repo with that code: galaxyproject/pulsar#345
…/galaxy into feature_stdout_live_reporting
Ok, I added a new endpoint for getting stdout and stderr, and then updated everything to use the new endpoint. I see it was mentioned to have them separate, and I can still do that if necessary. I just combined them here now for my own personal testing purposes.
I'll also take a look at the merge conflicts.
Updated so that it will check the job destination params to see if the assigned destination has the
Fixed the current merge conflicts.
Can you please run
Noticed the API schema needed to be updated again, so I went ahead and did that. Should be good to review if tests pass.
Looks like you need to run 'make update-client-api-schema' and commit results.
Gotcha thanks, just did it.
Pushed a merge commit to resolve conflicts. This looks really great, nice work and I'm so sorry for the delay. I think we will get this into the forthcoming 24.2 release.
Awesome! No worries on the delay. I totally get it.
Was just wondering what happens in real user setups, where a
I imagine it won't work - it is off by default though in my testing so I think it isn't a blocker. I don't have a setup for testing that - but it might be feasible to fix if it doesn't work by ensuring the relevant files are readable to the Galaxy user (maybe a job destination option for setting group or world readable permissions).
Me too. Wondering if we really need the
If we can drop this, the Galaxy user would still have read access. I could test this (but unlikely before the release).
In my testing, those types of errors get caught, and it just appears the same as the default behavior to users. It does log the error though.
I've rerun the failing integration tests and I think
is a valid failure. I'm not getting much from digging through the debug logging, and this is kind of a pain to test locally because of the MQ. I'll try to keep digging; we're trying to branch very soon and I really want to get this in before then. From the logs: So the error is we're waiting on a history that we expect to end fine, but there are datasets in the "failed_metadata" state. The job logs are verbose with all the file transfers up and down... but toward the end... they mostly look fine and have no indications about this dataset as far as I can tell.
The history in console output logging does show the failed_metadata dataset:
Most of the outputs are fine. They are defined here https://github.com/galaxyproject/galaxy/blob/dev/test/functional/tools/all_output_types.xml#L21. I guess one of the more esoteric ways to catch datasets is failing here - likely discover_datasets, but I cannot tell from the logs exactly.
Got permission from Marius in exchange for pledge to fix it before the release.
So let's restrict the new append behavior for job files to just the tool_stdout and tool_stderr files.
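To make the restriction concrete, here is a minimal sketch of what "append only for tool_stdout and tool_stderr" could look like. It is not the code in Galaxy's job files API; `APPENDABLE_NAMES`, `write_job_file`, and `output.dat` are hypothetical names used only for illustration.

```python
import os
import tempfile

# Hypothetical allow-list: only these job files accumulate across writes.
APPENDABLE_NAMES = {"tool_stdout", "tool_stderr"}


def write_job_file(path, data):
    """Append for the live-output files, overwrite for everything else."""
    mode = "ab" if os.path.basename(path) in APPENDABLE_NAMES else "wb"
    with open(path, mode) as handle:
        handle.write(data)


def read_bytes(path):
    with open(path, "rb") as handle:
        return handle.read()


with tempfile.TemporaryDirectory() as job_dir:
    stdout_path = os.path.join(job_dir, "tool_stdout")
    output_path = os.path.join(job_dir, "output.dat")
    for chunk in (b"first chunk\n", b"second chunk\n"):
        write_job_file(stdout_path, chunk)   # accumulates both chunks
        write_job_file(output_path, chunk)   # keeps only the latest write
    stdout_contents = read_bytes(stdout_path)
    output_contents = read_bytes(output_path)
```

Keeping the allow-list explicit means regular output files keep their existing overwrite semantics, which is presumably what the failing test depended on.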
This is awesome, thanks @gecage952! 🎉
Hi,
As part of our work at Oak Ridge National Lab, we've been using Galaxy for quite a while (some of us have attended GCC as well). We've also been doing some internal development on some features that users have requested here. One of the most common ones was the ability to see live console output as jobs are running. This issue has been brought up before, for example #2332. But given that our users wanted it now, I took a stab at an implementation. My main goal was to minimize impact on the way Galaxy works now, so that there aren't any compatibility issues. I'll try to provide details about each part that was touched.
Overview
The overall idea here was to add logic to allow the job manager to read the tool_stdout and tool_stderr files that are saved in the job directory, and return them as part of a status. The reason I put it into the status is that the UI already calls the status regularly from the JobInformation page, so I wouldn't have to make a new thread or anything. It also just kinda made sense to me that you might want it as part of the status of the job. To facilitate this, the API endpoint for getting job status was adjusted to allow parameters to select which part of the stdout/stderr (both work the same, so I'll just refer to stdout from here) you want. There's stdout_pos, which is the starting index in the stdout_file, and stdout_length, which is how much of the stdout you want (in chars). Because stdout could potentially be a relatively large file, I didn't want to force people to read the whole file every time status is called.
I then adjusted the UI for the job information view in a few different ways. First, I made the code blocks scrollable and set a max height for them. Then I moved the expand-on-click functionality over to only be on the expand icons, rather than the whole table row (if users tried to highlight a part of the stdout or click on the scroll bar to scroll, it would collapse the view). Lastly, I added an autoscroll feature that automatically scrolls the code blocks when the user is at the bottom of the stdout. If the user scrolls up, this is disabled. If they scroll back to the bottom, it starts again.
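As a rough illustration of the position/length idea, the sketch below reads a window of characters from an output file and shows how a polling client could advance its position between calls. It is not the code in this PR: `read_window` and `poll_new_output` are hypothetical helpers, and only the file name `tool_stdout` comes from the description above.

```python
import os
import tempfile


def read_window(path, position=0, length=1000):
    """Return up to `length` characters of `path`, starting at character `position`."""
    if position < 0 or length < 0:
        raise ValueError("position and length must be non-negative")
    if not os.path.exists(path):
        # A job that has not produced this file yet simply yields nothing.
        return ""
    with open(path, encoding="utf-8", errors="replace") as handle:
        # In text mode read(n) counts characters, so skipping this way keeps
        # the index character-based (at the cost of decoding the skipped part).
        handle.read(position)
        return handle.read(length)


def poll_new_output(path, position, chunk=1000):
    """Fetch whatever arrived since `position`; return (text, next position)."""
    text = read_window(path, position, chunk)
    return text, position + len(text)


with tempfile.TemporaryDirectory() as job_dir:
    stdout_path = os.path.join(job_dir, "tool_stdout")
    with open(stdout_path, "w", encoding="utf-8") as handle:
        handle.write("step 1 done\n")
    first, position = poll_new_output(stdout_path, 0)
    # Simulate the running job appending more output between two polls.
    with open(stdout_path, "a", encoding="utf-8") as handle:
        handle.write("step 2 done\n")
    second, position = poll_new_output(stdout_path, position)
    missing = read_window(os.path.join(job_dir, "tool_stderr"))
```

Each poll transfers only the new characters, so the cost of a status call does not grow with the total size of the output the client has already seen.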
The last thing I want to note is that as far as compatibility with job runners goes, we almost exclusively use Pulsar for running jobs. As is, this PR will only work for job runners that save their stdout to the job directory inside Galaxy. Internally, we've added functionality for Pulsar to do this (the purpose of the lib/galaxy/webapps/galaxy/api/job_files.py changes). I did not include those changes here, because that would require an additional PR to the Pulsar repository. Let me know if there's interest in seeing that, however. It would also be nice to get some feedback on testing this.
I understand this is a pretty big change, and I imagine there are a lot of areas for improvement. Please let me know if this is something people are interested in helping with, or if this is a terrible way to try to do this, or whatever. It's worked for us so far internally, but I would love to have some feedback from here.