Write to standard out regularly instead of just the end of the process #19

Open
SebastianArcq opened this issue Apr 26, 2024 · 4 comments

Comments

SebastianArcq commented Apr 26, 2024

When processing big tables, Sling can sit on "INF streaming data" for a long time. The output is written to the terminal only once everything has been streamed. In practice, minutes or even hours can pass with nothing appearing to happen, and then the entire log (e.g., 7m57s 22,083,410 46359 r/s) is dumped at the very end.

For my own projects I have successfully followed the suggestion from this Stack Overflow article, i.e., calling sys.stdout.flush() regularly. Maybe that's the solution here as well!
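
To illustrate the suggestion, here is a minimal sketch of flushing after every progress line. The function name report_progress and its parameters are hypothetical and only for illustration; they are not part of Sling or its Python wrapper.

```python
import sys


def report_progress(total_rows, batch_size, out=sys.stdout):
    """Write one progress line per batch and flush it immediately.

    Hypothetical helper for illustration only; not part of Sling.
    """
    done = 0
    while done < total_rows:
        done = min(done + batch_size, total_rows)
        out.write(f"{done:,} rows processed\n")
        # Without this flush, the text may sit in a buffer until the
        # process exits when stdout is a pipe rather than a terminal.
        out.flush()
    return done
```

Calling print(..., flush=True) has the same effect as the explicit write and flush pair.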

Contributor

flarco commented Apr 26, 2024

Ah I see. Can you confirm that it works as expected when using the Go binary directly (without the Python wrapper)?
It should flush as you've described: https://github.com/slingdata-io/sling-cli/blob/main/core/sling/task_run_write.go#L90

Author

SebastianArcq commented Apr 26, 2024

Yes, if I use $ sling run directly the output works as expected:
[Image: Screenshot 2024-04-26 at 17 04 41]

With the Python wrapper it looks like this (with the whole log appearing at once at the end):
[Image]

Contributor

flarco commented Apr 26, 2024

Ah, I see. Yeah, it's the progress update that writes and overwrites the text in place without creating a new line. I'm not too sure how to do this correctly in Python, since the wrapper is taking the output of the child process (the Go binary).
Do you mind opening a PR and testing locally? I'm pretty busy these days, so I may not get to this soon.
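
One possible approach, sketched under assumptions: read the child's output as raw bytes in small chunks and forward each chunk right away, so carriage returns used for in-place updates reach the terminal without waiting for a newline. The helper name relay_output and its parameters are hypothetical, and this is not how the existing wrapper is written. It also assumes the child flushes its own progress updates when its stdout is a pipe.

```python
import subprocess
import sys


def relay_output(cmd, sink=None, chunk_size=1024):
    """Run cmd and forward its output to sink as soon as bytes arrive.

    Hypothetical helper for illustration only; not part of Sling.
    """
    if sink is None:
        sink = sys.stdout.buffer
    # bufsize=0 gives an unbuffered pipe, so read() returns whatever
    # bytes are available instead of waiting for a full buffer or a
    # newline. stderr is merged into stdout for simplicity.
    with subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        bufsize=0,
    ) as proc:
        while True:
            chunk = proc.stdout.read(chunk_size)
            if not chunk:  # an empty read means the child closed its output
                break
            sink.write(chunk)  # bytes pass through untouched, '\r' included
            sink.flush()
        return proc.wait()
```

Because the bytes are forwarded unmodified, a progress line that the child rewrites with '\r' should be rewritten in the parent's terminal as well.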

Author

SebastianArcq commented Apr 26, 2024

Appreciate the invitation, but having never contributed to a public repo, I don't feel experienced enough to work on your project.

However, this would be the correct way in Python to overwrite the output in-place:

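import time

# Overwrite the same terminal line on each iteration.
for n in range(1, 5):
    # '\r' moves the cursor back to the start of the line;
    # flush=True writes the text out immediately.
    print(n, end='\r', flush=True)
    time.sleep(1)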
