Document behavior and usage of asynchronous global assigns #89

Closed
kolia opened this issue Apr 22, 2020 · 6 comments

@kolia

kolia commented Apr 22, 2020

I think documenting asynchronous behavior is important because data science / scientific computation people are an important audience for notebooks, and in a reactive notebook a workflow that is likely to work well is having long-running computations run in the background while you continue to work on the rest of the notebook. Maybe.

I tried driving an animation from Julia, using @async or @spawn to increment a global time variable in the background. The time variable does get updated, but the asynchronous updates do not trigger reactive evaluation of dependent cells.
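For reference, here is a minimal plain-Julia sketch of the kind of background update described above. The names `t` and `ticker` are made up for illustration, and this is not the notebook code I actually ran. Outside of Pluto it just shows that the global really is updated by the task; inside a notebook, cells that read `t` would not be re-run by these assignments.

```julia
# Global counter standing in for the notebook's time variable.
t = 0

# Background task: bump the global a few times, sleeping between steps
# so that other tasks get a chance to run.
ticker = @async for _ in 1:5
    global t += 1
    sleep(0.01)
end

wait(ticker)   # block until the background task has finished
println(t)
```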

Originally I was thinking that triggering run_reactive! from asynchronous contexts, i.e. pushing updates from Julia rather than polling from JS, might have advantages. The main use case I had in mind is progress-meter-like functionality:

  • you're training a model with @spawn that takes a while and want to display progress.
  • you want to display tensorboard- / visdom-like plots that update as training epochs complete.

I also thought it would be a simpler mental model of the reactive evaluation if assigning to a global variable behaved the same in an asynchronous context as in a synchronous one. Right now one does not trigger reactive updates while the other does.

However, implementing this would be somewhat annoying. You can skip reactive updates on the Julia side if previous ones have not yet completed, the way the @bind impl does, but if you start worrying about HTML+JS rendering being potentially too slow to keep up with updates being pushed from Julia, things get more complicated.
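To make the "skip an update while the previous one is still running" idea concrete, here is a generic sketch. `UpdateGate` and `try_update!` are hypothetical names, and this is not how the @bind implementation is actually written. It also assumes cooperative tasks on a single thread (@async); with @spawn across threads, the check-then-set on `busy` would need a lock or an atomic.

```julia
# Hypothetical gate that remembers whether an update is in flight.
mutable struct UpdateGate
    busy::Bool
end

# Run `f` unless an earlier update is still running.
# Returns true if `f` ran, false if the update was dropped.
function try_update!(f, gate::UpdateGate)
    gate.busy && return false
    gate.busy = true
    try
        f()
    finally
        gate.busy = false   # reopen the gate even if `f` throws
    end
    return true
end

gate = UpdateGate(false)
slow = @async try_update!(() -> sleep(0.2), gate)   # long-running update
sleep(0.05)                                         # let it start
skipped = !try_update!(() -> nothing, gate)         # dropped: gate is busy
wait(slow)
ran = try_update!(() -> nothing, gate)              # runs: gate is free again
```

Dropping updates like this keeps the Julia side from piling up work, but it does nothing about the front end falling behind, which is the harder part.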

And of course all this can be achieved by polling from the JS side. When pulling updates from JS, having asynchronous global assignments not trigger run_reactive! is a feature, not a bug: it gives you laziness for free.

So now I'm thinking the easiest is to keep the behavior as is, and:

  • document the current asynchronous behavior in the FAQ/samples/README
  • provide a Pluto.time value that displays controls for starting/stopping time at different speeds and that users can bind to a time variable.
  • make sure the 'run all modified' button checks for global variables that have been updated asynchronously (I added a comment to issue #83, "Run all modified" button / keyboard shortcut, for this)
  • maybe provide an example of how to use this time variable for progress-meter-like functionality.
@fonsp
Owner

fonsp commented Apr 23, 2020

@malyvsen What do you think?

Here are some unstructured thoughts on this subject:

I agree with the situation described in the first paragraph, but I think that Running disjoint execution paths in parallel (#4) is the way to solve it, not adding complexity to make async code reactive. observablehq.com has this feature, which - together with other cool features - removes the need to write async code with handlers and such, because your notebook is a sequence of async handlers!! So this would be cool to have in Pluto one day, and it would solve the do-something-else-while-a-cell-is-running problem. (Right now you can open a second notebook for this.)

The philosophy for Pluto is that it adds a couple of limitations on your code, so that the reactivity is able to give you its benefits, the most important one being that coding and debugging will be cognitively simpler.

A soft limitation is that you write cells without side effects. If you do figure out a way to create side effects, then the reactivity will ignore them. @async is one of the ways to break out of Pluto's reactivity, together with Ref, array assignments, mutable structs, external functions, evals, and so on. @async actually breaks a lot more features of Pluto.
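A rough illustration of why mutation slips past a purely syntactic analysis: the toy walker below collects only plain `name = value` assignments from an expression. `assigned_symbols` is a made-up helper for this comment, not Pluto's actual analysis code, and it ignores many cases (destructuring, `global`, function definitions, and so on).

```julia
# Collect the symbols that are directly assigned with `name = value`.
# Toy version: only plain assignments to a bare symbol are counted.
function assigned_symbols(ex, found = Symbol[])
    if ex isa Expr
        if ex.head === :(=) && ex.args[1] isa Symbol
            push!(found, ex.args[1])
        end
        for arg in ex.args
            assigned_symbols(arg, found)
        end
    end
    return found
end

plain    = assigned_symbols(:(begin a = b + c; x = y end))  # sees a and x
mutation = assigned_symbols(:(push!(xs, 1)))                # sees nothing
ref_set  = assigned_symbols(:(r[] = 1))                     # sees nothing
```

The last two expressions change state that other cells may read, but no global is syntactically assigned, so an analysis of this kind has nothing to react to.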

The four suggestions are all good suggestions, but the polling is actually trickier than it sounds. Pluto doesn't store old values for variables - so that they can be garbage collected. Of course, this can be done with good bookkeeping, or by hashing old values and comparing hashes.
But I don't think that this is worth the trouble: right now Pluto works using static code analysis only, which makes the project easier to maintain, and there is zero overhead while code is running.
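For the record, the hash-comparison variant could look roughly like the sketch below. The workspace is modelled as a plain `Dict` and `changed_names!` is a hypothetical helper, so none of this reflects how Pluto stores variables. It only needs one hash per variable, so old values can still be garbage collected, but every poll has to hash every value, and a hash comparison can in principle miss a change.

```julia
# Return the names whose value hash differs from the last poll,
# and record the new hashes in `last_seen`.
function changed_names!(last_seen::Dict{Symbol,UInt}, workspace)
    changed = Symbol[]
    for (name, value) in workspace
        h = hash(value)
        if get(last_seen, name, nothing) != h
            push!(changed, name)
        end
        last_seen[name] = h
    end
    return sort!(changed)
end

last_seen = Dict{Symbol,UInt}()
workspace = Dict{Symbol,Any}(:t => 0, :xs => [1, 2])

first_poll = changed_names!(last_seen, workspace)   # everything is new
push!(workspace[:xs], 3)                            # mutate without assigning
second_poll = changed_names!(last_seen, workspace)  # only xs changed
```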

Pluto is not supposed to be the swiss army knife for scientific programming! For full control over execution, debugging, and much more, there is Juno; for super fast, highly interactive, zero-install data visualisation there is observablehq.com; for a rogue version of Pluto, there is Jupyter.

@fonsp
Owner

fonsp commented Apr 23, 2020

It sounds like the ML libraries use one of two non-Plutonian things to work:

  • mutable state - this is exactly what Pluto tries to prevent you from coding, since an immutable state is what makes reactivity possible.
  • intermediate results - supporting this would make things more complicated, and you lose the guarantee that a variable's value always matches its code.

But these can be avoided using the @bind macro! (Albeit by writing the code differently from normal ML things, but hey, less is more, change is good, the enemy of art is the absence of limitations, etc.) I will send some example code in a minute.

I should say that I have almost no experience with ML libraries, so do with this what you like.

@fonsp
Owner

fonsp commented Apr 23, 2020

@kolia
Author

kolia commented Apr 23, 2020

Thanks @fonsp for the detailed explanations and code. I think I understand your vision better now, I like it!

You're trying to go as far as possible with purely functional cells, which has many advantages, for ease of reasoning about the code, and for parallel code execution. I hadn't seen #4, that makes a lot of sense, and I hadn't thought about garbage collection.

This is making me think Dagger.jl is a good match, it could be used to make a workspace manager that executes cells in parallel, using various local or remote resources. That would introduce a significant new dependency though.

@fonsp
Owner

fonsp commented Apr 24, 2020

Wow that might be a very useful package, I'll look into it! I guess the question is how easy it is to make it work with globals? Hopefully without rewriting the user's code?

Actually - it would be a big performance boost for Pluto if it could convert every cell into a function. Especially for notebooks that use @bind, this would mean that we don't need to eval the code every time that we need to run. Calling eval takes about 25ms per cell, which is huge compared to the actual runtime (a few microseconds for simple code). Wrapping all cells in a function is essentially https://github.com/fonsp/Pluto.jl/blob/master/sample/ui.jl#L168 but automatically.

For example, a cell like

begin
    a = b + c
    x = y
end

would be rewritten to

function cell1234(b, +, c, y)
    global a, x
    begin
        a = b + c
        x = y
    end
end

Maybe one day...
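A rough sketch of how such a rewrite could be generated with plain `Expr` construction. `wrap_cell` is a hypothetical helper, the input and output lists are passed in by hand here (in Pluto they would have to come from the static analysis), and `+` is left out of the arguments for simplicity:

```julia
# Build `function name(inputs...); global outputs...; body; end` as an Expr.
function wrap_cell(name::Symbol, inputs::Vector{Symbol},
                   outputs::Vector{Symbol}, body)
    signature = Expr(:call, name, inputs...)
    declare_globals = Expr(:global, outputs...)
    return Expr(:function, signature, Expr(:block, declare_globals, body))
end

cell = :(begin
    a = b + c
    x = y
end)

# eval happens once, when the cell code changes ...
eval(wrap_cell(:cell1234, [:b, :c, :y], [:a, :x], cell))

# ... and every later run is an ordinary function call.
cell1234(1, 2, 3)
println((a, x))
```

The point is that the expensive eval is paid once per code change rather than once per run.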

@fonsp
Owner

fonsp commented Apr 24, 2020

Added to the FAQ 👍

fonsp closed this as completed Apr 24, 2020