Document behavior and usage of asynchronous global assigns #89

kolia · 2020-04-22T18:39:04Z

I think documenting asynchronous behavior is important because data science / scientific computation people are an important audience for notebooks, and in a reactive notebook a workflow that is likely to work well is having long-running computations run in the background while you continue to work on the rest of the notebook. Maybe.

I tried driving an animation from Julia, using @async or @spawn to increment a global time variable in the background. The time variable does get updated, but the asynchronous updates do not trigger reactive evaluation of dependent cells.
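For concreteness, here is a minimal plain-Julia sketch of that experiment. The names `t` and `ticker` are made up for illustration, and in a notebook the assignment and the readers of `t` would live in separate cells. The background task really does change `t`, but since no cell is re-run, nothing that reads `t` gets re-evaluated.

```julia
# Global "time" variable, as in the experiment described above.
t = 0

# Background task: bump the global a few times, sleeping in between.
# `global` is needed because the task body runs inside a closure.
ticker = @async for _ in 1:5
    global t += 1
    sleep(0.01)
end

# Waiting is only here so this script can inspect the final value;
# in a notebook you would keep working while the task runs.
wait(ticker)
final_t = t
```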

Originally I was thinking that triggering run_reactive! from asynchronous contexts, i.e. pushing updates from Julia rather than polling from JS, might have advantages. The main use case I had in mind is progress-meter-like functionality:

you're training a model with @spawn that takes a while and want to display progress.
you want to display tensorboard- / visdom-like plots that update as training epochs complete.

I also thought it would be a simpler mental model of reactive evaluation if assigning to a global variable behaved the same in an asynchronous context as in a synchronous one. Right now the synchronous assignment triggers reactive updates while the asynchronous one does not.

However, implementing this would be somewhat annoying. On the Julia side you can skip a reactive update if the previous one has not yet completed, the way the @bind implementation does, but once you start worrying about HTML+JS rendering being too slow to keep up with updates pushed from Julia, things get more complicated.
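The "skip if the previous update is still running" idea can be sketched roughly like this. This is not Pluto's @bind code: `UpdateGate` and `try_update!` are hypothetical names, and the plain `Bool` flag assumes cooperative tasks on a single thread (multi-threaded code would need a lock or an atomic).

```julia
# Hypothetical gate that drops updates arriving while one is in progress.
mutable struct UpdateGate
    busy::Bool
    ran::Int
    skipped::Int
end

# Run `work()` unless an update is already running; return whether it ran.
function try_update!(work, gate::UpdateGate)
    if gate.busy
        gate.skipped += 1
        return false
    end
    gate.busy = true
    try
        work()
        gate.ran += 1
    finally
        gate.busy = false   # always release the gate, even if `work` throws
    end
    return true
end

gate = UpdateGate(false, 0, 0)

# A slow update starts in the background...
slow = @async try_update!(() -> sleep(0.2), gate)
sleep(0.05)

# ...so an update arriving in the meantime is dropped rather than queued.
dropped = !try_update!(() -> nothing, gate)

# Once the slow update has finished, new updates run again.
slow_ran = fetch(slow)
later_ran = try_update!(() -> nothing, gate)
```

This only covers the Julia side. It says nothing about whether the browser can render fast enough, which is the part that gets complicated.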

And of course all of this can be achieved by polling from the JS side. When pulling updates from JS, having asynchronous global assignments not trigger run_reactive! is a feature, not a bug: it gives you laziness for free.

So now I'm thinking the easiest option is to keep the behavior as is, and:

document the current asynchronous behavior in the FAQ/samples/README
provide a Pluto.time value that displays controls for starting/stopping time at different speeds and that users can bind to a time variable.
make sure the 'run all modified' button checks for global variables that have been updated asynchronously (I added a comment to issue "Run all modified" button / keyboard shortcut #83 for this)
maybe provide an example of how to use this time variable for progress-meter-like functionality.

fonsp · 2020-04-23T10:19:28Z

@malyvsen What do you think?

Here are some unstructured thoughts on this subject:

I agree with the situation described in the first paragraph - but I think that "Running disjoint execution paths in parallel" (#4) is the way to solve it, rather than adding complexity to make async code reactive. observablehq.com has this feature, which - together with other cool features - removes the need to write async code with handlers and such, because your notebook is a sequence of async handlers!! So this would be cool to have in Pluto one day, and it would solve the do-something-else-while-a-cell-is-running problem. (Right now you can open a second notebook for this.)

The philosophy for Pluto is that it adds a couple of limitations on your code, so that the reactivity is able to give you its benefits, the most important one being that coding and debugging will be cognitively simpler.

A soft limitation is that you write cells without side effects. If you do figure out a way to create side effects, then the reactivity will ignore them. @async is one of the ways to break out of Pluto's reactivity, together with Ref, array assignments, mutable structs, external functions, evals, and so on. @async actually breaks a lot more features of Pluto.
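To illustrate why mutation slips past an analysis that only looks at the code, here is a toy check. It is not Pluto's analyzer, and `assigns_name` is a hypothetical helper. It asks whether an expression assigns directly to a plain variable name. Writes through a Ref, array element assignments and mutating calls all change state without ever looking like `name = value`.

```julia
# Toy helper (hypothetical): true only for `name = value` style assignments.
assigns_name(ex) = ex isa Expr && ex.head === :(=) && ex.args[1] isa Symbol

plain_assign = assigns_name(Meta.parse("t = t + 1"))     # left side is the Symbol :t
ref_write    = assigns_name(Meta.parse("r[] = 5"))       # left side is the expression r[]
array_write  = assigns_name(Meta.parse("xs[1] = 0"))     # left side is the expression xs[1]
mutating     = assigns_name(Meta.parse("push!(xs, 1)"))  # a call, not an assignment
```

Only the first expression counts as an assignment to a variable here. The other three modify existing values in place.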

The four suggestions are all good, but the polling is actually trickier than it sounds. Pluto doesn't store old values of variables - so that they can be garbage collected - which means there is nothing to compare a new value against. Of course, change detection could be done with good bookkeeping, or by hashing old values and comparing hashes.
But I don't think that this is worth the trouble: right now Pluto works using static code analysis only, which makes the project easier to maintain, and there is zero overhead while code is running.
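For what it's worth, the hashing variant could look something like the sketch below. `snapshot` and `changed_names` are hypothetical helpers, not anything in Pluto. The point is that only a hash per variable is retained, so the old values themselves can still be garbage collected. The cost is that every poll has to hash every value, which is exactly the kind of runtime overhead the static approach avoids.

```julia
# Keep one hash per variable instead of holding on to the old values.
snapshot(vars::AbstractDict) = Dict(name => hash(value) for (name, value) in vars)

# Names whose hash differs from the previous snapshot (or that are new).
changed_names(old, new) =
    sort!([name for (name, h) in new if get(old, name, nothing) != h])

before = snapshot(Dict(:t => 1, :xs => [1, 2, 3]))
after  = snapshot(Dict(:t => 2, :xs => [1, 2, 3]))

# Only `t` changed; `xs` hashes the same because its contents are equal.
changed = changed_names(before, after)
```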

Pluto is not supposed to be the swiss army knife for scientific programming! For full control over execution, debugging, and much more, there is Juno; for super fast, highly interactive, zero-install data visualisation there is observablehq.com; for a rogue version of Pluto, there is Jupyter.

fonsp · 2020-04-23T10:38:11Z

It sounds like the ML libraries use one of two non-Plutonian things to work:

mutable state - this is exactly what Pluto tries to prevent you from coding, since an immutable state is what makes reactivity possible.
intermediate results - supporting this would make things more complicated, and you lose the guarantee that a variable's value always matches its code.

But these can be avoided using the @bind macro! (Albeit by writing the code differently from normal ML things, but hey, less is more, change is good, the enemy of art is the absence of limitations, etc.) I will send some example code in a minute.

I should say that I have almost no experience with ML libraries, so do with this what you like.

fonsp · 2020-04-23T11:09:58Z

https://gist.github.com/fonsp/971666be05d34bfc1254d69756ac3b1f

kolia · 2020-04-23T17:53:07Z

Thanks @fonsp for the detailed explanations and code. I think I understand your vision better now, I like it!

You're trying to go as far as possible with purely functional cells, which has many advantages, for ease of reasoning about the code, and for parallel code execution. I hadn't seen #4, that makes a lot of sense, and I hadn't thought about garbage collection.

This is making me think Dagger.jl is a good match, it could be used to make a workspace manager that executes cells in parallel, using various local or remote resources. That would introduce a significant new dependency though.

fonsp · 2020-04-24T08:38:31Z

Wow that might be a very useful package, I'll look into it! I guess the question is how easy it is to make it work with globals? Hopefully without rewriting the user's code?

Actually - it would be a big performance boost for Pluto if it could convert every cell into a function. Especially for notebooks that use @bind, this would mean that we don't need to eval the code every time that we need to run. Calling eval takes about 25ms per cell, which is huge compared to the actual runtime (a few microseconds for simple code). Wrapping all cells in a function is essentially https://github.com/fonsp/Pluto.jl/blob/master/sample/ui.jl#L168 but automatically.

For example, a cell like

begin
    a = b + c
    x = y
end

would be rewritten to

function cell1234(b, +, c, y)
    global a, x
    begin
        a = b + c
        x = y
    end
end

Maybe one day...
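A small plain-Julia sketch of the two approaches, under some simplifying assumptions: the cell body is held as a quoted expression, the inputs are ordinary globals, and unlike the rewrite above, `+` is left as a normal lookup instead of being passed in as an argument. No timings are shown since they depend on the machine.

```julia
# The cell body, stored as an expression.
cell_expr = quote
    a = b + c
    x = y
end

# Inputs the cell depends on (plain globals in this sketch).
b, c, y = 1, 2, 3

# Approach 1: evaluate the expression every time the cell runs.
Core.eval(@__MODULE__, cell_expr)
eval_result = (a, x)

# Approach 2: define a function once, then call it on every later run.
function cell1234(b, c, y)
    global a, x
    a = b + c
    x = y
    return nothing
end

cell1234(10, 20, 30)
call_result = (a, x)
```

Both approaches assign the same globals. The difference is that the function is defined and compiled once, and after that only the call is repeated.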

fonsp · 2020-04-24T15:57:58Z

Added to the FAQ 👍

kolia mentioned this issue Apr 22, 2020

"Run all modified" button / keyboard shortcut #83

Closed

fonsp closed this as completed Apr 24, 2020
