Cells get executed using different threads #2738

dsyme · 2019-12-02T16:00:02Z

dsyme
Dec 2, 2019
Collaborator

Cells get executed using different threads

This is a design question and a possible bug.

As you execute different cells in a C# or F# notebook, a different thread gets used for each interaction. The question is really whether we want to guarantee that the same thread gets used each time you execute a new cell. Since cell execution is inherently linear it feels we should likely guarantee this.

For example, repeated re-execution of

System.Console.WriteLine("{0}", System.Threading.Thread.CurrentThread.ManagedThreadId);

gives 10, 11 etc.

This came up when using TensorFlow.NET, which had some thread local storage that was not getting re-initialized correctly as operations were performed from different threads. But the question is more general too - is is something we want to guarantee or not, and we should document it either way.

Please complete the following:

OS
- Any
Browser
- Any

jonsequitur · 2019-12-03T15:33:34Z

jonsequitur
Dec 3, 2019
Maintainer

Thinking more about this, I'm not seeing this as a bug so much as feedback pointing to two different needs we should address in the design:

People need a way to store data and share it across submissions. If you're writing an app and you understand your own threading model, thread-local storage or AsyncLocal or serialization or some other mechanism might variously be appropriate. But the notebook should abstract that in most cases, even to the degree that different submissions might execute in different processes or on different machines.

But also,

Reusing a thread is a legitimate need because of cases like the TensorFlow.NET example, so we should also allow people to control it. A magic command might be one way to approach this.

0 replies

dsyme · 2019-12-04T11:57:15Z

dsyme
Dec 4, 2019
Collaborator Author

Yup. I suspect you wouldn't lose anything if you commit to running cells on the same thread in the kernel. Kernels are not designed to perform parallel/concurrent execution of notebook cells except where mediated by async/tasks etc, so there's no reason not to dedicate one thread in the kernel to serving these requests.

The console F# Interactive does this for example. On windows each interaction is executed on a GUI thread suitable for Windows Forms or WPF programming, and on Linux we use a dedicated thread that can optionally go via a global "IEventLoop" held in the fsi object.

I'm actually a bit surprised that the FSharp.Compiler.Service.dll which dotnet-interactive uses component doesn't transfer across to the compilation/executor thread (for the Evaluate calls being made into it). It does that for some other calls. We should really do that at least for the compilation/checking part) to protect the compilation data structures from concurrent access.

0 replies

lostmsu · 2020-08-14T05:49:35Z

lostmsu
Aug 14, 2020

I am facing the same issue even within individual cells also with TensorFlow (this one is running through Python.NET though). It surfaces as multiple issues with tf.keras:

Somehow layers even within a single cell get identical names while they should be autoincremented.

var model = new Sequential(new Layer[] {
  new Flatten(kwargs: new { input_shape = (28, 28) }.AsKwArgs()),
  new Dense(units: 128, activation: tf.nn.selu_fn),
  new Dense(units: 10, activation: tf.nn.softmax_fn),
});

LostTech.Gradient.Exceptions.ValueError: The name "dense" is used 2 times in the model. All layer names should be unique.

In dotnet-script or a regular program layers get names "dense" and "dense_1"

Because in TensorFlow default graphs are per-thread, creating a model in one cell and trying to compile it in another causes "belong to different graphs" issue, because "dense_output" and "adam" end up in different graphs.

model.compile(
    optimizer: new AdamOptimizer(),
    loss: "sparse_categorical_crossentropy",
    metrics: new [] {"accuracy"});

LostTech.Gradient.Exceptions.ValueError: Tensor Tensor("dense_1_output", shape=(10,), dtype=float32) is not an element of this graph.

This is a TensorFlow behavior, which is unlikely to ever be fixed.

Right now this is a blocker for us to support our version of TensorFlow in C# notebooks on https://ml.azure.com . It used to work on F# Preview notebooks, which, I guess, used a different kernel.

0 replies

jonsequitur · 2020-08-14T13:58:31Z

jonsequitur
Aug 14, 2020
Maintainer

At its core, .NET Interactive is a library, and it's meant to support different kinds of UI hosts and web service models. For this reason, it's intentional that its threading behavior is unopinionated, except to the extent that submissions are guaranteed to execute serially with regard to the default task scheduler.

The implication of that design though is that you can specify the threading behavior if you need to. You can see a good example of this in our sample that embeds a CSharpKernel within a WPF app:

interactive/samples/connect-wpf/App.xaml.cs

Lines 79 to 90 in 5265970

    
           csharpKernel.AddMiddleware(async (KernelCommand command, KernelInvocationContext context, KernelPipelineContinuation next) => 
        
           { 
        
               if (RunOnDispatcher) 
        
               { 
        
                   await Dispatcher.InvokeAsync(async () => await next(command, context)); 
        
               } 
        
               else 
        
               { 
        
                   await next(command, context); 
        
               } 
        
           });

So let's work out what needs to happen in the notebook to satisfy TensorFlow. I'll need to find some time to take a look at the code but please chime in if this example gives you any ideas.

There's some related discussion in #711.

0 replies

lostmsu · 2020-08-14T18:58:14Z

lostmsu
Aug 14, 2020

@jonsequitur need to clarify there, for me the issue is not about the library (for which what you mentioned makes sense), but specifically about the Jupyter kernel.

0 replies

jonsequitur · 2020-08-14T22:46:52Z

jonsequitur
Aug 14, 2020
Maintainer

Understood. My comments still apply, though. If TensorFlow has an implicit dependency on a specific threading model, we should ideally fix that in the context of TensorFlow, rather than pushing this opinion into .NET Interactive's kernel, which I think will cause other issues. For example, enforcing thread affinity will likely require us to add a SynchronizationContext, which introduces the possibility of deadlocks for everyone using async.

One approach is that .NET Interactive's extension model allows us to include a threading middleware in the TensorFlow.NET NuGet package, so that the fix can a) be specific to TensorFlow and b) evolve with TensorFlow.

Again, I'll need more context to be more specific, but I'm happy to set up some time for a more detailed investigation.

0 replies

lostmsu · 2020-11-02T20:56:08Z

lostmsu
Nov 2, 2020

@jonsequitur I don't think fixing TensorFlow is a reasonable option. There's a number of thread-state based libraries, that the current threading model simply won't allow. At least OpenGL comes to mind.

At the very least there should be a option to force all the interactive commands to run in a single thread.

0 replies

jonsequitur · 2020-11-02T23:43:23Z

jonsequitur
Nov 2, 2020
Maintainer

Thread affinity is something that .NET Interactive isn't opinionated about so that the behavior can be customized for the needs of a specific use case. You can do this with middleware, which can be added using an extension.

So I'm not suggesting changing how TensorFlow or other libraries work, but rather adding kernel extensions to their NuGet packages that customize the behavior of .NET Interactive's threading model in the presence of those libraries.

Another more generalized approach might be to introduce a magic command that specifies the threading behavior required by the notebook.

0 replies

lostmsu · 2020-11-03T02:00:43Z

lostmsu
Nov 3, 2020

@jonsequitur in case of Jupyter kernel, isn't this code part of this repository too?

0 replies

jonsequitur · 2020-11-03T06:57:08Z

jonsequitur
Nov 3, 2020
Maintainer

Yes. We'd like the behavior to be the same regardless of whether you're using .NET Interactive with Jupyter, though, so I don't see this issue as specific to Jupyter.

0 replies

lostmsu · 2020-11-20T05:13:04Z

lostmsu
Nov 20, 2020

BTW, this appears to be affecting ML.NET incarnation of TensorFlow too: #60

0 replies

dsyme · 2021-04-14T11:10:02Z

dsyme
Apr 14, 2021
Collaborator Author

So I'm not suggesting changing how TensorFlow or other libraries work, but rather adding kernel extensions to their NuGet packages that customize the behavior of .NET Interactive's threading model in the presence of those libraries.

I don't think this is realistic. The TensorFlow threading model is not particularly strange and it's not possible to go round patching up the package ecosystem to include dependencies on .NET Interactive

We should just make it possible to configure .NET Interactive to respect a single-threaded execution model without any changes to package ecosystems. A magic command would be enough.

@lostmsu We should work together to get a proposed fix for this I think.

0 replies

lostmsu · 2021-04-14T17:53:01Z

lostmsu
Apr 14, 2021

Yes, a "magic command" sounds good. I think the .NET way would be to expose an option to set a custom TaskScheduler, along with an appropriate scheduler implementation.

At very least it needs to be available in the Jupyter kernel.

0 replies

jonsequitur · 2021-04-19T19:56:12Z

jonsequitur
Apr 19, 2021
Maintainer

Forcing all cell execution to a single thread introduces one significant gotcha which we'll want to figure out how to minimize. In order for async/await to work correctly at that point, you'll also need a custom SynchronizationContext. This in turn introduces the risk of deadlocks in code (including library code) that uses Task.Result or Task.Wait.

One approach to minimizing it might be to set the synchronization context only during the execution of specified cells, instead of changing the behavior for the whole notebook.

For example, something like:

#!thread my-thread-name

// ... do thread-affinitized things

0 replies

lostmsu · 2021-04-20T07:21:27Z

lostmsu
Apr 20, 2021

Either I am misunderstanding something, or library code should not be affected. All new tasks started from interactive should use standard SynchronizationContext and can execute on any worker thread. Single thread should only affect the top-level call.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cells get executed using different threads #2738

{{title}}

Replies: 15 comments

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

Cells get executed using different threads #2738

dsyme Dec 2, 2019 Collaborator

Cells get executed using different threads

Please complete the following:

Replies: 15 comments

jonsequitur Dec 3, 2019 Maintainer

dsyme Dec 4, 2019 Collaborator Author

lostmsu Aug 14, 2020

jonsequitur Aug 14, 2020 Maintainer

lostmsu Aug 14, 2020

jonsequitur Aug 14, 2020 Maintainer

lostmsu Nov 2, 2020

jonsequitur Nov 2, 2020 Maintainer

lostmsu Nov 3, 2020

jonsequitur Nov 3, 2020 Maintainer

lostmsu Nov 20, 2020

dsyme Apr 14, 2021 Collaborator Author

lostmsu Apr 14, 2021

jonsequitur Apr 19, 2021 Maintainer

lostmsu Apr 20, 2021

dsyme
Dec 2, 2019
Collaborator

jonsequitur
Dec 3, 2019
Maintainer

dsyme
Dec 4, 2019
Collaborator Author

lostmsu
Aug 14, 2020

jonsequitur
Aug 14, 2020
Maintainer

lostmsu
Aug 14, 2020

jonsequitur
Aug 14, 2020
Maintainer

lostmsu
Nov 2, 2020

jonsequitur
Nov 2, 2020
Maintainer

lostmsu
Nov 3, 2020

jonsequitur
Nov 3, 2020
Maintainer

lostmsu
Nov 20, 2020

dsyme
Apr 14, 2021
Collaborator Author

lostmsu
Apr 14, 2021

jonsequitur
Apr 19, 2021
Maintainer

lostmsu
Apr 20, 2021