diff --git a/content/post/mini-racer-v0.12.2.md b/content/post/mini-racer-v0.12.2.md index 291b374..c9992f1 100644 --- a/content/post/mini-racer-v0.12.2.md +++ b/content/post/mini-racer-v0.12.2.md @@ -18,13 +18,14 @@ The dynamic language embedding [turducken](https://en.wikipedia.org/wiki/Turducken) is complete: you can now call Python from JavaScript from Python! -By writing logic in Python and exposing it to V8's JavaScript sandbox, you can -now in theory, while writing _only Python and JavaScript_, create your own JS -extension library similar to -[NodeJS's standard library](https://nodejs.org/docs/latest/api/).Obviously, +As of [PyMiniRacer](https://github.com/bpcreech/PyMiniRacer) `v0.12.2`, you can +write logic in Python and expose it to V8's JavaScript sandbox. You can now in +theory, while writing _only Python and JavaScript_, create your own JS extension +library similar to +[NodeJS's standard library](https://nodejs.org/docs/latest/api/). Obviously, anything created this way this will almost certainly be less extensive, less -standardized, and less efficient than the NodeJS standard library; but it will -be _more tailored to your use case_. +standardized, and less efficient than the NodeJS standard library; but it _will_ +be _more tailored to your needs_. This post follows up on prior content related to [reviving PyMiniRacer](https://github.com/bpcreech/PyMiniRacer) and then @@ -42,9 +43,11 @@ by default, both simplicity and [security](https://bpcreech.com/PyMiniRacer/architecture/#security-goals). Thus, until now, you couldn't, say, go and grab a random URL off the Internet, -or even log to stdout (e.g., using `console.log`). Neither is bundled with V8. +or even log to stdout (e.g., using `console.log`). No APIs for either operation +are bundled with V8. -Well, now you can extend PyMiniRacer to add that functionality yourself! +Well, now you can use PyMiniRacer to extend the JavaScript API, thereby adding +that functionality yourself! ```python import aiohttp @@ -120,8 +123,8 @@ V8 runs JavaScript in a single-threaded fashion, and doing anything synchronous in a callback to Python would block the entire V8 isolate. Worse, things would likely deadlock very quickly—if your Python extension function tries to call back into V8 it will freeze. The only thing we can reasonably do, when -JavaScript calls outside the V8 sandbox, is create a `Promise` and do the work -to fulfill that `Promise` out of band. +JavaScript calls outside the V8 sandbox, is create and return a `Promise`, and +then do the work actually needed to _fulfill_ that `Promise` out of band. ## Internal changes to PyMiniRacer @@ -129,8 +132,8 @@ to fulfill that `Promise` out of band. I put some general design ideas about `wrap_py_function` [here](https://github.com/bpcreech/PyMiniRacer/issues/39). What landed in -PyMiniRacer is basically -["Alternate implementation idea 3"](https://github.com/bpcreech/PyMiniRacer/issues/39#issuecomment-2043984826). +PyMiniRacer is basically "Alternate implementation idea 3" +[on the GitHub issue outlining this feature](https://github.com/bpcreech/PyMiniRacer/issues/39#issuecomment-2043984826). Generally speaking: #### Generalizing PyMiniRacer callbacks and sharing allocated objects with V8 @@ -140,7 +143,7 @@ await `Promise` objects. We just had to generalize it! ... Which specifically means allowing V8 to _reuse_ a callback. With `Promises`, callbacks are only used 0.5 times on average (either the `resolve` or `reject` -is used exactly once). 
PyMiniRacer handled cleanup of these on its own; the
+is used, exactly once). PyMiniRacer handled cleanup of these on its own; the
 first time either `resolve` or `reject` was called, PyMiniRacer could destroy
 both callbacks. This technique was borrowed from
 [V8's d8 tool](https://github.com/v8/v8/blob/0f719663da59ee690f3b72c520f8f9ea2328071a/src/d8/d8.cc#L1493).
@@ -152,29 +155,30 @@ and hand that off to V8.

 Unfortunately,
 [V8 is pretty apathetic about telling us when it's _done with_ an external C++ object we give it](https://stackoverflow.com/questions/24107280/v8-weakcallback-never-gets-called).
-I couldn't make it work at all, and all commentary I can find on the matter says
-trying to get V8 `MakeWeak` and finalizers work reliably is a fool's errand.
-This creates a memory management conundrum: we need to create a small bit of
-per-callback C++ state and hand it to V8, but V8 won't reliably tell us when it
-has finally dropped all references to that state.
+I couldn't make the V8 finalizer callback work at all, and all commentary I can
+find on the matter says trying to get V8 `MakeWeak` and finalizers to work reliably
+is a fool's errand. This creates a memory management conundrum: we need to
+create a small bit of per-callback C++ state and hand it to V8, but V8 won't
+reliably tell us when it has finally dropped all references to that state.

 So I moved to a model of creating _one such callback object per MiniRacer
-context_. We can tear _that_ object down when tearing down the whole MiniRacer
-context, rather than relying on V8 to tell us when it's done with the data we
-gave it. In order to multiplex many callbacks into that object, we can just give
-JavaScript an ID number (because V8 _does_ manage to reliably destruct
-numbers!). This new logic lives in
+context_. This object outlives the `v8::Isolate`, so we're no longer relying on
+V8 to tell us when it's done with the pointer we gave it. In order to multiplex
+many callbacks through that one object, we can just give JavaScript an ID number
+(because V8 _does_ manage to reliably destruct numbers!). This new logic lives
+in
 [`MiniRacer::JSCallbackMaker`](https://github.com/bpcreech/PyMiniRacer/blob/release/v0.12.2/src/v8_py_frontend/js_callback_maker.h).

 Meanwhile the Python side of things can manage its own map of ID number to
 callback, and remove entries from that map when it wants to. (This is what the
 `wrap_py_function` context manager does on `__exit__`). Since Python is tearing
 down the callback and its ID on its own schedule, in total ignorance of
-JavaScript's references to that callback ID, it's possible for JS to keep trying
-to call the callback after Python already tore it down. This is working as
-designed; such calls can be easily spotted because the `callback_caller_id` or
-`callback_id` don't reference an active `CallbackCaller` or callback (see
-diagram below), and they can thus be easily ignored.
+JavaScript's references to that same callback ID, it's possible for JS to keep
+trying to call the callback after Python already tore it down. This is working
+as designed; such calls can be easily spotted when either the
+`callback_caller_id` or `callback_id` doesn't reference an active
+`CallbackCaller` or callback, respectively (see diagram below). Calls to
+invalidated ID numbers can be easily and safely ignored.
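+
+To make that concrete, here is a minimal sketch of the ID-number scheme on the
+C++ side. This is _not_ PyMiniRacer's actual code (names like
+`CallbackRegistry` are invented); it just shows the shape of the idea: V8 only
+ever holds a number, attached via the `data` argument of `v8::Function::New`,
+and a call bearing an unknown ID is simply dropped:
+
+```cpp
+#include <cstdint>
+#include <functional>
+#include <mutex>
+#include <unordered_map>
+
+#include <v8.h>
+
+// Hypothetical registry: V8 only ever sees small integer IDs, never raw C++
+// pointers.
+class CallbackRegistry {
+ public:
+  uint64_t Register(std::function<void()> callback) {
+    const std::lock_guard<std::mutex> lock(mutex_);
+    callbacks_[next_id_] = std::move(callback);
+    return next_id_++;
+  }
+
+  void Unregister(uint64_t id) {
+    const std::lock_guard<std::mutex> lock(mutex_);
+    callbacks_.erase(id);
+  }
+
+  // Called from JavaScript; calls bearing stale IDs are simply dropped.
+  void CallIfLive(uint64_t id) {
+    std::function<void()> callback;
+    {
+      const std::lock_guard<std::mutex> lock(mutex_);
+      auto it = callbacks_.find(id);
+      if (it == callbacks_.end()) {
+        return;  // Python (or C++) already tore this callback down.
+      }
+      callback = it->second;
+    }
+    callback();
+  }
+
+ private:
+  std::mutex mutex_;
+  std::unordered_map<uint64_t, std::function<void()>> callbacks_;
+  uint64_t next_id_ = 0;
+};
+
+// One registry per MiniRacer context in the real design; a global here for
+// brevity.
+CallbackRegistry* g_registry = nullptr;
+
+void OnJsCall(const v8::FunctionCallbackInfo<v8::Value>& info) {
+  // The ID number rode along as the function's data argument.
+  const auto id = static_cast<uint64_t>(info.Data().As<v8::Number>()->Value());
+  g_registry->CallIfLive(id);
+}
+
+v8::Local<v8::Function> MakeJsCallback(v8::Local<v8::Context> context,
+                                       uint64_t id) {
+  return v8::Function::New(
+             context, OnJsCall,
+             v8::Number::New(context->GetIsolate(), static_cast<double>(id)))
+      .ToLocalChecked();
+}
+```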

 I think this could be turned into a generalized strategy for safely sharing
 allocated C++ memory with V8:
@@ -184,14 +188,17 @@ allocated C++ memory with V8:
 2. If you want to be able to delete any objects _before_ the `v8::Isolate`
    exits, don't share _raw_ pointers and references. Instead:
    1. On the C++ side, create an ID-number-to-pointer map, and give V8 only ID
-      numbers.
-   2. Create an API, whether in C++ or JavaScript (or Python which calls C++ in
-      PyMiniRacer's case) which authoratively tears down the object and the
+      numbers, as the `data` argument of `v8::Function::New`.
+   2. Your `v8::Function` callback implementation can read the ID number and
+      safely convert it to a C++ pointer by looking it up in the map.
+   3. Design an API, whether in C++ or JavaScript (or Python which calls C++ in
+      PyMiniRacer's case) which authoritatively tears down the C++ object and the
       ID-number-to-pointer map entry. (Don't rely on V8 to tell you when all
      references to the object have dropped; it won't do this reliably.)
-   3. Be prepared for JavaScript code to try and use the ID after you've torn
-      down the object. The C++ code can detect this (because the ID is not in
-      the map), and safely reject such attempts.
+   4. Because we aren't tracking dangling IDs on the JavaScript side (we
+      can't!), be prepared for JavaScript code to try and use the ID after
+      you've torn down the object. The C++ code can easily detect this (because
+      the ID is not in the map), and safely reject such attempts.

 #### System diagram

@@ -340,15 +347,19 @@ manage object lifecycles.
 As of PyMiniRacer `v0.11.1` we were using a DAG of `std::shared_ptr` references
 to manage lifecycle of a dozen different classes. I discovered
 [a bug](https://github.com/bpcreech/PyMiniRacer/issues/62) that
-this logic left behind. The problem with the laissez-faire
-`std::shared_ptr`-all-the-things memory management pattern is that we leak
-memory when we have reference cycles, and unfortunately we do have reference
-cycles in PyMiniRacer. In particular, we create and throw function closures onto
-the `v8::Isolate` message loop which contain `shared_ptr` references to
-objects... which eventually contain `shared_ptr` references to the `v8::Isolate`
-itself: a reference cycle! (We also had some bare references _not_ managed by a
-`shared_ptr`, which would result in a use-after-free in hard-to-reach situations
-during teardown of PyMiniRacer.)
+this logic left behind. If JavaScript code launched never-ending background
+work (e.g., `setTimeout(() => { while (1) {} }, 100)`), the C++
+`MiniRacer::Context` and its `v8::Isolate` wouldn't actually shut down when told
+to by the Python side.
+
+The problem with the laissez-faire `std::shared_ptr`-all-the-things memory
+management pattern is that we leak memory when we have reference cycles.
+Unfortunately we _do_ have reference cycles in PyMiniRacer. In particular, all
+over PyMiniRacer's C++ codebase, we create and throw function closures onto the
+`v8::Isolate` message queue which contain `shared_ptr` references to objects...
+which transitively contain `shared_ptr` references to the `v8::Isolate`
+itself... which contains references to everything in the message queue: a
+reference cycle!

 A simple way to resolve that in _another_ system might be to explicitly clear
 out the `v8::Isolate`'s message queue on shutdown. Basically, by wiping out all
@@ -358,18 +369,18 @@ message queue even for teardown: we put tasks on the message queue which delete
 C++ objects, because it's the easiest way to ensure they're deleted under the
 `v8::Isolate` lock (discussion of that
 [here](https://groups.google.com/g/v8-users/c/glG3-3pufCo)). So PyMiniRacer
-can't simply clear out the message, or it will leak C++ objects on exit.
+can't simply clear out the message queue, or it will leak C++ objects on exit.
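+
+Here is a tiny standalone sketch (again, not PyMiniRacer's actual code) of the
+shape of that reference cycle, with a `std::deque` of closures standing in for
+the isolate's message queue:
+
+```cpp
+#include <deque>
+#include <functional>
+#include <iostream>
+#include <memory>
+
+// Stand-in for the object which owns the isolate and its message queue.
+struct Engine {
+  std::deque<std::function<void()>> queue;  // pending tasks
+  ~Engine() { std::cout << "Engine destroyed\n"; }
+};
+
+int main() {
+  auto engine = std::make_shared<Engine>();
+
+  // The task captures a shared_ptr so the engine outlives the task...
+  engine->queue.push_back(
+      [engine] { std::cout << engine->queue.size() << " task(s) queued\n"; });
+
+  // ...but the task sits inside the engine's own queue. Engine -> queue ->
+  // closure -> Engine is a cycle, so dropping our handle frees nothing and
+  // "Engine destroyed" is never printed.
+  engine.reset();
+}
+```
+
+Swap `Engine` for the `v8::Isolate` plus its message queue and this is
+essentially the leak described above.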

 After playing with this some and failing to get the lifecycles just right, and
 struggling to even understand the effective teardown order of dozens of
 different reference-counted objects, I realized we could switch to a different,
 easier-to-think-about pattern: every object putting tasks onto the `v8::Isolate`
-simply needs to ensure those tasks complete before it _anything it puts into
-those tasks_ is torn down.
+simply needs to ensure those tasks complete before _anything it puts into those
+tasks_ is torn down.

 I'll restate that rule: _if you call `MiniRacer::IsolateManager::Run(xyz)`, you
-had better ensure that task is done **before** you destruct any objects you
-bound into the function closure `xyz`._
+are responsible for ensuring that task is done **before** any objects you bound
+into the function closure `xyz` are destroyed._

 This rule seems obvious in restrospect! But it's hard to implement. I:

@@ -392,3 +403,6 @@ This rule seems obvious in restrospect! But it's hard to implement. I:
 [`IsolateObjectCollector`](https://github.com/bpcreech/PyMiniRacer/blob/99591c962b219251d309e0f5899b3cf2a676ab9e/src/v8_py_frontend/isolate_object_collector.h#L13)
 to ensure all the garbage is gone before we move on to tearing down the
 `IsolateManager` and its message pump loop.
+
+This is a little more code than we had before, but things are much simpler to
+reason about now!
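+
+To close, here is a toy illustration of that rule, with `std::async` standing
+in for `MiniRacer::IsolateManager::Run` and the message pump (once more, this
+is not PyMiniRacer's actual code):
+
+```cpp
+#include <future>
+#include <iostream>
+#include <string>
+#include <vector>
+
+// An object which launches background tasks that use its own state. Per the
+// rule above, it must ensure those tasks finish before its members go away.
+class Client {
+ public:
+  ~Client() {
+    // Gate destruction on every task we launched; only after this loop can
+    // members like name_ safely be destroyed.
+    for (auto& task : pending_) task.wait();
+  }
+
+  void DoWorkLater() {
+    pending_.push_back(std::async(std::launch::async, [this] {
+      std::cout << "working for " << name_ << "\n";  // touches *this
+    }));
+  }
+
+ private:
+  std::string name_ = "my_client";
+  std::vector<std::future<void>> pending_;
+};
+
+int main() {
+  Client client;
+  client.DoWorkLater();
+  client.DoWorkLater();
+  // client's destructor blocks until both tasks have run; the tasks can never
+  // observe a destroyed Client.
+}
+```
+
+The destructor's wait is the whole trick: by the time any member is torn down,
+every task that could touch the object has already finished.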