Sandbox Improvements (#219)
Fixes #216
Fixes #210
Fixes #203
Fixes #208
Fixes #204
cretz authored Dec 2, 2022
1 parent 6a43919 commit cc679b9
Showing 8 changed files with 375 additions and 107 deletions.
80 changes: 66 additions & 14 deletions README.md
@@ -78,7 +78,9 @@ The Python SDK is under development. There are no compatibility guarantees at th
- [Sandbox is not Secure](#sandbox-is-not-secure)
- [Sandbox Performance](#sandbox-performance)
- [Extending Restricted Classes](#extending-restricted-classes)
- [Certain Standard Library Calls on Restricted Objects](#certain-standard-library-calls-on-restricted-objects)
- [issubclass of ABC-based Restricted Classes](#issubclass-of-abc-based-restricted-classes)
- [Compiled Pydantic Sometimes Using Wrong Types](#compiled-pydantic-sometimes-using-wrong-types)
- [Activities](#activities)
- [Definition](#definition-1)
- [Types of Activities](#types-of-activities)
@@ -672,6 +674,10 @@ workflow will not progress until fixed.
The sandbox is not foolproof and non-determinism can still occur. It is simply a best-effort way to catch bad code
early. Users are encouraged to define their workflows in files with no other side effects.

The sandbox offers a mechanism to pass through modules from outside the sandbox. By default this already includes all
standard library modules and Temporal modules. **For performance and behavior reasons, users are encouraged to pass
through all third party modules whose calls will be deterministic.** See "Passthrough Modules" below for how to do this.

##### How the Sandbox Works

The sandbox is made up of two components that work closely together:
@@ -718,23 +724,46 @@ future releases.
When creating the `Worker`, the `workflow_runner` is defaulted to
`temporalio.worker.workflow_sandbox.SandboxedWorkflowRunner()`. The `SandboxedWorkflowRunner`'s init accepts a
`restrictions` keyword argument that is defaulted to `SandboxRestrictions.default`. The `SandboxRestrictions` dataclass
is immutable and contains three fields that can be customized, but only two have notable value. See below.

###### Passthrough Modules

By default the sandbox completely reloads non-standard-library and non-Temporal modules for every workflow run. To make
the sandbox quicker and use less memory when importing known-side-effect-free third party modules, they can be marked
as passthrough modules.

**For performance and behavior reasons, users are encouraged to pass through all third party modules whose calls will be
deterministic.**

One way to pass through a module is at import time in the workflow file using the `imports_passed_through` context
manager like so:

```python
# my_workflow_file.py

from temporalio import workflow

with workflow.unsafe.imports_passed_through():
    import pydantic

@workflow.defn
class MyWorkflow:
    ...
```

Alternatively, this can be done at worker creation time by customizing the runner's restrictions. For example:

```python
my_worker = Worker(
    ...,
    workflow_runner=SandboxedWorkflowRunner(
        restrictions=SandboxRestrictions.default.with_passthrough_modules("pydantic")
    )
)
```

If an "access" match succeeds for an import, it will simply be forwarded from outside of the sandbox. See the API for
more details on exact fields and their meaning.
In both cases, the `pydantic` module will now be passed through from outside of the sandbox instead of being reloaded
for every workflow run.
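Of the two approaches, the context manager keeps the passthrough configuration next to the imports it affects, while
the worker-level restrictions centralize it in one place.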

###### Invalid Module Members

@@ -749,7 +778,7 @@ my_restrictions = dataclasses.replace(
"datetime", "date", "today",
),
)
my_worker = Worker(..., workflow_runner=SandboxedWorkflowRunner(restrictions=my_restrictions))
```
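
Pieced together with the lines hidden above the collapsed hunk, the full example is presumably along these lines (a
sketch: the imports and opening lines are reconstructed from context rather than shown in this diff):

```python
import dataclasses

from temporalio.worker import Worker
from temporalio.worker.workflow_sandbox import (
    SandboxedWorkflowRunner,
    SandboxRestrictions,
)

# Allow datetime.date.today() by removing it from the default invalid members
my_restrictions = dataclasses.replace(
    SandboxRestrictions.default,
    invalid_module_members=SandboxRestrictions.invalid_module_members_default.with_child_unrestricted(
        "datetime", "date", "today",
    ),
)
my_worker = Worker(..., workflow_runner=SandboxedWorkflowRunner(restrictions=my_restrictions))
```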

Restrictions can also be added by `|`'ing together matchers, for example to restrict the `datetime.date` class from being used:
@@ -762,7 +791,7 @@ my_restrictions = dataclasses.replace(
children={"datetime": SandboxMatcher(use={"date"})},
),
)
my_worker = Worker(..., workflow_runner=SandboxedWorkflowRunner(restrictions=my_restrictions))
```
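
Reconstructed the same way, the complete snippet presumably reads (sketch; only the `children=` lines and closers are
visible in this diff):

```python
import dataclasses

from temporalio.worker.workflow_sandbox import SandboxMatcher, SandboxRestrictions

# Forbid any use of datetime.date inside the sandbox
my_restrictions = dataclasses.replace(
    SandboxRestrictions.default,
    invalid_module_members=SandboxRestrictions.invalid_module_members_default
    | SandboxMatcher(
        children={"datetime": SandboxMatcher(use={"date"})},
    ),
)
```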

See the API for more details on exact fields and their meaning.
@@ -802,16 +831,39 @@ To mitigate this, users should:

###### Extending Restricted Classes

Extending a restricted class causes Python to instantiate the restricted metaclass, which is unsupported. Therefore,
if you attempt to use a class inside the sandbox that extends a restricted class, it will fail. For example, if you
have a `class MyZipFile(zipfile.ZipFile)` and try to use that class inside a workflow, it will fail.

Classes used inside the workflow should not extend restricted classes. For situations where third-party modules need
to do this at import time, they should be marked as passthrough modules.
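
To illustrate (a hypothetical sketch, not code from this commit):

```python
import zipfile

# Inside the sandbox, zipfile.ZipFile is replaced by a restricted proxy, so
# executing this class definition in a workflow raises instead of subclassing
# the real ZipFile
class MyZipFile(zipfile.ZipFile):  # hypothetical user-defined class
    pass

# For third-party code that subclasses at import time, import it as a
# passthrough instead (module name hypothetical):
#
#   with workflow.unsafe.imports_passed_through():
#       import some_dep_extending_zipfile
```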

###### Certain Standard Library Calls on Restricted Objects

If an object is restricted, internal CPython validation may fail in some cases. For example, running
`dict.items(os.__dict__)` will fail with:

> descriptor 'items' for 'dict' objects doesn't apply to a '_RestrictedProxy' object

This is a low-level check that cannot be subverted. The solution is to not use restricted objects inside the sandbox.
For situations where third-party modules need to do this at import time, they should be marked as passthrough modules.
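
A minimal reproduction of the failure mode, assuming it runs inside the sandbox where `os` members are proxied:

```python
import os

# Outside the sandbox this is just a roundabout way to call dict.items();
# inside, the unbound descriptor rejects the proxied os.__dict__ at the C
# level with the error quoted above
dict.items(os.__dict__)
```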

###### issubclass of ABC-based Restricted Classes

Due to [https://bugs.python.org/issue44847](https://bugs.python.org/issue44847), classes that are wrapped and then
checked to see if they are subclasses of another via `issubclass` may fail (see also
[this wrapt issue](https://github.com/GrahamDumpleton/wrapt/issues/130)).
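
Roughly, the underlying issue can be sketched with `wrapt` directly (an illustration only; exact behavior varies by
Python version):

```python
import abc

import wrapt  # proxying library whose behavior the sandbox proxies follow

class Base(abc.ABC):
    pass

class Impl(Base):
    pass

# The C implementation of the ABC subclass check does not unwrap the proxy,
# so this may raise "TypeError: issubclass() arg 1 must be a class"
print(issubclass(wrapt.ObjectProxy(Impl), Base))
```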

###### Compiled Pydantic Sometimes Using Wrong Types

If the Pydantic dependency is in compiled form (the default) and a Pydantic model with a `datetime`-typed field is used
inside the workflow sandbox, Pydantic will grab the wrong validator and use `date` instead. This is because our patched
form of `issubclass` is bypassed by compiled Pydantic.

To work around this, either don't use `datetime`-based Pydantic model fields in workflows, mark the `datetime` library
as passthrough (which means you lose protection against calling the non-deterministic `now()`), or use the
non-compiled Pydantic dependency.
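
For example, the passthrough workaround might look like this sketch (trading away the sandbox's `datetime`
protections):

```python
my_worker = Worker(
    ...,
    workflow_runner=SandboxedWorkflowRunner(
        restrictions=SandboxRestrictions.default.with_passthrough_modules("datetime")
    ),
)
```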

### Activities

#### Definition
114 changes: 102 additions & 12 deletions temporalio/worker/workflow_sandbox/_importer.py
@@ -14,6 +14,7 @@
import sys
import threading
import types
import warnings
from contextlib import ExitStack, contextmanager
from typing import (
    Any,
@@ -29,6 +30,7 @@
    Set,
    Tuple,
    TypeVar,
    no_type_check,
)

from typing_extensions import ParamSpec
@@ -107,6 +109,33 @@ def restrict_built_in(name: str, orig: Any, *args, **kwargs):
            )
        )

        # Need to unwrap params for isinstance and issubclass. We have
        # chosen to do it this way instead of customizing __instancecheck__
        # and __subclasscheck__ because we may have proxied the second
        # parameter, which does not have a way to override. It is unfortunate
        # we have to change these globals for everybody.
        def unwrap_second_param(orig: Any, a: Any, b: Any) -> Any:
            a = RestrictionContext.unwrap_if_proxied(a)
            b = RestrictionContext.unwrap_if_proxied(b)
            return orig(a, b)

        thread_local_is_inst = _get_thread_local_builtin("isinstance")
        self.restricted_builtins.append(
            (
                "isinstance",
                thread_local_is_inst,
                functools.partial(unwrap_second_param, thread_local_is_inst.orig),
            )
        )
        thread_local_is_sub = _get_thread_local_builtin("issubclass")
        self.restricted_builtins.append(
            (
                "issubclass",
                thread_local_is_sub,
                functools.partial(unwrap_second_param, thread_local_is_sub.orig),
            )
        )

    @contextmanager
    def applied(self) -> Iterator[None]:
        """Context manager to apply this restrictive import.
@@ -153,17 +182,21 @@ def _import(
        fromlist: Sequence[str] = (),
        level: int = 0,
    ) -> types.ModuleType:
        # We have to resolve the full name, it can be relative at different
        # levels
        full_name = _resolve_module_name(name, globals, level)

        # Check module restrictions and passthrough modules
        if full_name not in sys.modules:
            # Make sure not an entirely invalid module
            self._assert_valid_module(full_name)

            # Check if passthrough
            passthrough_mod = self._maybe_passthrough_module(full_name)
            if passthrough_mod:
                # Load all parents. Usually Python does this for us, but not on
                # passthrough.
                parent, _, child = full_name.rpartition(".")
                if parent and parent not in sys.modules:
                    _trace(
                        "Importing parent module %s before passing through %s",
@@ -174,17 +207,17 @@ def _import(
                    # Set the passthrough on the parent
                    setattr(sys.modules[parent], child, passthrough_mod)
                # Set the passthrough on sys.modules and on the parent
                sys.modules[full_name] = passthrough_mod
                # Put it on the parent
                if parent:
                    setattr(sys.modules[parent], child, sys.modules[full_name])

            # If the module is __temporal_main__ and not already in sys.modules,
            # we load it from whatever file __main__ was originally in
if name == "__temporal_main__":
if full_name == "__temporal_main__":
orig_mod = _thread_local_sys_modules.orig["__main__"]
new_spec = importlib.util.spec_from_file_location(
                    full_name, orig_mod.__file__
                )
                if not new_spec:
                    raise ImportError(
@@ -195,7 +228,7 @@
f"Spec for __main__ file at {orig_mod.__file__} has no loader"
)
new_mod = importlib.util.module_from_spec(new_spec)
                sys.modules[full_name] = new_mod
                new_spec.loader.exec_module(new_mod)

        mod = importlib.__import__(name, globals, locals, fromlist, level)
@@ -219,10 +252,20 @@ def _assert_valid_module(self, name: str) -> None:
            raise RestrictedWorkflowAccessError(name)

    def _maybe_passthrough_module(self, name: str) -> Optional[types.ModuleType]:
        # If imports not passed through and name not in passthrough modules,
        # check parents
        if (
            not temporalio.workflow.unsafe.is_imports_passed_through()
            and name not in self.restrictions.passthrough_modules
        ):
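            # Walk each dotted ancestor of the name (e.g. "a", then "a.b") and
            # pass the module through if any parent package is marked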
            end_dot = -1
            while True:
                end_dot = name.find(".", end_dot + 1)
                if end_dot == -1:
                    return None
                elif name[:end_dot] in self.restrictions.passthrough_modules:
                    break
        # Do the pass through
        with self._unapplied():
            _trace("Passing module %s through from host", name)
            global _trace_depth
@@ -409,3 +452,50 @@ def _get_thread_local_builtin(name: str) -> _ThreadLocalCallable:
    ret = _ThreadLocalCallable(getattr(builtins, name))
    _thread_local_builtins[name] = ret
    return ret


def _resolve_module_name(
    name: str, globals: Optional[Mapping[str, object]], level: int
) -> str:
    if level == 0:
        return name
    # Calc the package from globals
    package = _calc___package__(globals or {})
    # Logic taken from importlib._resolve_name
    bits = package.rsplit(".", level - 1)
    if len(bits) < level:
        raise ImportError("Attempted relative import beyond top-level package")
    base = bits[0]
    return f"{base}.{name}" if name else base


# Copied from importlib._calc___package__
@no_type_check
def _calc___package__(globals: Mapping[str, object]) -> str:
"""Calculate what __package__ should be.
__package__ is not guaranteed to be defined or could be set to None
to represent that its proper value is unknown.
"""
package = globals.get("__package__")
spec = globals.get("__spec__")
if package is not None:
if spec is not None and package != spec.parent:
warnings.warn(
"__package__ != __spec__.parent " f"({package!r} != {spec.parent!r})",
DeprecationWarning,
stacklevel=3,
)
return package
elif spec is not None:
return spec.parent
else:
warnings.warn(
"can't resolve package from __spec__ or __package__, "
"falling back on __name__ and __path__",
ImportWarning,
stacklevel=3,
)
package = globals["__name__"]
if "__path__" not in globals:
package = package.rpartition(".")[0]
return package