diff --git a/.RFC-0000-template.md.swp b/.RFC-0000-template.md.swp
deleted file mode 100644
index 266d938..0000000
Binary files a/.RFC-0000-template.md.swp and /dev/null differ
diff --git a/RFC-0026-logging-system.md b/RFC-0026-logging-system.md
index bc12cab..cfd8cd6 100644
--- a/RFC-0026-logging-system.md
+++ b/RFC-0026-logging-system.md
@@ -1,57 +1,47 @@
-# PyTorch Logging System
+# New PyTorch Logging System
 
 ## **Summary**
 
 Create a message logging system for PyTorch with the following requirements:
 
-* All errors, warnings, and other messages generated by PyTorch should be
-  emitted using the the logging system API
-
-* The APIs for emitting messages and changing settings should all be consistent
-  between C++ and Python
+### Consistency
 
-* Offer different message severity levels, including at least the following:
+* The C++ and Python APIs should match each other as closely as possible.
 
-  - **Info**: Emits a message without creating a warning or error. By default,
-    this gets printed to stdout
+* All errors, warnings, and other messages generated by PyTorch should be
+  emitted using the logging system API.
 
-  - **Warning**: Emits a message as a warning. By default, this will turn into
-    a Python warning
 
-  - **Error**: Emits a message as an error. By default, this will turn into
-    a Python error
+### Severity level and message classes
 
-  - TODO: Should we also have a **Fatal** severity for integration with
-    Meta's internal logging system? A fatal message terminates the program
+* Offer different message severity levels, including at least the following:
 
-* Offer different classes of messages, including at least the following:
+  - **Info**: Emits a message without creating a warning or error. By default,
+    this gets printed to stdout.
 
-  - **Default**: A catch-all message class
+  - **Warning**: Emits a message as a warning. If a warning is never caught,
+    it gets printed to stderr by default.
 
-  - **Nondeterministic**: Emitted when `torch.use_deterministic_algorithms(True)`
-    is set and a nondeterministic operation is called
+  - **Error**: Emits a message as an error. If an error is never caught, the
+    application will print the error to stderr and quit.
 
-  - **Deprecated**: Emitted when a deprecated function is called
+  - TODO: Do we also need a **Fatal** severity for integration with Meta's
+    internal logging system (glog)? A fatal message terminates the program.
 
-  - **Beta**: Emitted when a beta feature is called. See
-    [PyTorch feature classifications](https://pytorch.org/blog/pytorch-feature-classification-changes/)
+* Offer different message classes under each severity level.
 
-  - **Prototype**: Emitted when a prototype feature is called. See
-    [PyTorch feature classifications](https://pytorch.org/blog/pytorch-feature-classification-changes/)
+  - Every message is emitted as an instance of a message class.
 
-  - TODO: Should all the classic Python errors and warnings (`TypeError`,
-    `ValueError`, `NotImplementedError`, `DeprecationWarning`, etc) have their
-    own message class? Or are those separate from our concept of a message
-    class, and any message class is allowed to raise any Python exception or
-    warning type?
+  - Each message class has both a C++ class and a Python class, and when a
+    C++ message is propagated to Python, it is converted to its corresponding
+    Python class.
 
-* Continue using warning/error APIs that currently exist in PyTorch wherever
-  possible. For instance, `TORCH_CHECK`, `TORCH_WARN`, and `TORCH_WARN_ONCE`
-  should continue to be used in C++
+  - Whenever it makes sense, the Python class should be one of the builtin
+    Python error/warning classes. For instance, currently in PyTorch, the C++
+    error class `c10::Error` gets converted to the Python `RuntimeError` class.
 
-  - NOTE: These existing APIs don't currently have a concept of message classes,
-    so that will need to be added
+* Adding new message classes and severity levels should be easy.
 
-* Creating new message classes and severity levels should be easy
+### Configurability and filtering
 
 * Ability to turn warnings into errors. This is already possible with the
   Python `warnings` module filter, but the PyTorch docs should mention it and
@@ -60,13 +50,9 @@ Create a message logging system for PyTorch with the following requirements:
 
 * Settings to disable specific message classes and severity levels
 
-  - TODO: Most errors should not be disableable, right? Perhaps only
-    some message classes should allow disabling or downgrading errors. For
-    instance, currently in PyTorch, we can downgrade a nondeterministic error
-    to a warning, but we wouldn't want to downgrade an error from invalid
-    arguments given to an operation.
+  - TODO: Error classes should never be disableable, right?
 
-  - Disabling warnings in Python should already be possible with the `warnings`
+  - Disabling warnings in Python is already possible with the `warnings`
     module filter. See [documentation](https://docs.python.org/3/library/warnings.html#the-warnings-filter).
     There is no similar system in C++ at the moment, and building one is
     probably low priority.
@@ -75,18 +61,21 @@ Create a message logging system for PyTorch with the following requirements:
     excessive printouts can degrade the user experience. Related to issue
     [#68768](https://github.com/pytorch/pytorch/issues/68768)
 
-* Settings to avoid emitting duplicate messages generated by multiple
+* Settings to enable/disable emitting duplicate messages generated by multiple
   `torch.distributed` ranks. Related to issue
   [#68768](https://github.com/pytorch/pytorch/issues/68768)
 
 * Ability to make a particular warning only warn once. Warn-once should be the
-  default in most cases.
+  default for most warnings.
 
-  - NOTE: Currently `TORCH_WARN_ONCE` does this in C++, but there is no Python
+  - Currently `TORCH_WARN_ONCE` does this in C++, but there is no Python
     equivalent
 
-  - TODO: Should there be a setting to turn a warn-once into a warn-always for
-    a given message class and vice versa?
+  - TODO: `torch.set_warn_always()` currently controls some warnings (maybe
+    only the ones from C++? I need to find out for sure.)
+
+  - TODO: Should there be a setting to turn a warn-once into a warn-always and
+    vice versa for an entire message class?
 
 * Settings can be changed from Python, C++, or environment variables
 
@@ -94,6 +83,8 @@ Create a message logging system for PyTorch with the following requirements:
   remain possible. For instance, the following turns a `DeprecationWarning`
   into an error: `python -W error::DeprecationWarning your_script.py`
 
+### Compatibility
+
 * Should integrate with Meta's internal logging system, which is
   [glog](https://github.com/google/glog)
 
@@ -102,12 +93,19 @@ Create a message logging system for PyTorch with the following requirements:
 * Must be OSS-friendly, so it shouldn't require libraries (like glog) which
   may cause incompatibility issues for projects that use PyTorch
 
+### Other requirements
+
+* Continue using warning/error APIs and message classes that currently exist in
+  PyTorch wherever possible. For instance, `TORCH_CHECK`, `TORCH_WARN`, and
+  `TORCH_WARN_ONCE` should continue to be used in C++.
+
 * TODO: Determine the requirements for the following concepts:
 
-  - Log files (default behavior and any settings)
+  - Log files? (default behavior and any settings)
 
 ## **Motivation**
 
+
 Original issue: [link](https://github.com/pytorch/pytorch/issues/72948)
 
 Currently, it is challenging for PyTorch developers to provide messages that
@@ -116,5 +114,368 @@ act consistently between Python and C++. It is also challenging for PyTorch
 users to manage the messages that PyTorch emits. For instance, if a PyTorch
 user happens to be calling PyTorch functions that emit lots of messages, it
 can be difficult for them to filter out those
-messages so that their project's users don't get bombarded with warnings that
-they don't need to see.
+messages so that their project's users don't get bombarded with warnings and
+printouts that they don't need to see.
+
+
+## **Proposed Implementation**
+
+### Message classes
+
+At least the following message classes should be available. The name of the
+C++ class appears first in each entry below, with the Python class to the
+right of it.
+
+Each severity level has a default class. All other classes within a given
+severity level inherit from the corresponding default class.
+
+NOTE: Most of the error classes below already exist in PyTorch. However,
+info classes do not currently exist. Also, only one type of warning currently
+exists in C++, and it is not implemented as a C++ class that can be inherited
+(as far as I understand).
+
+#### Error message classes:
+
+* **`c10::Error`** - Python `RuntimeError`
+  - Default error class. Other error classes inherit from it.
+
+* **`c10::IndexError`** - Python `IndexError`
+  - Emitted when attempting to access an element that is not present in
+    a list-like object.
+
+* **`c10::ValueError`** - Python `ValueError`
+  - Emitted when a function receives an argument with correct type but
+    incorrect value.
+
+* **`c10::TypeError`** - Python `TypeError`
+  - Emitted when a function receives an argument with incorrect type.
+
+* **`c10::NotImplementedError`** - Python `NotImplementedError`
+  - Emitted when a feature that is not implemented is called.
+
+* **`c10::LinAlgError`** - Python `torch.linalg.LinAlgError`
+  - Emitted from the `torch.linalg` module when there is a numerical error.
+
+* **`c10::NondeterministicError`** - Python `torch.NondeterministicError`
+  - Emitted when `torch.use_deterministic_algorithms(True)` and
+    `torch.set_deterministic_debug_mode('error')` are set, and a
+    nondeterministic operation is called.
+
+
+#### Warning message classes:
+
+* **`c10::UserWarning`** - Python `UserWarning`
+  - Default warning class. Other warning classes inherit from it.
+
+* **`c10::BetaWarning`** - Python `torch.BetaWarning`
+  - Emitted when a beta feature is called. See
+    [PyTorch feature classifications](https://pytorch.org/blog/pytorch-feature-classification-changes/).
+
+* **`c10::PrototypeWarning`** - Python `torch.PrototypeWarning`
+  - Emitted when a prototype feature is called. See
+    [PyTorch feature classifications](https://pytorch.org/blog/pytorch-feature-classification-changes/).
+
+* **`c10::NondeterministicWarning`** - Python `torch.NondeterministicWarning`
+  - Emitted when `torch.use_deterministic_algorithms(True)` and
+    `torch.set_deterministic_debug_mode('warn')` are set, and a
+    nondeterministic operation is called.
+
+* **`c10::DeprecationWarning`** - Python `DeprecationWarning`
+  - Emitted when a deprecated function is called.
+  - TODO: `DeprecationWarning`s are ignored by default in Python, so we may
+    actually want to use a different Python class for this.
+
+
+#### Info message classes:
+
+* **`c10::Info`** - Python `torch.Info`
+  - Default info class. Other info classes inherit from it.
+
+
+### Message APIs
+
+In order to emit messages, developers can use the APIs defined in this section.
+
+These APIs all have a variable-length argument list, `...` in C++ and `*args`
+in Python. When a message is emitted, these arguments are concatenated into
+a string, and the string becomes the body of the message. In C++, the arguments
+must all have the `std::ostream& operator<<` function defined so that they can
+be concatenated, and in Python, they must all have a `__str__` function.
+
+#### Error APIs
+
+The APIs for raising errors all check a boolean condition, the `cond` argument
+in the following signatures, and throw an error if that condition is false.
+
+The error APIs are listed below, with the C++ signature on the left and the
+corresponding Python signature on the right.
+
+**`TORCH_CHECK(cond, ...)`** - `torch.check(cond, *args)`
+  - C++ error: `c10::Error`
+  - Python error: `RuntimeError`
+
+**`TORCH_CHECK_INDEX(cond, ...)`** - `torch.check_index(cond, *args)`
+  - C++ error: `c10::IndexError`
+  - Python error: `IndexError`
+
+**`TORCH_CHECK_VALUE(cond, ...)`** - `torch.check_value(cond, *args)`
+  - C++ error: `c10::ValueError`
+  - Python error: `ValueError`
+
+**`TORCH_CHECK_TYPE(cond, ...)`** - `torch.check_type(cond, *args)`
+  - C++ error: `c10::TypeError`
+  - Python error: `TypeError`
+
+**`TORCH_CHECK_NOT_IMPLEMENTED(cond, ...)`** - `torch.check_not_implemented(cond, *args)`
+  - C++ error: `c10::NotImplementedError`
+  - Python error: `NotImplementedError`
+
+**`TORCH_CHECK_WITH(error_t, cond, ...)`** - `torch.check_with(error_type, cond, *args)`
+  - C++ error: Specified by `error_t` argument
+  - Python error: Specified by `error_type` argument
+
+
+#### Warning APIs
+
+**`TORCH_WARN(...)`** - `torch.warn(*args)`
+  - C++ warning: `c10::UserWarning`
+  - Python warning: `UserWarning`
+
+**`TORCH_WARN_ONCE(...)`** - `torch.warn_once(*args)`
+  - C++ warning: `c10::UserWarning`
+  - Python warning: `UserWarning`
+  - For a given callsite, the warning is emitted only the first time that
+    callsite is reached.
+
+**`TORCH_WARN_WITH(warning_t, ...)`** - `torch.warn_with(warning_type, *args)`
+  - C++ warning: Specified by `warning_t` argument
+  - Python warning: Specified by `warning_type` argument
+
+**`TORCH_WARN_ONCE_WITH(warning_t, ...)`** - `torch.warn_once_with(warning_type, *args)`
+  - C++ warning: Specified by `warning_t` argument
+  - Python warning: Specified by `warning_type` argument
+  - For a given callsite, the warning is emitted only the first time that
+    callsite is reached.
+
+TODO: In C++, `TORCH_WARN_ONCE` is implemented as a macro that defines a local
+static variable to track whether the warning has been emitted from each
+callsite. It is not possible to implement it this way in Python, so we need to
+think of some other way to do it. Of course the Python `warnings` module's
+[`"default"` filter](https://docs.python.org/3/library/warnings.html#the-warnings-filter)
+prevents duplicate warnings from being emitted, but it acts a little
+differently: if two warning messages emitted from the same location differ even
+slightly (for instance, if the value of some variable is included in the
+message and that value differs between two different `warnings.warn` calls),
+then both warnings are emitted. `TORCH_WARN_ONCE` does not check whether
+messages differ. But we could probably implement `torch.warn_once` in a similar
+way to how the `warnings` module filter is implemented.
+
+
+#### Info APIs
+
+Just like the error and warning APIs, the info APIs each have a variable-length
+argument list, `...` in C++ and `*args` in Python. These arguments are
+concatenated into the info message.
+
+**`TORCH_LOG_INFO(...)`** - `torch.log_info(*args)`
+  - C++ info class: `c10::Info`
+  - Python info class: `torch.Info`
+  - TODO: Is there a better name than `log_info`? I didn't want to call it
+    `torch.info`, because
+    [`numpy.info`](https://numpy.org/doc/stable/reference/generated/numpy.info.html)
+    has a completely different functionality. And obviously
+    [`torch.log`](https://pytorch.org/docs/stable/generated/torch.log.html?highlight=torch%20log#torch.log)
+    is already taken.
+
+**`TORCH_LOG_INFO_WITH(info_t, ...)`** - `torch.log_info_with(info_type, *args)`
+  - C++ info class: Specified by `info_t` argument
+  - Python info class: Specified by `info_type` argument
+
+
+### Multi-process messaging APIs
+
+Currently, when running subprocesses that use PyTorch, some messages are
+emitted by every running subprocess. See
+[issue #68768](https://github.com/pytorch/pytorch/issues/68768) for specific
+examples. Avoiding emitting duplicate messages from each subprocess by default
+would give a better user experience.
+
+In issue #68768, the duplicate messages related to `cpp_extension.load` can be
+modified to only be emitted by subprocess rank 0, simply by checking the node's
+rank first. For instance, wherever there is a `warnings.warn(...)` call, we can
+replace it with:
+
+```python
+if rank == 0:
+    warnings.warn(...)
+```
+
+This successfully avoids duplicate warnings. A few concrete examples can be
+seen in [this draft PR](https://github.com/pytorch/pytorch/pull/79288).
+
+However, implementing the duplicate filter like this is not ideal. It would be
+better to have dedicated message system API calls for this. In the case of
+warnings, the following signature could be used:
+
+**`torch.warn_rank(my_rank, *args, warn_rank=0)`**
+  * Args:
+    - `my_rank` - Rank of the subprocess calling this function
+    - `args` - Warning message
+    - `warn_rank` - Rank that should emit the message
+  * The warning is only emitted if `my_rank == warn_rank`
+
+TODO: Add APIs for the rest of the message classes, like
+`torch.log_info_rank()`, etc.
+
+TODO: There should also be a global setting to enable emitting the duplicates.
+`torch.warn_rank` could check the setting, and if it's turned on, then it would
+emit the warning for all ranks.
+
+TODO: Should we have a `TORCH_WARN_RANK` (and others) in C++ as well? Is there
+an existing use case for it?
+
+
+### Other details
+
+At the moment in PyTorch, the Python `warnings` module is being publicly
+included in `torch` as `torch.warnings`. This should probably be removed or
+renamed to `_warnings` to avoid confusion.
+
+
+# PyTorch's current messaging API
+
+The rest of this document contains details about the current messaging API in
+PyTorch. This is included to give better context about what will change and
+what will stay the same in the new messaging system.
+
+At the moment, PyTorch has some APIs in place to make a lot of aspects of
+message logging easy, from the perspective of a developer working on PyTorch.
+Messages can be either printouts, warnings, or errors.
+
+Errors are created with the standard `raise` statement in Python
+([documentation](https://docs.python.org/3/tutorial/errors.html#raising-exceptions)).
+In C++, PyTorch offers macros for creating errors (which are listed later in
+this document). When an error propagates from C++ to Python, it is converted
+to the corresponding Python error.
+
+Warnings are created with `warnings.warn` in Python
+([documentation](https://docs.python.org/3/library/warnings.html)). In C++,
+PyTorch offers macros for creating warnings (which are listed later in this
+document). When a warning propagates from C++ to Python, it is converted to
+a Python warning.
+
+Printouts (or what is called "Info" severity messages in the new system) are
+created with just `print` in Python and `std::cout` in C++.
+
+PyTorch's C++ warning/error macros are declared in
+[`c10/util/Exception.h`](https://github.com/pytorch/pytorch/blob/72e4aab74b927c1ba5c3963cb17b4c0dce6e56bf/c10/util/Exception.h).
+
+## PyTorch C++ Errors
+
+In C++, there are several different types of errors that can be used, but
+PyTorch developers typically don't deal with these error classes directly.
+Instead, they use macros that offer a concise interface for raising different
+error classes.
+
+### C++ error macros
+
+Each of the error macros evaluates a boolean conditional expression, `cond`. If
+the condition is false, the error is raised, and whatever extra arguments are
+in `...` get concatenated into the error message with `operator<<`.
+
+| Macro                                    | C++ Error class                |
+| ---------------------------------------- | ------------------------------ |
+| `TORCH_CHECK(cond, ...)`                 | `c10::Error`                   |
+| `TORCH_CHECK_WITH(error_t, cond, ...)`   | caller specifies `error_t` arg |
+| `TORCH_CHECK_LINALG(cond, ...)`          | `c10::LinAlgError`             |
+| `TORCH_CHECK_INDEX(cond, ...)`           | `c10::IndexError`              |
+| `TORCH_CHECK_VALUE(cond, ...)`           | `c10::ValueError`              |
+| `TORCH_CHECK_TYPE(cond, ...)`            | `c10::TypeError`               |
+| `TORCH_CHECK_NOT_IMPLEMENTED(cond, ...)` | `c10::NotImplementedError`     |
+
+There is some documentation on error macros [here](https://github.com/pytorch/pytorch/blob/72e4aab74b927c1ba5c3963cb17b4c0dce6e56bf/c10/util/Exception.h#L344-L362).
+
+The reason why C++ preprocessor macros are used, rather than function calls, is
+to ensure that the compiler can optimize for the `cond == true` branch. In
+other words, if an error does not get raised, overhead is minimized.
+
+### C++ error classes
+
+The primary error class in C++ is `c10::Error`. Documentation and declaration
+are
+[here](https://github.com/pytorch/pytorch/blob/72e4aab74b927c1ba5c3963cb17b4c0dce6e56bf/c10/util/Exception.h#L21-L28).
+`c10::Error` is a subclass of `std::exception`.
+
+There are other error classes which are child classes of `c10::Error`, defined
+[here](https://github.com/pytorch/pytorch/blob/72e4aab74b927c1ba5c3963cb17b4c0dce6e56bf/c10/util/Exception.h#L195-L236).
+
+When these errors propagate to Python, they are each converted to a different
+Python error class:
+
+| C++ error class                 | Python error class         |
+| ------------------------------- | -------------------------- |
+| `c10::Error`                    | `RuntimeError`             |
+| `c10::IndexError`               | `IndexError`               |
+| `c10::ValueError`               | `ValueError`               |
+| `c10::TypeError`                | `TypeError`                |
+| `c10::NotImplementedError`      | `NotImplementedError`      |
+| `c10::EnforceFiniteError`       | `ExitException`            |
+| `c10::OnnxfiBackendSystemError` | `ExitException`            |
+| `c10::LinAlgError`              | `torch.linalg.LinAlgError` |
+
+
+## PyTorch C++ Warnings
+
+When warnings propagate from C++ to Python, they are converted to a Python
+`UserWarning`. Whatever is in `...` will get concatenated into the warning
+message using `operator<<`.
+
+* `TORCH_WARN(...)`
+  - [Definition](https://github.com/pytorch/pytorch/blob/72e4aab74b927c1ba5c3963cb17b4c0dce6e56bf/c10/util/Exception.h#L515-L530)
+
+* `TORCH_WARN_ONCE(...)`
+  - [Definition](https://github.com/pytorch/pytorch/blob/72e4aab74b927c1ba5c3963cb17b4c0dce6e56bf/c10/util/Exception.h#L557-L562)
+  - This macro only generates a warning the first time it is encountered during
+    run time.
+
+
+## Implementation details
+
+### C++ to Python Error Translation
+
+`c10::Error` and its subclasses are translated into their corresponding Python
+errors [in `CATCH_CORE_ERRORS`](https://github.com/pytorch/pytorch/blob/72e4aab74b927c1ba5c3963cb17b4c0dce6e56bf/torch/csrc/Exceptions.h#L54-L100).
+
+However, not all of the `c10::Error` subclasses in the table above appear here.
+I'm not sure yet what's up with that.
+
+`CATCH_CORE_ERRORS` is included within the `END_HANDLE_TH_ERRORS` macro that
+every Python-bound C++ function uses for handling errors. For instance,
+`THPVariable__is_view` uses the error handling macro
+[here](https://github.com/pytorch/pytorch/blob/72e4aab74b927c1ba5c3963cb17b4c0dce6e56bf/tools/autograd/templates/python_variable_methods.cpp#L76).
+
+
+#### `torch::PyTorchError`
+
+There's also an extra error class in `CATCH_CORE_ERRORS`,
+`torch::PyTorchError`. I'm not sure yet why it exists and how it differs from
+`c10::Error`. `torch::PyTorchError` has several subclasses:
+
+* `torch::IndexError`
+* `torch::TypeError`
+* `torch::ValueError`
+* `torch::NotImplementedError`
+* `torch::AttributeError`
+* `torch::LinAlgError`
+
+
+### C++ to Python Warning Translation
+
+The conversion of warnings from C++ to Python is described [here](https://github.com/pytorch/pytorch/blob/72e4aab74b927c1ba5c3963cb17b4c0dce6e56bf/torch/csrc/Exceptions.h#L25-L48).
+
+
+## Misc Notes
+
+[PyTorch Developer Podcast - Python exceptions](https://pytorch-dev-podcast.simplecast.com/episodes/python-exceptions)
+explains how C++ errors/warnings are converted to Python. TODO: listen to it
+again and take notes.
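The **Message classes** section of the RFC proposes that every warning class inherit from the default `UserWarning` and that errors map onto builtin Python exceptions where possible. Below is a minimal Python sketch of that hierarchy. It assumes the proposed names (`BetaWarning`, `PrototypeWarning`, `NondeterministicWarning`, `NondeterministicError`, `Info`) are ordinary classes. They are defined at module level here, not inside `torch`, so the sketch runs without PyTorch. `call_prototype_feature` is a hypothetical stand-in for a prototype API. The sketch also shows why the hierarchy helps with the filtering requirements: a stock `warnings` filter can escalate or silence a whole message class.

```python
import warnings


# Hypothetical Python counterparts of the message classes proposed in the RFC.
# They live at module level so the sketch runs without PyTorch installed.

# Warning severity: UserWarning is the default class, and every other warning
# class inherits from it.
class BetaWarning(UserWarning):
    """Emitted when a beta feature is called."""


class PrototypeWarning(UserWarning):
    """Emitted when a prototype feature is called."""


class NondeterministicWarning(UserWarning):
    """Emitted for a nondeterministic op in 'warn' debug mode."""


# Error severity: RuntimeError is the Python side of the default c10::Error.
class NondeterministicError(RuntimeError):
    """Emitted for a nondeterministic op in 'error' debug mode."""


# Info severity has no builtin Python counterpart, so it gets its own root class.
class Info:
    """Default info class; other info classes would subclass it."""

    def __init__(self, message):
        self.message = message

    def __str__(self):
        return self.message


def call_prototype_feature():
    # Hypothetical prototype API announcing its status via its message class.
    warnings.warn("this feature is a prototype", PrototypeWarning, stacklevel=2)


if __name__ == "__main__":
    # A standard warnings filter escalates the whole message class to an error.
    with warnings.catch_warnings():
        warnings.filterwarnings("error", category=PrototypeWarning)
        try:
            call_prototype_feature()
        except PrototypeWarning as exc:
            print("escalated to an error:", exc)
```

Because `warnings` filters match subclasses, any existing filter that targets `UserWarning` would keep applying to these new classes.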
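The **Error APIs** section proposes a Python `torch.check*` family that mirrors the C++ `TORCH_CHECK*` macros. Below is a minimal sketch. It assumes that plain module-level functions (`check`, `check_index`, `check_value`, `check_type`, `check_not_implemented`, `check_with`) stand in for the proposed `torch.*` names. It also assumes that message arguments are joined with no separator, in the same way the C++ macros stream each argument with `operator<<`. The helper `_concat` and the example `take` are hypothetical.

```python
def _concat(*args):
    # Stringify each argument and join with no separator, like the C++ macros.
    return "".join(str(arg) for arg in args)


def check_with(error_type, cond, *args):
    """Raise ``error_type`` with the concatenated message if ``cond`` is false."""
    if not cond:
        raise error_type(_concat(*args))


# Fixed-class variants, following the C++ -> Python mapping in the RFC.
def check(cond, *args):
    check_with(RuntimeError, cond, *args)  # c10::Error -> RuntimeError


def check_index(cond, *args):
    check_with(IndexError, cond, *args)


def check_value(cond, *args):
    check_with(ValueError, cond, *args)


def check_type(cond, *args):
    check_with(TypeError, cond, *args)


def check_not_implemented(cond, *args):
    check_with(NotImplementedError, cond, *args)


def take(values, i):
    # Example callsite: an out-of-range index raises IndexError.
    check_index(0 <= i < len(values),
                "index ", i, " is out of range for length ", len(values))
    return values[i]
```

One difference from the C++ macros is that Python evaluates every message argument even when `cond` is true. Only the string concatenation is deferred to the failure path, so callers should avoid passing expensive-to-compute arguments.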
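The `TORCH_WARN_ONCE` TODO asks for a Python `torch.warn_once` that ignores the message text, unlike the `warnings` module's `"default"` action. One possible sketch uses a process-wide registry keyed on the caller's `(filename, line number)`. That registry plays the role of the C++ macro's per-callsite static variable. The names `warn_once`, `warn_once_with`, `_warn_once_from_caller`, and `_warned_callsites` are hypothetical. `sys._getframe` is a CPython implementation detail.

```python
import sys
import warnings

# Callsites, as (filename, line number), that have already emitted a warning.
# This set stands in for the per-callsite static variable of TORCH_WARN_ONCE.
_warned_callsites = set()


def _warn_once_from_caller(warning_type, args):
    # Frame 0 is this helper, frame 1 the public wrapper, frame 2 the user's code.
    caller = sys._getframe(2)
    key = (caller.f_code.co_filename, caller.f_lineno)
    if key in _warned_callsites:
        return
    _warned_callsites.add(key)
    message = "".join(str(arg) for arg in args)
    # stacklevel=3 attributes the warning to the user's line, not these helpers.
    warnings.warn(message, warning_type, stacklevel=3)


def warn_once(*args):
    _warn_once_from_caller(UserWarning, args)


def warn_once_with(warning_type, *args):
    _warn_once_from_caller(warning_type, args)
```

Suppose `warn_once("step ", i)` is called inside a loop. Only the first message is emitted, even though `i` changes. A plain `warnings.warn` under the `"default"` action would instead emit one warning per distinct message. Adding the message text to the key would reproduce the `warnings` module's behavior.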
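The **Info APIs** section and the requirement for "Settings to disable specific message classes" could fit together as shown in this minimal sketch. `log_info` and `log_info_with` follow the proposed signatures. `disable_info`, `enable_info`, the `file` keyword, and the `_disabled_info_classes` registry are all hypothetical. As the RFC specifies, info messages print to stdout by default. Here, disabling a class also silences all of its subclasses.

```python
import sys


class Info:
    """Default info message class (hypothetical Python side of c10::Info)."""

    def __init__(self, message):
        self.message = message

    def __str__(self):
        return self.message


# Info classes that have been switched off (hypothetical setting).
_disabled_info_classes = set()


def disable_info(info_type):
    _disabled_info_classes.add(info_type)


def enable_info(info_type):
    _disabled_info_classes.discard(info_type)


def log_info_with(info_type, *args, file=None):
    """Print an ``info_type`` message unless that class (or a base) is disabled."""
    # Walking the MRO means disabling Info also silences every Info subclass.
    if any(cls in _disabled_info_classes for cls in info_type.__mro__):
        return None
    message = info_type("".join(str(arg) for arg in args))
    # Look up sys.stdout at call time so contextlib.redirect_stdout still applies.
    print(message, file=sys.stdout if file is None else file)
    return message


def log_info(*args, file=None):
    return log_info_with(Info, *args, file=file)
```

The disabled-class check runs before the message is built, so a silenced class pays only for the MRO lookup.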
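This sketch shows how the proposed `torch.warn_rank(my_rank, *args, warn_rank=0)` could be combined with the TODO about a global setting for re-enabling duplicates. It is an illustration, not the RFC's implementation. `warn_rank` is a plain function here. `set_emit_duplicate_messages` and `_emit_duplicates` are hypothetical names for the global setting.

```python
import warnings

# Hypothetical global setting from the RFC's TODO: when True, every rank warns.
_emit_duplicates = False


def set_emit_duplicate_messages(enabled):
    global _emit_duplicates
    _emit_duplicates = bool(enabled)


def warn_rank(my_rank, *args, warn_rank=0):
    """Emit a UserWarning only on ``warn_rank``, unless duplicates are enabled."""
    # Inside this function, ``warn_rank`` names the keyword argument.
    if my_rank != warn_rank and not _emit_duplicates:
        return
    # stacklevel=2 points the warning at the caller's line.
    warnings.warn("".join(str(arg) for arg in args), UserWarning, stacklevel=2)
```

In a real job, each process would make this call once with its own rank, for example `warn_rank(torch.distributed.get_rank(), ...)`. That call would replace the hand-written `if rank == 0:` guard shown in the multi-process section.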