diff --git a/doc/source/whatsnew/v2.1.0.rst b/doc/source/whatsnew/v2.1.0.rst
index 5cafaa5759a5b..50c04815ecfb2 100644
--- a/doc/source/whatsnew/v2.1.0.rst
+++ b/doc/source/whatsnew/v2.1.0.rst
@@ -14,6 +14,46 @@ including other versions of pandas.
 Enhancements
 ~~~~~~~~~~~~
 
+.. _whatsnew_210.enhancements.pyarrow_dependency:
+
+PyArrow will become a required dependency with pandas 3.0
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+`PyArrow `_ will become a required
+dependency of pandas starting with pandas 3.0. This decision was made based on
+`PDEP 12 `_.
+
+This will enable more changes that are hugely beneficial to pandas users, including
+but not limited to:
+
+- inferring strings as PyArrow backed strings by default enabling a significant
+  reduction of the memory footprint and huge performance improvements.
+- inferring more complex dtypes with PyArrow by default, like ``Decimal``, ``lists``,
+  ``bytes``, ``structured data`` and more.
+- Better interoperability with other libraries that depend on Apache Arrow.
+
+We are collecting feedback on this decision `here `_.
+
+.. _whatsnew_210.enhancements.infer_strings:
+
+Avoid NumPy object dtype for strings by default
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Previously, all strings were stored in columns with NumPy object dtype.
+This release introduces an option ``future.infer_string`` that infers all
+strings as PyArrow backed strings with dtype ``pd.ArrowDtype(pa.string())`` instead.
+This option only works if PyArrow is installed. PyArrow backed strings have a
+significantly reduced memory footprint and provide a big performance improvement
+compared to NumPy object.
+
+The option can be enabled with:
+
+.. code-block:: python
+
+    pd.options.future.infer_string = True
+
+This behavior will become the default with pandas 3.0.
+
 .. _whatsnew_210.enhancements.reduction_extension_dtypes:
 
 DataFrame reductions preserve extension dtypes