From 224457d89ef127b62b85c3e4ac4e821fdcf65bce Mon Sep 17 00:00:00 2001
From: Patrick Hoefler <61934744+phofl@users.noreply.github.com>
Date: Thu, 10 Aug 2023 00:03:52 +0200
Subject: [PATCH] Add whatsnew for arrow (#54476)

* Add whatsnew for arrow

* Update

* Update doc/source/whatsnew/v2.1.0.rst

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>

* Update doc/source/whatsnew/v2.1.0.rst

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>

* Update doc/source/whatsnew/v2.1.0.rst

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>

* Update doc/source/whatsnew/v2.1.0.rst

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>

* Update doc/source/whatsnew/v2.1.0.rst

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>

---------

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>
---
 doc/source/whatsnew/v2.1.0.rst | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/doc/source/whatsnew/v2.1.0.rst b/doc/source/whatsnew/v2.1.0.rst
index 5cafaa5759a5b..50c04815ecfb2 100644
--- a/doc/source/whatsnew/v2.1.0.rst
+++ b/doc/source/whatsnew/v2.1.0.rst
@@ -14,6 +14,46 @@ including other versions of pandas.
 Enhancements
 ~~~~~~~~~~~~
 
+.. _whatsnew_210.enhancements.pyarrow_dependency:
+
+PyArrow will become a required dependency with pandas 3.0
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+`PyArrow `_ will become a required
+dependency of pandas starting with pandas 3.0. This decision was made based on
+`PDEP 12 `_.
+
+This will enable further changes that are highly beneficial to pandas users,
+including but not limited to:
+
+- Inferring strings as PyArrow-backed strings by default, enabling a significant
+  reduction of the memory footprint and substantial performance improvements.
+- Inferring more complex dtypes with PyArrow by default, such as ``Decimal``, ``lists``,
+  ``bytes``, structured data and more.
+- Better interoperability with other libraries that depend on Apache Arrow.
+
+We are collecting feedback on this decision `here `_.
+
+.. _whatsnew_210.enhancements.infer_strings:
+
+Avoid NumPy object dtype for strings by default
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Previously, all strings were stored in columns with NumPy object dtype.
+This release introduces an option ``future.infer_string`` that infers all
+strings as PyArrow-backed strings with dtype ``pd.ArrowDtype(pa.string())`` instead.
+This option only works if PyArrow is installed. PyArrow-backed strings have a
+significantly reduced memory footprint and provide large performance improvements
+compared to NumPy object dtype.
+
+The option can be enabled with:
+
+.. code-block:: python
+
+    pd.options.future.infer_string = True
+
+This behavior will become the default with pandas 3.0.
+
 .. _whatsnew_210.enhancements.reduction_extension_dtypes:
 
 DataFrame reductions preserve extension dtypes