
Commit

Merge branch 'main' of https://github.com/pandas-dev/pandas into gb_observed_pre
rhshadrach committed Oct 30, 2023
2 parents 31a7c92 + 2d2d67d commit 5ecfbeb
Showing 99 changed files with 4,608 additions and 2,557 deletions.
4 changes: 4 additions & 0 deletions .github/workflows/unit-tests.yml
@@ -69,6 +69,10 @@ jobs:
env_file: actions-311.yaml
pattern: "not slow and not network and not single_cpu"
pandas_copy_on_write: "1"
- name: "Copy-on-Write (warnings)"
env_file: actions-311.yaml
pattern: "not slow and not network and not single_cpu"
pandas_copy_on_write: "warn"
- name: "Pypy"
env_file: actions-pypy-39.yaml
pattern: "not slow and not network and not single_cpu"
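The new matrix entry runs the test suite with Copy-on-Write in warning mode. A minimal sketch of what that mode corresponds to in a user session, assuming the ``mode.copy_on_write`` option accepts ``"warn"`` as introduced in this release cycle:

import pandas as pd

# "warn" keeps the legacy behavior but emits warnings where results will
# change once Copy-on-Write becomes the default.
pd.set_option("mode.copy_on_write", "warn")

df = pd.DataFrame({"a": [1, 2, 3]})
subset = df["a"]
subset.iloc[0] = 10  # expected to warn: under CoW this would no longer modify df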
6 changes: 6 additions & 0 deletions doc/source/development/contributing_codebase.rst
@@ -456,6 +456,12 @@ be located.

- tests.io

.. note::

This includes ``to_string`` but excludes ``__repr__``, which is
tested in ``tests.frame.test_repr`` and ``tests.series.test_repr``.
Other classes often have a ``test_formats`` file.
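For example, a DataFrame ``__repr__`` test would live in ``tests.frame.test_repr`` rather than under ``tests.io``; a minimal illustrative test (the test name is hypothetical):

# pandas/tests/frame/test_repr.py
import pandas as pd


def test_repr_contains_column_name():
    df = pd.DataFrame({"a": [1, 2]})
    assert "a" in repr(df)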

C) Otherwise
This test likely belongs in one of:

2 changes: 1 addition & 1 deletion doc/source/whatsnew/v2.1.2.rst
@@ -1,6 +1,6 @@
.. _whatsnew_212:

What's new in 2.1.2 (October 25, 2023)
What's new in 2.1.2 (October 26, 2023)
---------------------------------------

These are the changes in pandas 2.1.2. See :ref:`release` for a full changelog
6 changes: 5 additions & 1 deletion doc/source/whatsnew/v2.2.0.rst
@@ -323,12 +323,14 @@ Datetimelike
^^^^^^^^^^^^
- Bug in :func:`concat` raising ``AttributeError`` when concatenating all-NA DataFrame with :class:`DatetimeTZDtype` dtype DataFrame. (:issue:`52093`)
- Bug in :meth:`DatetimeIndex.union` returning object dtype for tz-aware indexes with the same timezone but different units (:issue:`55238`)
- Bug in :meth:`Index.is_monotonic_increasing` and :meth:`Index.is_monotonic_decreasing` always caching :meth:`Index.is_unique` as ``True`` when first value in index is ``NaT`` (:issue:`55755`)
- Bug in :meth:`Index.view` to a datetime64 dtype with non-supported resolution incorrectly raising (:issue:`55710`)
- Bug in :meth:`Tick.delta` with very large ticks raising ``OverflowError`` instead of ``OutOfBoundsTimedelta`` (:issue:`55503`)
- Bug in adding or subtracting a :class:`Week` offset to a ``datetime64`` :class:`Series`, :class:`Index`, or :class:`DataFrame` column with non-nanosecond resolution returning incorrect results (:issue:`55583`)
- Bug in addition or subtraction of :class:`BusinessDay` offset with ``offset`` attribute to non-nanosecond :class:`Index`, :class:`Series`, or :class:`DataFrame` column giving incorrect results (:issue:`55608`)
- Bug in addition or subtraction of :class:`DateOffset` objects with microsecond components to ``datetime64`` :class:`Index`, :class:`Series`, or :class:`DataFrame` columns with non-nanosecond resolution (:issue:`55595`)
- Bug in addition or subtraction of very large :class:`Tick` objects with :class:`Timestamp` or :class:`Timedelta` objects raising ``OverflowError`` instead of ``OutOfBoundsTimedelta`` (:issue:`55503`)
- Bug in creating a :class:`Index`, :class:`Series`, or :class:`DataFrame` with a non-nanosecond ``datetime64`` dtype and inputs that would be out of bounds for a ``datetime64[ns]`` incorrectly raising ``OutOfBoundsDatetime`` (:issue:`55756`)
-

Timedelta
@@ -349,6 +351,7 @@ Numeric

Conversion
^^^^^^^^^^
- Bug in :func:`astype` when called with ``str`` on unpickled array - the array might change in-place (:issue:`54654`)
- Bug in :meth:`Series.convert_dtypes` not converting all NA column to ``null[pyarrow]`` (:issue:`55346`)
-

@@ -390,6 +393,7 @@ I/O
- Bug in :func:`read_excel`, with ``engine="xlrd"`` (``xls`` files) erroring when file contains NaNs/Infs (:issue:`54564`)
- Bug in :func:`to_excel`, with ``OdsWriter`` (``ods`` files) writing boolean/string value (:issue:`54994`)
- Bug in :meth:`DataFrame.to_hdf` and :func:`read_hdf` with ``datetime64`` dtypes with non-nanosecond resolution failing to round-trip correctly (:issue:`55622`)
- Bug in :meth:`pandas.read_excel` with ``engine="odf"`` (``ods`` files) when string contains annotation (:issue:`55200`)
- Bug in :meth:`pandas.read_excel` with an ODS file without cached formatted cell for float values (:issue:`55219`)
- Bug where :meth:`DataFrame.to_json` would raise an ``OverflowError`` instead of a ``TypeError`` with unsupported NumPy types (:issue:`55403`)
-
@@ -441,9 +445,9 @@ Other
- Bug in :func:`cut` incorrectly allowing cutting of timezone-aware datetimes with timezone-naive bins (:issue:`54964`)
- Bug in :func:`infer_freq` and :meth:`DatetimeIndex.inferred_freq` with weekly frequencies and non-nanosecond resolutions (:issue:`55609`)
- Bug in :meth:`DataFrame.apply` where passing ``raw=True`` ignored ``args`` passed to the applied function (:issue:`55009`)
- Bug in :meth:`DataFrame.from_dict` which would always sort the rows of the created :class:`DataFrame`. (:issue:`55683`)
- Bug in rendering ``inf`` values inside a :class:`DataFrame` with the ``use_inf_as_na`` option enabled (:issue:`55483`)
- Bug in rendering a :class:`Series` with a :class:`MultiIndex` when one of the index level's names is 0 not having that name displayed (:issue:`55415`)
-

.. ***DO NOT USE THIS SECTION***
14 changes: 13 additions & 1 deletion pandas/_config/__init__.py
@@ -15,6 +15,7 @@
"option_context",
"options",
"using_copy_on_write",
"warn_copy_on_write",
]
from pandas._config import config
from pandas._config import dates # pyright: ignore[reportUnusedImport] # noqa: F401
@@ -32,7 +33,18 @@

def using_copy_on_write() -> bool:
_mode_options = _global_config["mode"]
return _mode_options["copy_on_write"] and _mode_options["data_manager"] == "block"
return (
_mode_options["copy_on_write"] is True
and _mode_options["data_manager"] == "block"
)


def warn_copy_on_write() -> bool:
_mode_options = _global_config["mode"]
return (
_mode_options["copy_on_write"] == "warn"
and _mode_options["data_manager"] == "block"
)


def using_nullable_dtypes() -> bool:
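Library code can now distinguish the enforcing mode from the warning mode through these two helpers. A rough sketch of how a call site might branch on them (the function below is hypothetical and not part of this commit):

import warnings

from pandas._config import using_copy_on_write, warn_copy_on_write


def _maybe_warn_inplace_modification() -> None:
    # Under full CoW the legacy in-place path is skipped entirely; under the
    # "warn" mode the old behavior is kept but a warning announces the change.
    if using_copy_on_write():
        return
    if warn_copy_on_write():
        warnings.warn(
            "Setting a value on a view will behave differently under Copy-on-Write",
            FutureWarning,
            stacklevel=2,
        )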
2 changes: 1 addition & 1 deletion pandas/_libs/algos.pyx
@@ -814,7 +814,7 @@ def is_monotonic(ndarray[numeric_object_t, ndim=1] arr, bint timelike):
return True, True, True

if timelike and <int64_t>arr[0] == NPY_NAT:
return False, False, True
return False, False, False

if numeric_object_t is not object:
with nogil:
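Returning ``False`` for the third element keeps the ``NaT``-first fast path from reporting the data as unique regardless of its contents, which is the :issue:`55755` fix listed in the whatsnew above. A hedged sketch of the user-visible effect once the fix is applied:

import pandas as pd

idx = pd.DatetimeIndex([pd.NaT, "2023-01-01", "2023-01-01"])
# Previously the timelike fast path cached the index as unique; with the fix,
# uniqueness is actually computed from the values.
print(idx.is_unique)  # expected: False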
3 changes: 2 additions & 1 deletion pandas/_libs/lib.pyx
@@ -792,7 +792,8 @@ cpdef ndarray[object] ensure_string_array(

result = np.asarray(arr, dtype="object")

if copy and result is arr:
if copy and (result is arr or np.shares_memory(arr, result)):
# GH#54654
result = result.copy()
elif not copy and result is arr:
already_copied = False
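The identity check ``result is arr`` misses the case where ``np.asarray`` returns a different ndarray object that still shares its buffer with the input, which is how the in-place mutation in :issue:`54654` could occur for unpickled arrays. A minimal NumPy-only illustration of the distinction (not the exact pandas code path):

import numpy as np

arr = np.array([1.0, 2.0, 3.0], dtype=object)
view = arr[:]                           # distinct ndarray object, same buffer
result = np.asarray(view, dtype="object")

print(result is arr)                    # False: an identity check against arr would skip the copy
print(np.shares_memory(result, arr))    # True: writing into result would alter arr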
1 change: 1 addition & 0 deletions pandas/_libs/tslib.pyi
@@ -23,6 +23,7 @@ def array_to_datetime(
dayfirst: bool = ...,
yearfirst: bool = ...,
utc: bool = ...,
creso: int = ...,
) -> tuple[np.ndarray, tzinfo | None]: ...

# returned ndarray may be object dtype or datetime64[ns]
27 changes: 14 additions & 13 deletions pandas/_libs/tslib.pyx
@@ -64,6 +64,7 @@ from pandas._libs.tslibs.conversion cimport (
get_datetime64_nanos,
parse_pydatetime,
)
from pandas._libs.tslibs.dtypes cimport npy_unit_to_abbrev
from pandas._libs.tslibs.nattype cimport (
NPY_NAT,
c_NaT as NaT,
@@ -277,6 +278,7 @@ def array_with_unit_to_datetime(
result, tz = array_to_datetime(
values.astype(object, copy=False),
errors=errors,
creso=NPY_FR_ns,
)
return result, tz

@@ -408,6 +410,7 @@ cpdef array_to_datetime(
bint dayfirst=False,
bint yearfirst=False,
bint utc=False,
NPY_DATETIMEUNIT creso=NPY_FR_ns,
):
"""
Converts a 1D array of date-like values to a numpy array of either:
@@ -434,6 +437,7 @@
yearfirst parsing behavior when encountering datetime strings
utc : bool, default False
indicator whether the dates should be UTC
creso : NPY_DATETIMEUNIT, default NPY_FR_ns
Returns
-------
@@ -457,13 +461,14 @@
set out_tzoffset_vals = set()
tzinfo tz_out = None
cnp.flatiter it = cnp.PyArray_IterNew(values)
NPY_DATETIMEUNIT creso = NPY_FR_ns
DatetimeParseState state = DatetimeParseState()
str reso_str

# specify error conditions
assert is_raise or is_ignore or is_coerce

result = np.empty((<object>values).shape, dtype="M8[ns]")
reso_str = npy_unit_to_abbrev(creso)
result = np.empty((<object>values).shape, dtype=f"M8[{reso_str}]")
iresult = result.view("i8").ravel()

for i in range(n):
@@ -477,14 +482,14 @@

elif PyDateTime_Check(val):
tz_out = state.process_datetime(val, tz_out, utc_convert)
iresult[i] = parse_pydatetime(val, &dts, utc_convert, creso=creso)
iresult[i] = parse_pydatetime(val, &dts, creso=creso)

elif PyDate_Check(val):
iresult[i] = pydate_to_dt64(val, &dts)
check_dts_bounds(&dts)
iresult[i] = pydate_to_dt64(val, &dts, reso=creso)
check_dts_bounds(&dts, creso)

elif is_datetime64_object(val):
iresult[i] = get_datetime64_nanos(val, NPY_FR_ns)
iresult[i] = get_datetime64_nanos(val, creso)

elif is_integer_object(val) or is_float_object(val):
# these must be ns unit by-definition
@@ -493,23 +498,23 @@
iresult[i] = NPY_NAT
else:
# we now need to parse this as if unit='ns'
iresult[i] = cast_from_unit(val, "ns")
iresult[i] = cast_from_unit(val, "ns", out_reso=creso)

elif isinstance(val, str):
# string
if type(val) is not str:
# GH#32264 np.str_ object
val = str(val)

if parse_today_now(val, &iresult[i], utc):
if parse_today_now(val, &iresult[i], utc, creso):
# We can't _quite_ dispatch this to convert_str_to_tsobject
# bc there isn't a nice way to pass "utc"
continue

_ts = convert_str_to_tsobject(
val, None, unit="ns", dayfirst=dayfirst, yearfirst=yearfirst
)
_ts.ensure_reso(NPY_FR_ns, val)
_ts.ensure_reso(creso, val)

iresult[i] = _ts.value

@@ -519,10 +524,6 @@
# store the UTC offsets in seconds instead
nsecs = tz.utcoffset(None).total_seconds()
out_tzoffset_vals.add(nsecs)
# need to set seen_datetime_offset *after* the
# potentially-raising timezone(timedelta(...)) call,
# otherwise we can go down the is_same_offsets path
# bc len(out_tzoffset_vals) == 0
seen_datetime_offset = True
else:
# Add a marker for naive string, to track if we are
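Threading ``creso`` through ``array_to_datetime`` lets object and string inputs be converted directly at a non-nanosecond resolution, which is what the :issue:`55756` whatsnew entry above refers to. A hedged sketch of the user-visible effect:

import pandas as pd

# 3000-01-01 lies outside the datetime64[ns] range (which ends in 2262) but is
# representable at microsecond resolution; with this change the constructor is
# expected to parse at the requested unit instead of raising OutOfBoundsDatetime.
ser = pd.Series(["3000-01-01"], dtype="datetime64[us]")
print(ser.dtype)  # datetime64[us]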
5 changes: 3 additions & 2 deletions pandas/_libs/tslibs/conversion.pxd
@@ -43,14 +43,15 @@ cdef int64_t get_datetime64_nanos(object val, NPY_DATETIMEUNIT reso) except? -1

cpdef datetime localize_pydatetime(datetime dt, tzinfo tz)
cdef int64_t cast_from_unit(object ts, str unit, NPY_DATETIMEUNIT out_reso=*) except? -1
cpdef (int64_t, int) precision_from_unit(str unit, NPY_DATETIMEUNIT out_reso=*)
cpdef (int64_t, int) precision_from_unit(
NPY_DATETIMEUNIT in_reso, NPY_DATETIMEUNIT out_reso=*
)

cdef maybe_localize_tso(_TSObject obj, tzinfo tz, NPY_DATETIMEUNIT reso)


cdef int64_t parse_pydatetime(
datetime val,
npy_datetimestruct *dts,
bint utc_convert,
NPY_DATETIMEUNIT creso,
) except? -1
2 changes: 1 addition & 1 deletion pandas/_libs/tslibs/conversion.pyi
@@ -9,6 +9,6 @@ DT64NS_DTYPE: np.dtype
TD64NS_DTYPE: np.dtype

def precision_from_unit(
unit: str,
in_reso: int, # NPY_DATETIMEUNIT
) -> tuple[int, int]: ... # (int64_t, _)
def localize_pydatetime(dt: datetime, tz: tzinfo | None) -> datetime: ...