[READY] perf improvements for strftime #51298

Open
wants to merge 154 commits into base: main

154 commits
a9ba5de
`period_format` now has a faster default formatter leveraging python …
Feb 6, 2023
a6c06c9
class `_Period`: new method `fast_strftime`
Feb 6, 2023
34113d4
class `Timestamp`: new method `fast_strftime`
Feb 6, 2023
5dd7ab4
New module in tslibs: `strftime.py`. New function in this module: `co…
Feb 6, 2023
8a7c039
`format_array_from_datetime`: new boolean argument `fast_strftime` to…
Feb 6, 2023
f2d2fb1
datetimelike `strftime`: new boolean argument `fast_strftime` to use …
Feb 6, 2023
fac90d7
`DatetimeIndexOpsMixin.format` and `_format_with_header`: new boolean…
Feb 6, 2023
2fc70a6
`NDFrame.to_csv` and `DataFrameRenderer.to_csv` and `CSVFormatter.__i…
Feb 6, 2023
6edda53
Added tests for the `to_csv` dataframe method to cover the new fast_s…
Feb 7, 2023
b4c815d
`TestCategoricalRepr`: added a test for dates without time, with time…
Feb 7, 2023
eaa1dc9
Fixed `test_nat` and `test_api` with the new symbols added
Feb 7, 2023
3254b54
New `test_strftime` module to cover the `strftime.py` module in tslib.
Feb 7, 2023
0f69286
`convert_strftime_format` argument `target` is now mandatory to avoid…
Feb 7, 2023
72fe379
`convert_strftime_format`: Completed unsupported directives for datet…
Feb 7, 2023
eda4243
Fixed bug in tslib `format_array_from_datetime`
Feb 7, 2023
442732f
Fixed issue in `format_array_from_datetime` when tz was not None
Feb 7, 2023
5ae3707
Added 2 todos
Feb 7, 2023
62aca61
`test_format`: Added various tests for the new feature
Feb 7, 2023
86cc1c8
Merge branch 'main' of https://github.com/pandas-dev/pandas into feat…
Feb 7, 2023
6412911
Merge branch 'main' of https://github.com/pandas-dev/pandas into feat…
Feb 8, 2023
2ec16ba
Fixed Datetime64TZFormatter issue due to arg renaming in recent commits
Feb 8, 2023
c325431
Added 2 asv benchs for strftime with iso8601 format, and a variant fo…
Feb 8, 2023
6c1188a
blackened, flake8, and removed asv main
Feb 8, 2023
1fad7b6
Minor improvement
Feb 10, 2023
e33ad65
Added new ASVs for strftime
Feb 10, 2023
6456694
Added asvs for period
Feb 10, 2023
91ad194
Added asvs for datetime and period indexes .format
Feb 14, 2023
80ebc82
`convert_strftime_format` is now part of the public API as it is requ…
Feb 14, 2023
087a949
Merge branch 'main' of https://github.com/pandas-dev/pandas into feat…
Feb 16, 2023
bf170ba
Updated whats new 1.5.4
Feb 16, 2023
1f3db30
Merge branch 'main' of https://github.com/pandas-dev/pandas into feat…
May 9, 2023
4b34639
Fixed issues following the merge from latest main. Introduced new fun…
May 9, 2023
6e189ad
Improved ASV bench slightly (added datetime index formatting tests)
May 9, 2023
9152dbc
Merged whatsnew
May 9, 2023
35a39a0
pre-commit and docstring checks
May 9, 2023
d13895b
blackened
May 10, 2023
35e4c34
Merge branch 'main' of https://github.com/pandas-dev/pandas into feat…
May 10, 2023
4007330
Added two asvs
May 10, 2023
3d8d469
Fixed mypy error
May 10, 2023
1fc3d48
Fixed ASV bench
May 10, 2023
6928941
Improved ASV benchs
May 10, 2023
7c87a2f
Fixed RST format in whatsnew
May 11, 2023
21b7c9d
Hopefully fixed the ASV bench for the case when the format is the def…
May 11, 2023
fdb7309
Fixed issue in period.pyx
May 11, 2023
636f27f
Made hooks happy
May 11, 2023
1773a0d
Merge branch 'main' into feature/44764_perf_issue_new
smarie May 11, 2023
3b96d0b
Fixed variables used before initialization
May 11, 2023
3433c16
pre-commit hook upgrade
May 12, 2023
234685c
Merge branch 'main' of https://github.com/pandas-dev/pandas into feat…
May 12, 2023
227f188
Fixed error in period.pyx
May 12, 2023
9c03fe4
Fixed issue: there was no need to encode to bytes to apply string for…
May 14, 2023
d44a4c9
Fixed issue: there was no need to encode to bytes to apply string for…
May 14, 2023
9a94960
Merge remote-tracking branch 'origin/feature/44764_perf_issue_new' in…
May 14, 2023
f9565ff
Revert "Fixed issue: there was no need to encode to bytes to apply st…
May 14, 2023
3a87c64
Fixed issue: there was no need to encode to bytes to apply string for…
May 14, 2023
bc5cb66
Removed todo in timedeltas.pyx
May 14, 2023
812aa2a
Fixed asv bench
May 14, 2023
69a9b67
Fixed ASV
May 14, 2023
abaf0cb
Fixed indentation issue
May 14, 2023
7c32a44
Merge branch 'main' of https://github.com/pandas-dev/pandas into feat…
May 14, 2023
537b9f7
Doc fix whatsnew entries sorted
May 14, 2023
f7307f4
Improved speed of csv formatting of datetimeindex
May 15, 2023
f9e3b5f
pre-commit fixes
May 15, 2023
9cf6d79
Removed invalid comment
smarie May 16, 2023
f42fe43
Merge branch 'main' of https://github.com/pandas-dev/pandas into feat…
May 31, 2023
9e51d82
Fixed typo
May 31, 2023
c2e150b
Merge branch 'main' of https://github.com/pandas-dev/pandas into feat…
Jun 1, 2023
8339062
Fixed whatsnew
Jun 1, 2023
72df87c
Fixed changelog indentation error: now a one-liner
Jun 5, 2023
0851d55
Merge branch 'main' of https://github.com/pandas-dev/pandas into feat…
Jun 5, 2023
ebb22e8
Fixed mypy error
Jun 5, 2023
a5869c4
Fixed mypy error 2
Jun 5, 2023
14c9cfe
Fixed whatsnew
Jun 5, 2023
e1ed22c
Fixed whatsnew ?
Jun 6, 2023
6a502fc
Fixed whatsnew ?
Jun 6, 2023
d340635
Merge branch 'main' into feature/44764_perf_issue_new
smarie Jun 6, 2023
14b8c37
Merge branch 'main' into feature/44764_perf_issue_new
smarie Jun 13, 2023
cb5bfd9
Merge branch 'main' of https://github.com/pandas-dev/pandas into feat…
Nov 11, 2023
022c185
Moved whatsnew items
Nov 11, 2023
21b0a88
Re applied mods to `TestPeriodIndexFormat`, since it had moved somewh…
Nov 11, 2023
ec8036c
Re applied mods to `TestDatetimeIndexFormat`, since it had moved some…
Nov 11, 2023
85c65a6
Re applied mods from `TestFastStrfTimeScalars` in the right places
Nov 11, 2023
6c35bad
Merge remote-tracking branch 'origin/feature/44764_perf_issue_new' in…
Nov 11, 2023
bac97f9
Ruff: Fixed invalid character in comment
Nov 16, 2023
324052b
black+isort+fixed tests
Nov 16, 2023
20e38ef
Implemented conservative fallback as suggested per code review
Nov 16, 2023
9602eb5
black
Nov 16, 2023
9223451
Implemented hypothesis tests as suggested per code review
Nov 16, 2023
5ffc88d
Merge branch 'main' of https://github.com/pandas-dev/pandas into feat…
Nov 16, 2023
fb0e30c
Removed the `fast_strftime` argument everywhere as suggested per code…
Nov 16, 2023
4ef8ec1
Fixed docstring
Nov 18, 2023
3c50a76
Trying to have meson understand that there is a py file in the tslib
Nov 18, 2023
3e4cd1e
Merge branch 'main' of https://github.com/pandas-dev/pandas into feat…
Nov 18, 2023
e41df33
Fixed ruff error
Nov 18, 2023
6e0e092
Fixed tests
Nov 18, 2023
83d5539
Fixed test
Nov 18, 2023
cd4d4f0
Fixed test
Nov 18, 2023
4198ea6
Fixed test
Nov 18, 2023
8e2ef29
Fixed test for musl linux
Nov 18, 2023
d0c3845
Fixed mypy errors
Nov 18, 2023
a3fcd6c
Fixed mypy error
Nov 18, 2023
0390344
Fixed mypy error
Nov 19, 2023
f43e511
Merge branch 'main' of https://github.com/pandas-dev/pandas into feat…
Nov 19, 2023
18fa4d5
Merge branch 'main' into feature/44764_perf_issue_new
smarie Dec 15, 2023
4521a1e
Merge branch 'main' into feature/44764_perf_issue_new
smarie Dec 20, 2023
cc1a4d2
Merge branch 'main' into feature/44764_perf_issue_new
smarie Dec 31, 2023
9cff856
Merge branch 'main' into feature/44764_perf_issue_new
smarie Jan 9, 2024
42e87c6
Removed `convert_strftime_format` from top-level API and moved it to …
Jan 13, 2024
404ab84
Merge remote-tracking branch 'origin/feature/44764_perf_issue_new' in…
Jan 13, 2024
1726096
Merge branch 'main' into feature/44764_perf_issue_new
smarie Jan 13, 2024
3dd707c
isort
Jan 13, 2024
ea28669
Merge remote-tracking branch 'origin/feature/44764_perf_issue_new' in…
Jan 13, 2024
a94aec0
Merge branch 'main' into feature/44764_perf_issue_new
smarie Jan 13, 2024
aa4b0f6
Merge branch 'main' of https://github.com/pandas-dev/pandas into feat…
Mar 3, 2024
a382325
As per code review: instance level `fast_strftime` are now private `_…
Mar 3, 2024
d2ef85e
Added maintenance comment
Mar 3, 2024
b0471a7
black+isort
Mar 3, 2024
bbbf721
Moved changelog to 3.0.0 and improved it slightly
Mar 3, 2024
5ea765c
Merge branch 'feature/44764_perf_issue_new' of https://github.com/sma…
Mar 3, 2024
691b127
Fixed test: fast_strftime not part of api anymore (private)
Mar 3, 2024
c41e8c9
Merge branch 'main' of https://github.com/pandas-dev/pandas into feat…
Mar 8, 2024
e0ddfcb
formatting
Mar 8, 2024
b09c641
Fixed ASV benchmark: format method disappeared from DateTimeIndex
Mar 8, 2024
a519427
Improved what's new
Mar 8, 2024
c58d5eb
Merge branch 'main' of https://github.com/pandas-dev/pandas into feat…
Mar 8, 2024
098e53e
Fixed ASV error
Mar 8, 2024
9ad010c
Merge branch 'main' into feature/44764_perf_issue_new
smarie Mar 25, 2024
e719599
Code review: renamed `loc_s` into `locale_dt_strings`
Apr 2, 2024
ffc661d
Code review: moved whats new entry to perf
Apr 2, 2024
b6442d3
Code review: removed commented out code
Apr 2, 2024
40a5c48
Code review: added support for negative and small years and added cor…
Apr 2, 2024
b0f73f5
Merge remote-tracking branch 'origin/feature/44764_perf_issue_new' in…
Apr 2, 2024
084f124
Merge branch 'main' of https://github.com/pandas-dev/pandas into feat…
Apr 2, 2024
b4983f8
Improved test for small years
Apr 2, 2024
159f1bd
Fixed failing test
Apr 3, 2024
76fb52e
Fixed failing test
Apr 3, 2024
c88e52d
Merge branch 'main' into feature/44764_perf_issue_new
smarie Apr 3, 2024
103d9cf
Fixed test on linux
Apr 3, 2024
478ea4e
Trying to debug datetime.strftime on linux
Apr 3, 2024
51431b3
Fixed tests on linux and windows
Apr 3, 2024
847a9f3
Removed useless comment
Apr 3, 2024
338404c
Merge branch 'main' of https://github.com/pandas-dev/pandas into feat…
Apr 3, 2024
a21837c
Fixed tests on linux musl
Apr 4, 2024
fc03926
Attempt to fix on MUSL
Apr 4, 2024
03dfc52
Fixed test for linux musl
Apr 4, 2024
2b98619
Merge branch 'main' of https://github.com/pandas-dev/pandas into feat…
Jun 29, 2024
514e785
Simplified `convert_strftime_format` as per code review
Jun 30, 2024
d130c22
Renamed `fast_strftime` with `strftime_pystr` as per code review
Jun 30, 2024
d6d6905
Merge branch 'main' of https://github.com/pandas-dev/pandas into feat…
Jun 30, 2024
66b5bbe
Fixed test on WASM/Pyodide
Jun 30, 2024
25ad4fb
Renamed internal argument ``strftime_pystr`` into ``_use_pystr_engine…
Jul 1, 2024
5001569
Fixed test on pyodide
Jul 1, 2024
48a0096
Fixed cython-lint error
Jul 1, 2024
3e86c1f
Merge branch 'main' into feature/44764_perf_issue_new
smarie Jul 5, 2024
30 changes: 21 additions & 9 deletions asv_bench/benchmarks/io/csv.py
@@ -79,7 +79,13 @@ def setup(self):
rng = date_range("1/1/2000", periods=1000)
self.data = DataFrame(rng, index=rng)

def time_frame_date_formatting(self):
def time_frame_date_formatting_default(self):
self.data.to_csv(self.fname)

def time_frame_date_formatting_default_explicit(self):
self.data.to_csv(self.fname, date_format="%Y-%m-%d")

def time_frame_date_formatting_custom(self):
self.data.to_csv(self.fname, date_format="%Y%m%d")


@@ -90,11 +96,14 @@ def setup(self):
rng = date_range("2000", periods=100_000, freq="s")
self.data = DataFrame({"a": 1}, index=rng)

def time_frame_date_formatting_index(self):
def time_frame_date_formatting_index_default(self):
self.data.to_csv(self.fname)

def time_frame_date_formatting_index_default_explicit(self):
self.data.to_csv(self.fname, date_format="%Y-%m-%d %H:%M:%S")

def time_frame_date_no_format_index(self):
self.data.to_csv(self.fname)
def time_frame_date_formatting_index_custom(self):
self.data.to_csv(self.fname, date_format="%Y-%m-%d__%H:%M:%S")


class ToCSVPeriod(BaseIO):
@@ -117,7 +126,7 @@ def time_frame_period_formatting_default(self, nobs, freq):
def time_frame_period_formatting_default_explicit(self, nobs, freq):
self.data.to_csv(self.fname, date_format=self.default_fmt)

def time_frame_period_formatting(self, nobs, freq):
def time_frame_period_formatting_custom(self, nobs, freq):
# Nb: `date_format` is not actually taken into account here today, so the
# performance is currently identical to `time_frame_period_formatting_default`
# above. This timer is therefore expected to degrade when GH#51621 is fixed.
@@ -139,15 +148,15 @@ def setup(self, nobs, freq):
elif freq == "h":
self.default_fmt = "%Y-%m-%d %H:00"

def time_frame_period_formatting_index(self, nobs, freq):
self.data.to_csv(self.fname, date_format="%Y-%m-%d___%H:%M:%S")

def time_frame_period_formatting_index_default(self, nobs, freq):
self.data.to_csv(self.fname)

def time_frame_period_formatting_index_default_explicit(self, nobs, freq):
self.data.to_csv(self.fname, date_format=self.default_fmt)

def time_frame_period_formatting_index_custom(self, nobs, freq):
self.data.to_csv(self.fname, date_format="%Y-%m-%d___%H:%M:%S")


class ToCSVDatetimeBig(BaseIO):
fname = "__test__.csv"
@@ -166,9 +175,12 @@ def setup(self, nobs):
}
)

def time_frame(self, nobs):
def time_frame_formatting_default(self, nobs):
self.data.to_csv(self.fname)

def time_frame_date_formatting_custom(self, nobs):
self.data.to_csv(self.fname, date_format="%Y%m%d__%H%M%S")


class ToCSVIndexes(BaseIO):
fname = "__test__.csv"
65 changes: 51 additions & 14 deletions asv_bench/benchmarks/strftime.py
@@ -6,10 +6,10 @@

class DatetimeStrftime:
timeout = 1500
params = [1000, 10000]
param_names = ["nobs"]
params = ([1000, 10000], [False, True])
param_names = ["nobs", "tz_aware"]

def setup(self, nobs):
def setup(self, nobs, tz_aware):
d = "2018-11-29"
dt = "2018-11-26 11:18:27.0"
self.data = pd.DataFrame(
@@ -19,37 +19,68 @@ def setup(self, nobs):
"r": [np.random.uniform()] * nobs,
}
)
if tz_aware:
self.data["dt"] = self.data["dt"].dt.tz_localize("UTC")
self.data["d"] = self.data["d"].dt.tz_localize("UTC")

self.data["i"] = self.data["dt"]
self.data.set_index("i", inplace=True)
Comment on lines +22 to +27

The parameter tz_aware is being used to toggle between timezone-aware and naive timestamps. However, the logic for handling timezone-aware data in the setup method is somewhat repetitive and could benefit from encapsulation in a utility function.

Extract the logic for timezone localization into a helper function to improve readability and maintainability.
For example:
```python
def localize_if_required(dataframe, tz_aware):
    if tz_aware:
        dataframe["dt"] = dataframe["dt"].dt.tz_localize("UTC")
        dataframe["d"] = dataframe["d"].dt.tz_localize("UTC")
```


def time_frame_date_to_str(self, nobs):
def time_frame_date_to_str(self, nobs, tz_aware):
self.data["d"].astype(str)

def time_frame_date_formatting_default(self, nobs):
def time_frame_date_formatting_default(self, nobs, tz_aware):
self.data["d"].dt.strftime(date_format=None)

def time_frame_date_formatting_default_explicit(self, nobs):
self.data["d"].dt.strftime(date_format="%Y-%m-%d")
def time_frame_date_formatting_index_to_str(self, nobs, tz_aware):
self.data.index.astype(str)

def time_frame_date_formatting_index_default(self, nobs, tz_aware):
self.data.index.strftime(date_format=None)

def time_frame_date_formatting_custom(self, nobs):
def time_frame_date_formatting_custom(self, nobs, tz_aware):
self.data["d"].dt.strftime(date_format="%Y---%m---%d")

def time_frame_datetime_to_str(self, nobs):
def time_frame_date_formatting_index_custom(self, nobs, tz_aware):
self.data.index.strftime(date_format="%Y---%m---%d")

def time_frame_datetime_to_str(self, nobs, tz_aware):
self.data["dt"].astype(str)

def time_frame_datetime_formatting_default(self, nobs):
def time_frame_datetime_formatting_default(self, nobs, tz_aware):
self.data["dt"].dt.strftime(date_format=None)

def time_frame_datetime_formatting_default_explicit_date_only(self, nobs):
def time_frame_datetime_formatting_default_explicit_date_only(self, nobs, tz_aware):
self.data["dt"].dt.strftime(date_format="%Y-%m-%d")

def time_frame_datetime_formatting_default_explicit(self, nobs):
def time_frame_datetime_formatting_default_explicit(self, nobs, tz_aware):
self.data["dt"].dt.strftime(date_format="%Y-%m-%d %H:%M:%S")

def time_frame_datetime_formatting_default_with_float(self, nobs):
def time_frame_datetime_formatting_default_with_float(self, nobs, tz_aware):
self.data["dt"].dt.strftime(date_format="%Y-%m-%d %H:%M:%S.%f")

def time_frame_datetime_formatting_custom(self, nobs):
def time_frame_datetime_formatting_index_to_str(self, nobs, tz_aware):
self.data.set_index("dt").index.astype(str)

def time_frame_datetime_formatting_index_default(self, nobs, tz_aware):
self.data.set_index("dt").index.strftime(date_format=None)

def time_frame_datetime_formatting_custom(self, nobs, tz_aware):
self.data["dt"].dt.strftime(date_format="%Y-%m-%d --- %H:%M:%S")

def time_frame_datetime_formatting_index_custom(self, nobs, tz_aware):
self.data.set_index("dt").index.strftime(date_format="%Y-%m-%d --- %H:%M:%S")

def time_frame_datetime_formatting_iso8601_map(self, nobs, tz_aware):
self.data["dt"].map(lambda timestamp: timestamp.isoformat())

def time_frame_datetime_formatting_iso8601_strftime_Z(self, nobs, tz_aware):
self.data["dt"].dt.strftime(date_format="%Y-%m-%dT%H:%M:%SZ")

def time_frame_datetime_formatting_iso8601_strftime_offset(self, nobs, tz_aware):
"""Not optimized yet as %z is not supported by `convert_strftime_format`"""
self.data["dt"].dt.strftime(date_format="%Y-%m-%dT%H:%M:%S%z")
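
The three ISO 8601 benchmarks above differ only in how the UTC offset is rendered. As a rough illustration using the standard library `datetime` rather than pandas (an assumption made to keep the sketch dependency-free), the three spellings produce:

```python
from datetime import datetime, timezone

# A timezone-aware value similar to the one used in the benchmark setup.
dt = datetime(2018, 11, 26, 11, 18, 27, tzinfo=timezone.utc)

# isoformat() writes the offset with a colon.
iso_map = dt.isoformat()

# A literal "Z" in the template is copied verbatim; no offset is computed.
iso_z = dt.strftime("%Y-%m-%dT%H:%M:%SZ")

# %z writes the offset without a colon. Per the benchmark docstring, this
# directive is not handled by `convert_strftime_format`.
iso_offset = dt.strftime("%Y-%m-%dT%H:%M:%S%z")

print(iso_map)     # 2018-11-26T11:18:27+00:00
print(iso_z)       # 2018-11-26T11:18:27Z
print(iso_offset)  # 2018-11-26T11:18:27+0000
```

Only the `%z` variant needs information beyond the plain date and time fields, which is consistent with the docstring's note that it is not optimized yet.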


class PeriodStrftime:
timeout = 1500
@@ -73,6 +104,12 @@ def setup(self, nobs, freq):
def time_frame_period_to_str(self, nobs, freq):
self.data["p"].astype(str)

def time_frame_period_str(self, nobs, freq):
self.data["p"].apply(str)

def time_frame_period_repr(self, nobs, freq):
self.data["p"].apply(repr)

def time_frame_period_formatting_default(self, nobs, freq):
self.data["p"].dt.strftime(date_format=None)
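
For readers without an asv setup, the kind of comparison these benchmarks measure can be approximated with the standard library alone. The sketch below is not the asv benchmark: the function names and the `number` argument are hypothetical, and it times a single `datetime` object rather than a pandas column.

```python
import timeit
from datetime import datetime

dt = datetime(2018, 11, 26, 11, 18, 27)


def with_strftime() -> str:
    # Delegates to the platform strftime implementation.
    return dt.strftime("%Y-%m-%d %H:%M:%S")


def with_string_formatting() -> str:
    # Builds the same text with plain Python %-formatting.
    return "%d-%02d-%02d %02d:%02d:%02d" % (
        dt.year, dt.month, dt.day, dt.hour, dt.minute, dt.second
    )


def time_both(number: int = 10_000) -> dict:
    # Returns elapsed seconds per approach; values depend on the machine.
    return {
        fn.__name__: timeit.timeit(fn, number=number)
        for fn in (with_strftime, with_string_formatting)
    }


# Both approaches must agree before a timing comparison means anything.
assert with_strftime() == with_string_formatting()
print(time_both())
```

No timings are quoted here because they vary by platform and Python version; the asv suite in this PR remains the reference measurement.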

7 changes: 7 additions & 0 deletions doc/source/whatsnew/v3.0.0.rst
@@ -441,6 +441,13 @@ Other Removals

Performance improvements
~~~~~~~~~~~~~~~~~~~~~~~~

- Performance improvement in all datetime formatting procedures, achieved by using Python string formatting instead of OS ``strftime`` (:issue:`51298`):

  - in :meth:`DatetimeLikeArrayMixin.strftime`. Classes :class:`DatetimeArray`, :class:`PeriodArray`, :class:`DatetimeIndex` and :class:`PeriodIndex` benefit from the improvement. :meth:`TimedeltaArray.strftime` and :meth:`TimedeltaArray.format` are not impacted, as their ``date_format`` argument is currently ignored.
  - in :meth:`NDFrame.to_csv`, :meth:`DataFrameRenderer.to_csv` and :class:`CSVFormatter`.
  - This is achieved thanks to the new :func:`pd.tseries.api.convert_strftime_format`, which converts a strftime formatting template into a Python string formatting template. strftime templates that cannot be converted to such a fast Python string template continue to be processed with OS ``strftime`` as a fallback.

- Eliminated circular reference in to original pandas object in accessor attributes (e.g. :attr:`Series.str`). However, accessor instantiation is no longer cached (:issue:`47667`, :issue:`41357`)
- :attr:`Categorical.categories` returns a :class:`RangeIndex` columns instead of an :class:`Index` if the constructed ``values`` was a ``range``. (:issue:`57787`)
- :class:`DataFrame` returns a :class:`RangeIndex` columns when possible when ``data`` is a ``dict`` (:issue:`57943`)
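
The conversion-with-fallback idea described in the first entry above can be sketched in a few lines. This is a simplified illustration, not the code in `pandas/_libs/tslibs/strftime.py`: `sketch_convert`, `sketch_format`, `UnsupportedDirective` and the directive table are hypothetical, the dictionary keys are borrowed from the `tslib.pyx` hunk in this PR, and AM/PM strings are hard-coded instead of being read from the locale.

```python
from datetime import datetime


class UnsupportedDirective(ValueError):
    """Hypothetical stand-in for the PR's UnsupportedStrFmtDirective."""


# Illustrative subset only; the mapping used in the PR may differ.
_DIRECTIVE_TO_TEMPLATE = {
    "%Y": "%(year)d",
    "%y": "%(shortyear)02d",
    "%m": "%(month)02d",
    "%d": "%(day)02d",
    "%H": "%(hour)02d",
    "%I": "%(hour12)02d",
    "%p": "%(ampm)s",
    "%M": "%(min)02d",
    "%S": "%(sec)02d",
    "%f": "%(us)06d",
    "%%": "%%",
}


def sketch_convert(fmt: str) -> str:
    """Turn a strftime template into a %-style template, or raise."""
    parts = []
    i = 0
    while i < len(fmt):
        if fmt[i] != "%":
            parts.append(fmt[i])  # literal character, copied as-is
            i += 1
            continue
        directive = fmt[i:i + 2]
        if directive not in _DIRECTIVE_TO_TEMPLATE:
            raise UnsupportedDirective(directive)
        parts.append(_DIRECTIVE_TO_TEMPLATE[directive])
        i += 2
    return "".join(parts)


def sketch_format(dt: datetime, fmt: str) -> str:
    """Use the fast template when possible, else fall back to strftime."""
    try:
        template = sketch_convert(fmt)
    except UnsupportedDirective:
        return dt.strftime(fmt)  # conservative fallback
    h = dt.hour
    return template % {
        "year": dt.year,
        "shortyear": dt.year % 100,
        "month": dt.month,
        "day": dt.day,
        "hour": h,
        "hour12": 12 if h in (0, 12) else h % 12,
        "ampm": "PM" if h >= 12 else "AM",  # assumption: English AM/PM
        "min": dt.minute,
        "sec": dt.second,
        "us": dt.microsecond,
    }


dt = datetime(2018, 11, 26, 11, 18, 27)
print(sketch_format(dt, "%Y---%m---%d"))         # fast path
print(sketch_format(dt, "%Y-%m-%dT%H:%M:%S%z"))  # %z unsupported: fallback
```

The conversion is done once per call to the array formatter, so the per-element cost reduces to one dictionary build and one `%` operation whenever the template is convertible.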
2 changes: 2 additions & 0 deletions pandas/_libs/__init__.py
@@ -5,6 +5,7 @@
"Period",
"Timedelta",
"Timestamp",
"convert_strftime_format",
"iNaT",
"Interval",
]
@@ -23,5 +24,6 @@
Period,
Timedelta,
Timestamp,
convert_strftime_format,
iNaT,
)
1 change: 1 addition & 0 deletions pandas/_libs/tslib.pyi
@@ -10,6 +10,7 @@ def format_array_from_datetime(
format: str | None = ...,
na_rep: str | float = ...,
reso: int = ..., # NPY_DATETIMEUNIT
_use_pystr_engine: bool = ...,
) -> npt.NDArray[np.object_]: ...
def first_non_null(values: np.ndarray) -> int: ...
def array_to_datetime(
56 changes: 55 additions & 1 deletion pandas/_libs/tslib.pyx
@@ -77,6 +77,10 @@ from pandas._libs.tslibs import (
Resolution,
get_resolution,
)
from pandas._libs.tslibs.strftime import (
UnsupportedStrFmtDirective,
convert_strftime_format,
)
from pandas._libs.tslibs.timestamps import Timestamp

# Note: this is the only non-tslibs intra-pandas dependency here
@@ -118,6 +122,7 @@ def format_array_from_datetime(
str format=None,
na_rep: str | float = "NaT",
NPY_DATETIMEUNIT reso=NPY_FR_ns,
_use_pystr_engine=True,
) -> np.ndarray:
"""
return a np object array of the string formatted values
@@ -131,18 +136,22 @@ def format_array_from_datetime(
na_rep : optional, default is None
a nat format
reso : NPY_DATETIMEUNIT, default NPY_FR_ns
_use_pystr_engine : bool, default True
If `True` (default) and the format permits it, a faster formatting
method will be used. See `convert_strftime_format`.

Returns
-------
np.ndarray[object]
"""
cdef:
int64_t val, ns, N = values.size
int64_t val, ns, y, h, N = values.size
bint show_ms = False, show_us = False, show_ns = False
bint basic_format = False, basic_format_day = False
_Timestamp ts
object res
npy_datetimestruct dts
object str_format, locale_dt_strings

# Note that `result` (and thus `result_flat`) is C-order and
# `it` iterates C-order as well, so the iteration matches
@@ -176,8 +185,24 @@ def format_array_from_datetime(
# Default format for dates
basic_format_day = True

# Sanity check - these flags are exclusive
assert not (basic_format_day and basic_format)

if not basic_format_day and not basic_format and _use_pystr_engine:
# Preprocessing for _use_pystr_engine
if format is None:
# We'll fallback to the Timestamp.str method
_use_pystr_engine = False
else:
try:
# Try to get the string formatting template for this format
str_format, locale_dt_strings = convert_strftime_format(
format, target="datetime"
)
except UnsupportedStrFmtDirective:
# Unsupported directive: fallback to standard `strftime`
_use_pystr_engine = False

for i in range(N):
# Analogous to: utc_val = values[i]
val = (<int64_t*>cnp.PyArray_ITER_DATA(it))[0]
@@ -203,6 +228,35 @@ def format_array_from_datetime(
elif show_ms:
res += f".{dts.us // 1000:03d}"

elif _use_pystr_engine:

if tz is None:
pandas_datetime_to_datetimestruct(val, reso, &dts)

# Use string formatting for faster strftime
y = dts.year
shortyear = y % 100
if y < 0 and shortyear != 0:
# Fix negative modulo to adopt C-style modulo
shortyear -= 100
h = dts.hour
res = str_format % {
"year": y,
"shortyear": shortyear,
"month": dts.month,
"day": dts.day,
"hour": h,
"hour12": 12 if h in (0, 12) else (h % 12),
"ampm": locale_dt_strings.pm if (h // 12) else locale_dt_strings.am,
"min": dts.min,
"sec": dts.sec,
"us": dts.us,
}
else:
ts = Timestamp._from_value_and_reso(val, reso=reso, tz=tz)

# Use string formatting for faster strftime
res = ts._strftime_pystr(str_format, locale_dt_strings)
else:

ts = Timestamp._from_value_and_reso(val, reso=reso, tz=tz)
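
Two small computations in the hunk above are easy to misread: the two-digit year for negative years and the 12-hour clock value. The following plain-Python sketch restates them outside Cython; the function names are hypothetical and only the arithmetic is taken from the diff.

```python
def c_style_short_year(year: int) -> int:
    # Python's % always returns a non-negative result for a positive
    # modulus, so -123 % 100 == 77. The diff shifts it back by 100 to
    # match C-style truncated modulo, giving -23.
    short = year % 100
    if year < 0 and short != 0:
        short -= 100
    return short


def hour_12(hour: int) -> int:
    # Midnight (0) and noon (12) both display as 12 on a 12-hour clock.
    return 12 if hour in (0, 12) else hour % 12


print(c_style_short_year(2018))  # 18
print(c_style_short_year(-123))  # -23
print(hour_12(0), hour_12(13))   # 12 1
```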
6 changes: 6 additions & 0 deletions pandas/_libs/tslibs/__init__.py
@@ -9,6 +9,8 @@
"OutOfBoundsTimedelta",
"IncompatibleFrequency",
"Period",
"convert_strftime_format",
"UnsupportedStrFmtDirective",
"Resolution",
"Timedelta",
"normalize_i8_timestamps",
@@ -69,6 +71,10 @@
IncompatibleFrequency,
Period,
)
from pandas._libs.tslibs.strftime import (
UnsupportedStrFmtDirective,
convert_strftime_format,
)
from pandas._libs.tslibs.timedeltas import (
Timedelta,
delta_to_nanoseconds,
1 change: 1 addition & 0 deletions pandas/_libs/tslibs/meson.build
@@ -51,6 +51,7 @@ sources_to_install = [
'offsets.pyi',
'parsing.pyi',
'period.pyi',
'strftime.py',
'strptime.pyi',
'timedeltas.pyi',
'timestamps.pyi',
1 change: 1 addition & 0 deletions pandas/_libs/tslibs/period.pyi
@@ -48,6 +48,7 @@ def period_array_strftime(
dtype_code: int,
na_rep,
date_format: str | None,
_use_pystr_engine: bool,
) -> npt.NDArray[np.object_]: ...

# exposed for tests