Allow `Iterable` and `Sequence` for `index` and `columns` parameters #790

dpoznik · 2023-10-05T23:02:53Z

Describe the bug
mypy reports an error when an Iterable or Sequence is passed to the index and columns parameters, but these are valid arguments.

To Reproduce

Provide a minimal runnable pandas example that is not properly checked by the stubs.

from typing import Iterable

import pandas as pd

def construct_series(
    data: Iterable[str],
    idx: Iterable[str],
) -> pd.Series:
    s = pd.Series(data, index=idx)
    return s

Indicate which type checker you are using (mypy or pyright).
mypy
Show the error message received from that type checker while checking your example.

foo.py:12: error: No overload variant of "Series" matches argument types "Iterable[str]", "Iterable[str]"  [call-overload]
foo.py:12: note: Possible overload variants:
foo.py:12: note:     def [S1] Series(data: DatetimeIndex | Sequence[Timestamp | datetime64 | datetime], index: Index | Series[Any] | ndarray[Any, Any] | list[Any] | dict[Any, Any] | range | tuple[Any, ...] | None = ..., dtype: Literal['datetime64[Y]', 'datetime64[M]', 'datetime64[W]', 'datetime64[D]', 'datetime64[h]', 'datetime64[m]', 'datetime64[s]', 'datetime64[ms]', 'datetime64[us]', 'datetime64[μs]', 'datetime64[ns]', 'datetime64[ps]', 'datetime64[fs]', 'datetime64[as]'] = ..., name: Hashable | None = ..., copy: bool = ...) -> TimestampSeries
foo.py:12: note:     def [S1] Series(data: ExtensionArray | ndarray[Any, Any] | dict[str, ndarray[Any, Any]] | list[Any] | tuple[Any, ...] | Index, index: Index | Series[Any] | ndarray[Any, Any] | list[Any] | dict[Any, Any] | range | tuple[Any, ...] | None = ..., *, dtype: Literal['datetime64[Y]', 'datetime64[M]', 'datetime64[W]', 'datetime64[D]', 'datetime64[h]', 'datetime64[m]', 'datetime64[s]', 'datetime64[ms]', 'datetime64[us]', 'datetime64[μs]', 'datetime64[ns]', 'datetime64[ps]', 'datetime64[fs]', 'datetime64[as]'], name: Hashable | None = ..., copy: bool = ...) -> TimestampSeries
foo.py:12: note:     def [S1] Series(data: PeriodIndex, index: Index | Series[Any] | ndarray[Any, Any] | list[Any] | dict[Any, Any] | range | tuple[Any, ...] | None = ..., dtype: PeriodDtype = ..., name: Hashable | None = ..., copy: bool = ...) -> PeriodSeries
foo.py:12: note:     def [S1] Series(data: TimedeltaIndex | Sequence[Timedelta | timedelta64 | timedelta], index: Index | Series[Any] | ndarray[Any, Any] | list[Any] | dict[Any, Any] | range | tuple[Any, ...] | None = ..., dtype: Literal['timedelta64[Y]', 'timedelta64[M]', 'timedelta64[W]', 'timedelta64[D]', 'timedelta64[h]', 'timedelta64[m]', 'timedelta64[s]', 'timedelta64[ms]', 'timedelta64[us]', 'timedelta64[μs]', 'timedelta64[ns]', 'timedelta64[ps]', 'timedelta64[fs]', 'timedelta64[as]'] = ..., name: Hashable | None = ..., copy: bool = ...) -> TimedeltaSeries
foo.py:12: note:     def [S1, _OrderableT] Series(data: IntervalIndex[Interval[_OrderableT]] | Interval[_OrderableT] | Sequence[Interval[_OrderableT]], index: Index | Series[Any] | ndarray[Any, Any] | list[Any] | dict[Any, Any] | range | tuple[Any, ...] | None = ..., dtype: Literal['Interval'] = ..., name: Hashable | None = ..., copy: bool = ...) -> IntervalSeries[_OrderableT]
foo.py:12: note:     def [S1] Series(data: object, dtype: type[S1], index: Index | Series[Any] | ndarray[Any, Any] | list[Any] | dict[Any, Any] | range | tuple[Any, ...] | None = ..., name: Hashable | None = ..., copy: bool = ...) -> Series[S1]
foo.py:12: note:     def [S1] Series(data: Series[S1] | dict[int, S1] | dict[str, S1] = ..., index: Index | Series[Any] | ndarray[Any, Any] | list[Any] | dict[Any, Any] | range | tuple[Any, ...] | None = ..., dtype: ExtensionDtype | str | dtype[generic] | type[str] | type[complex] | type[bool] | type[object] = ..., name: Hashable | None = ..., copy: bool = ...) -> Series[S1]
foo.py:12: note:     def [S1] Series(data: object = ..., index: Index | Series[Any] | ndarray[Any, Any] | list[Any] | dict[Any, Any] | range | tuple[Any, ...] | None = ..., dtype: ExtensionDtype | str | dtype[generic] | type[str] | type[complex] | type[bool] | type[object] = ..., name: Hashable | None = ..., copy: bool = ...) -> Series[Any]
Found 1 error in 1 file (checked 1 source file)

Please complete the following information:

OS: macOS
OS Version: Ventura 13.6
Python version: 3.11.3
version of type checker: mypy 1.5.1
version of installed pandas-stubs: 2.1.1.230928

Additional context
Would it be appropriate to update index: Axes | None (e.g., here) to something like index: Axes | Iterable | Sequence | None?

Dr-Irv · 2023-10-06T13:31:00Z

Would it be appropriate to update index: Axes | None (e.g., here) to something like index: Axes | Iterable | Sequence | None?

No, because we don't want to accept plain strings or sets as arguments, and both of them would match your use of Iterable[str].

So I think the type of your function is too wide - you are allowing arguments that are not accepted by pandas.
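
To illustrate why Iterable[str] is too wide, here is a minimal sketch that uses only the standard library. The helper name accepts_iterable is hypothetical and is not part of pandas or pandas-stubs; it only shows that a plain string and a set both satisfy the annotation.

```python
from collections.abc import Iterable


def accepts_iterable(labels: Iterable[str]) -> list[str]:
    """Hypothetical helper: takes anything a type checker treats as Iterable[str]."""
    return list(labels)


# A plain string is an iterable of one-character strings.
print(isinstance("abc", Iterable))       # True
print(accepts_iterable("abc"))           # ['a', 'b', 'c']

# A set is iterable too, but its iteration order is not guaranteed.
print(isinstance({"a", "b"}, Iterable))  # True
```

A type checker accepts both calls, so widening index to Iterable would let these arguments through to pandas as well.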

Side note - this is fixed for pd.DataFrame() with respect to sets, but your report uncovered that we didn't fix all the use cases, so I created a pandas issue here: pandas-dev/pandas#55425

I'm going to close this because I believe the behavior of the stubs is correct here, but if you can provide an example that works with pandas with direct arguments to pandas functions, I'm willing to reopen it.

dpoznik · 2023-10-08T03:13:13Z

we don't want to accept plain strings or sets

Ah, of course. I completely agree that Iterable and Sequence are too broad, but I'm curious for your thoughts on KeysView and ValuesView? For example, the following runs fine:

import pandas as pd

def foo(data: list[int], d: dict[str, int]) -> pd.Series:
    s = pd.Series(data, index=d.keys())
    return s

if __name__ == '__main__':
    data = [1, 2, 3]
    d = {str(value): value for value in data}
    print(foo(data, d))

But mypy doesn't like it:

foo.py:5: error: No overload variant of "Series" matches argument types "list[int]", "dict_keys[str, int]"  [call-overload]

Side note - this is fixed for pd.DataFrame() with respect to sets, but your report uncovered that we didn't fix all the use cases, so I created a pandas issue here: pandas-dev/pandas#55425

Ah, great. I'm glad my initial report was at least indirectly helpful :)

Dr-Irv · 2023-10-09T14:46:42Z

but I'm curious for your thoughts on KeysView and ValuesView?

While they work, the docs say "arraylike" and KeysView and ValuesView are not "arraylike".
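
One possible workaround, sketched here on the assumption that copying the keys is acceptable, is to materialize the view into a list before passing it as the index. The function name series_from_dict_keys is made up for illustration; the point is only that list[str] matches the list[Any] member of the accepted index types shown in the mypy output above.

```python
import pandas as pd


def series_from_dict_keys(data: list[int], d: dict[str, int]) -> pd.Series:
    # list(d) copies the keys, in insertion order, into a list[str],
    # so the index argument is a list rather than a dict_keys view.
    return pd.Series(data, index=list(d))


values = [1, 2, 3]
lookup = {str(value): value for value in values}
result = series_from_dict_keys(values, lookup)
print(result)
```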

dpoznik · 2023-10-09T16:57:27Z

While they work, the docs say "arraylike" and KeysView and ValuesView are not "arraylike".

Got it. Thanks!

dpoznik · 2023-10-09T19:07:41Z

Since the pandas definition of ArrayLike is private, I'm still not sure how best to annotate a function whose argument will be passed to the index parameter. I tried numpy.typing.ArrayLike:

import numpy.typing as npt
import pandas as pd

def foo(data: list[int], idx: npt.ArrayLike) -> pd.Series:
    s = pd.Series(data, index=idx)
    return s

print(foo([1, 2, 3], ["a", "b", "c"]))

This runs, but mypy gives:

foo.py:5: error: No overload variant of "Series" matches argument types "list[int]", "_SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes]"  [call-overload]

I could, of course, define a project-specific ArrayLike. Something like the following covers the use cases in my project and keeps mypy happy:

from typing import TypeAlias

import numpy as np
import pandas as pd

ArrayLike: TypeAlias = list[str] | tuple[str, ...] | np.ndarray | pd.Index | pd.Series

def foo(
    data: list[int],
    idx: ArrayLike,
) -> pd.Series:
    s = pd.Series(data, index=idx)
    return s

print(foo([1, 2, 3], ["a", "b", "c"]))

Is that what you'd advise? Thanks again!

Dr-Irv · 2023-10-09T19:23:34Z

Is that what you'd advise? Thanks again!

The following would work:

import pandas as pd
from pandas._typing import Axes

def foo(
    data: list[int],
    idx: Axes,
) -> pd.Series:
    s = pd.Series(data, index=idx)
    return s


print(foo([1, 2, 3], ["a", "b", "c"]))

The Axes type will cover the acceptable types. While it's not documented, it's unlikely to change.
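
Since pandas._typing is a private module, one optional precaution (an assumption on my part, not something the stubs require) is to import Axes only while type checking and to quote the annotation. The sketch below uses a hypothetical function name, make_series, and never imports the private module at runtime.

```python
from typing import TYPE_CHECKING

import pandas as pd

if TYPE_CHECKING:
    # Seen only by the type checker; skipped when the module actually runs.
    from pandas._typing import Axes


def make_series(data: list[int], idx: "Axes") -> pd.Series:
    # The quoted annotation is never evaluated at runtime, so the guarded
    # import above is enough for it.
    return pd.Series(data, index=idx)


series = make_series([1, 2, 3], ["a", "b", "c"])
print(series)
```

If Axes were ever renamed or moved, this version would fail at type-check time rather than at import time.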

dpoznik · 2023-10-09T20:34:56Z

OK, great. Thanks again for all your help.

Dr-Irv closed this as completed Oct 6, 2023
