BUG: read_csv not respecting object dtype when option is set #56047

phofl · 2023-11-18T13:33:28Z

closes #xxxx (Replace xxxx with the GitHub issue number)
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

We are not honouring object dtype here. Thoughts on performance, @jbrockmendel?

phofl · 2023-11-29T20:54:38Z

Can we get this one in?

mroeschke · 2023-11-30T18:11:27Z

pandas/io/parsers/readers.py

@@ -1846,7 +1851,29 @@ def read(self, nrows: int | None = None) -> DataFrame:
            else:
                new_rows = len(index)

-            df = DataFrame(col_dict, columns=columns, index=index)
+            if hasattr(self, "orig_options"):
+                dtype_arg = self.orig_options.get("dtype", None)


Is the dtype option normally applied in _engine.read? Just curious why it needs to be done here.

Yes, but when the option is set, the DataFrame constructor infers object columns as string again, which would discard the requested dtype.

OK, makes sense.

Could we avoid looping over col_dict when dtype isn't specified as object-like?

Updated; this now only runs when we have a dict or object dtype.
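The review suggestion in this thread is to only loop over col_dict when dtype is a dict or a single object-like spec. That selection logic can be sketched in plain Python.

This is a hypothetical helper, not the code merged in this PR:

- `columns_to_keep_as_object` and the `OBJECT_LIKE` set are assumptions for illustration.
- pandas normalizes dtype specs with `pandas_dtype()` rather than comparing against a fixed set.
- `str` is included because the commit list contains a "Cover str too" step.

```python
from collections import defaultdict

# Assumption: the dtype specs treated as "object-like". pandas itself
# normalizes specs with pandas_dtype(); this fixed set is a simplification.
OBJECT_LIKE = {object, "object", "O", str, "str"}


def is_object_like(spec):
    """True if a (hashable) dtype spec should stay object after construction."""
    return spec is not None and spec in OBJECT_LIKE


def columns_to_keep_as_object(dtype_arg, columns):
    """Hypothetical helper: which columns need their object dtype re-applied.

    Only builds a per-column lookup (and loops) when dtype is a dict or a
    single object-like spec, as suggested in review.
    """
    if isinstance(dtype_arg, dict):
        # Per-column specs; columns not mentioned default to None.
        per_column = defaultdict(lambda: None, dtype_arg)
    elif is_object_like(dtype_arg):
        # A single object-like spec applies to every column.
        per_column = defaultdict(lambda: dtype_arg)
    else:
        # Fast path: no dtype, or a single non-object dtype, so skip the loop.
        return []
    return [col for col in columns if is_object_like(per_column[col])]


print(columns_to_keep_as_object(None, ["a", "b"]))     # []
print(columns_to_keep_as_object("int64", ["a", "b"]))  # []
print(columns_to_keep_as_object(object, ["a", "b"]))   # ['a', 'b']
print(columns_to_keep_as_object({"a": str, "b": "float64"}, ["a", "b"]))  # ['a']
```

The returned names are the columns whose object dtype would have to be protected from string inference when the DataFrame is built with the option enabled. All other columns can be passed through unchanged.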


phofl · 2023-12-02T00:11:45Z

pandas/io/parsers/arrow_parser_wrapper.py

@@ -295,18 +295,8 @@ def read(self) -> DataFrame:
            dtype_mapping[pa.null()] = pd.Int64Dtype()
            frame = table.to_pandas(types_mapper=dtype_mapping.get)
        elif using_pyarrow_string_dtype():
-


These mappers don't work: Arrow's types_mapper maps type -> type, not column -> type.
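To illustrate the point about mappers: pyarrow's `Table.to_pandas(types_mapper=...)` calls the mapper with each column's Arrow `DataType` and never passes the column name. A dict keyed by column name therefore returns `None` (meaning "default conversion") for every column.

The sketch below simulates that calling convention in plain Python. `apply_types_mapper`, `SCHEMA`, and the string type names are hypothetical stand-ins, not pyarrow's API.

```python
# Hypothetical stand-in for an Arrow schema: (column name, Arrow type) pairs.
SCHEMA = [("a", "string"), ("b", "int64"), ("c", "string")]


def apply_types_mapper(schema, types_mapper):
    """Mimic the calling convention of to_pandas(types_mapper=...).

    The mapper only receives the column's type, never its name; a None
    result means "use the default conversion" for that column.
    """
    return {name: types_mapper(arrow_type) for name, arrow_type in schema}


# Column -> dtype mapping: keyed by names the mapper is never given.
by_column = {"a": "object", "b": "Int64"}
# Type -> dtype mapping: keyed by what the mapper actually receives.
by_type = {"string": "string[pyarrow]"}

print(apply_types_mapper(SCHEMA, by_column.get))
# {'a': None, 'b': None, 'c': None}
print(apply_types_mapper(SCHEMA, by_type.get))
# {'a': 'string[pyarrow]', 'b': None, 'c': 'string[pyarrow]'}
```

Per-column overrides therefore cannot be expressed through `types_mapper`. One option is to apply them after conversion, for example with `DataFrame.astype`, which does accept a column -> dtype dict.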

phofl · 2023-12-08T21:46:06Z

cc @mroeschke gentle ping

mroeschke · 2023-12-09T19:18:30Z

Thanks @phofl

jbrockmendel · 2023-12-17T20:08:53Z

pandas/io/parsers/readers.py

@@ -1846,7 +1853,40 @@ def read(self, nrows: int | None = None) -> DataFrame:
            else:
                new_rows = len(index)

-            df = DataFrame(col_dict, columns=columns, index=index)
+            if hasattr(self, "orig_options"):


Can we do something more explicit than a hasattr check?

No, users can subclass the reader, so we don't have any control over whether that attribute exists.

Does anybody actually do this? I judge those people, their ethics, and their hygiene.

That's something I can't answer. We might want to deprecate it, but we are stuck with hasattr here until then.
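Why `hasattr` rather than plain attribute access: a subclass might override `__init__` without calling the parent's. In that case `orig_options` is never set, and `self.orig_options` would raise `AttributeError`.

A minimal sketch under that assumption follows. `Reader` and `CustomReader` are hypothetical stand-ins for the reader class in readers.py, not pandas code.

```python
class Reader:
    """Hypothetical stand-in for the reader class in readers.py."""

    def __init__(self, **kwds):
        # The base class records the user's original keyword arguments.
        self.orig_options = kwds

    def requested_dtype(self):
        # Defensive lookup: a subclass may never have set orig_options.
        if hasattr(self, "orig_options"):
            return self.orig_options.get("dtype", None)
        return None


class CustomReader(Reader):
    """Hypothetical user subclass that skips the parent's __init__."""

    def __init__(self):
        pass  # orig_options is never assigned


print(Reader(dtype=object).requested_dtype())  # <class 'object'>
print(CustomReader().requested_dtype())        # None
```

`getattr(self, "orig_options", {})` would be an equivalent one-liner. It is still a duck-typed check, though, so it does not remove the underlying dependency on subclasses behaving.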

phofl added 4 commits November 18, 2023 14:32

BUG: read_csv not respecting object dtype when option is set

91836bd

Merge branch 'main' into csv_dtype_string_option

f960b16

Update readers.py

3c946b3

Merge branch 'main' into csv_dtype_string_option

e93cfed

mroeschke added the IO CSV read_csv, to_csv label Nov 26, 2023

phofl added 3 commits November 26, 2023 22:37

Merge branch 'main' into csv_dtype_string_option

5665275

Cover str too

7f70503

Merge branch 'main' into csv_dtype_string_option

886dcc9

mroeschke reviewed Nov 30, 2023


phofl added 5 commits November 30, 2023 19:55

Merge branch 'main' into csv_dtype_string_option

02a5228

Adjust

867abce

Fixup

3031d0d

Merge remote-tracking branch 'upstream/main' into csv_dtype_string_option

abcefc8

Fixup

51a367e

phofl commented Dec 2, 2023


phofl added 2 commits December 8, 2023 22:45

Update readers.py

d38b9eb

Merge branch 'main' into csv_dtype_string_option

000cd8e

mroeschke approved these changes Dec 9, 2023


mroeschke added this to the 2.2 milestone Dec 9, 2023

mroeschke merged commit fb05cc7 into pandas-dev:main Dec 9, 2023
44 checks passed

phofl deleted the csv_dtype_string_option branch December 9, 2023 19:39

jbrockmendel reviewed Dec 17, 2023


MarcoGorelli mentioned this pull request Jan 24, 2024

BUG: arguments dtype and parse_dates of read_csv does not work properly #57024

Closed

