DOC: na_values defaults for read_csv don't match STR_NA_VALUES correctly #55803

johnmreynolds · 2023-11-02T16:47:39Z

Pandas version checks

I have checked that the issue still exists on the latest versions of the docs on main here

Location of the documentation

https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html

Documentation problem

In the read_csv documentation linked above, the list of default na_values is supposed to match the contents of STR_NA_VALUES.

However, in d4889bc the change to double quotes broke the first one, which is now incorrectly displayed as " ", i.e. a single space, rather than the correct "".

Curiously, read_excel still shows the first value as ‘’, so it hasn't had the change to double quotes.

Suggested fix for documentation

I think this bug was introduced when a space was added so that the closing triple double-quote would terminate the string correctly.

One fix could be to change the line:

NaN: " """

to something like

NaN: """ + '"'
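To make the difference concrete, here is a minimal sketch of the two patterns. It assumes the docstring is built by joining the sorted values with '", "' and prepending a head string; NA_VALUES is a shortened, hypothetical stand-in for STR_NA_VALUES, not the real set.

```python
# Shortened, hypothetical stand-in for pandas' STR_NA_VALUES.
NA_VALUES = {"", "#N/A", "NA", "null"}

# Assumed join pattern: the separator supplies the quotes between values.
joined = '", "'.join(sorted(NA_VALUES))

# Current pattern: a space sits between the opening quote and the
# closing triple quote, so it ends up inside the first value.
broken_head = """NaN: " """

# Suggested pattern: close the triple-quoted string first, then append
# the opening double quote as a separate literal.
fixed_head = """NaN: """ + '"'

broken = broken_head + joined
fixed = fixed_head + joined

print(repr(broken))
print(repr(fixed))
```

With the extra space, the empty string at the front of the sorted list renders as a one-space value; with the suggested head it renders as an empty pair of quotes.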


johnmreynolds · 2023-11-02T16:50:06Z

Or indeed rewrite the joined strings to include the quotes so there is less fiddling to do to get the start and end quotes right.

steve-mavens · 2023-11-02T16:51:52Z

NaN: \"""" should also work. Note there is a corresponding error at the end of the list of values: the final one should be "null" but is shown as "null ".
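A small sketch of the escaped variant, together with the matching problem at the tail. The exact shape of the tail string in the pandas source is an assumption here (a triple-quoted string that starts with a space before the closing quote); the variable names are made up for illustration.

```python
# Escaped variant: backslash-quote followed by the closing triple quote.
head = """NaN: \""""

# Assumed current tail: the leading space lands inside the last value.
broken_tail = """ "."""

# One possible fix for the tail: no space before the closing quote.
fixed_tail = '".'

last_value = "null"
print(repr(head))
print(repr(last_value + broken_tail))
print(repr(last_value + fixed_tail))
```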

johnmreynolds · 2023-11-02T17:00:12Z

Also, although it doesn't show with the default font used, the closing quote in the " " is wrong.

The html source appears as:

: “ “, “#N/A”, “#N/A N/A”, “#NA”, “-1.#IND”, “-1.#QNAN”, “-NaN”, “-nan”,
“1.#IND”, “1.#QNAN”, “<NA>”, “N/A”, “NA”, “NULL”, “NaN”, “None”,
“n/a”, “nan”, “null “.

which again doesn't show in the github font, but if you copy and paste it, you can see the fifth character is the wrong quote character, as is the last but one.

This is probably an issue with whatever is generating the smart quotes. Hopefully this will go away if the spurious spaces are fixed.

johnmreynolds · 2023-11-02T17:03:32Z

In reference to the previous comment, something like:

fill(', '.join(f'"{value}"' for value in sorted(STR_NA_VALUES)), 70, subsequent_indent=" ")

should quote the values correctly.
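As a rough check of that idea, the snippet below runs the same expression over a local copy of the values listed in the documentation. NA_VALUES is a hand-copied stand-in rather than an import from pandas, and the four-space subsequent_indent is an assumption.

```python
from textwrap import fill

# Hand-copied stand-in for STR_NA_VALUES, taken from the documented list.
NA_VALUES = {
    "", "#N/A", "#N/A N/A", "#NA", "-1.#IND", "-1.#QNAN", "-NaN", "-nan",
    "1.#IND", "1.#QNAN", "<NA>", "N/A", "NA", "NULL", "NaN", "None",
    "n/a", "nan", "null",
}

# Quote each value individually so no separate head or tail quote is needed.
quoted = ", ".join(f'"{value}"' for value in sorted(NA_VALUES))

# Wrap to 70 columns; the indent width is assumed.
wrapped = fill(quoted, 70, subsequent_indent="    ")
print(wrapped)
```

One caveat with this approach: fill may wrap at the space inside a value such as "#N/A N/A" if that value happens to fall at a line boundary.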

johnmreynolds · 2023-11-02T17:31:51Z

Also note that, while this is a fairly trivial bug, STR_NA_VALUES isn't documented as far as I can tell. So if you want to remove a value from the default na_values list, the only documented way to do it is to copy the default values from the documentation.

We've run into this, as we're implementing a workaround for the Pandas 2.0 non-backwards-compatible change of adding 'None' to the na_values default.
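For reference, a sketch of what such a workaround can look like, assuming the defaults are copied by hand from the documentation. The helper names na_values_without and read_csv_keeping are hypothetical; keep_default_na and na_values are the documented read_csv parameters.

```python
# Copied by hand from the read_csv documentation (an assumption: this list
# has to be kept in sync with the docs manually).
DEFAULT_NA_VALUES = [
    "", "#N/A", "#N/A N/A", "#NA", "-1.#IND", "-1.#QNAN", "-NaN", "-nan",
    "1.#IND", "1.#QNAN", "<NA>", "N/A", "NA", "NULL", "NaN", "None",
    "n/a", "nan", "null",
]


def na_values_without(*excluded):
    """Return the copied defaults minus the given strings."""
    return [value for value in DEFAULT_NA_VALUES if value not in excluded]


def read_csv_keeping(path, *excluded, **kwargs):
    """Read a CSV, treating the excluded strings as ordinary data."""
    import pandas as pd  # imported lazily so the helper above has no dependency

    return pd.read_csv(
        path,
        keep_default_na=False,  # turn off the built-in defaults
        na_values=na_values_without(*excluded),
        **kwargs,
    )


print(na_values_without("None"))
```

A call such as read_csv_keeping("data.csv", "None") would then leave the literal string None untouched while still treating the remaining defaults as missing.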

johnmreynolds added the Docs and Needs Triage labels on Nov 2, 2023

CYHSM mentioned this issue Jan 17, 2024

BUG: KeyError when loading csv with NaNs #56929
