DOC: na_values defaults for read_csv don't match STR_NA_VALUES correctly #55803

Open
1 task done
johnmreynolds opened this issue Nov 2, 2023 · 5 comments
Labels: Docs, Needs Triage (issue that has not been reviewed by a pandas team member)

Comments

@johnmreynolds

Pandas version checks

  • I have checked that the issue still exists on the latest versions of the docs on main here

Location of the documentation

https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html

Documentation problem

In https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html the list of na_values is supposed to match the list in STR_NA_VALUES.

However, in d4889bc the change to double quotes broke the first one, which is now incorrectly displayed as " ", i.e. a single space, rather than the correct "".

Curiously, read_excel still shows the first value as ‘’, so it hasn't had the change to double quotes.

Suggested fix for documentation

I think this bug was introduced when a space was added before the closing triple double-quote so that the string literal would terminate correctly.

One fix could be to change the line:

NaN: " """

to something like

NaN: """ + '"'
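The effect of the extra space can be reproduced outside pandas. The sketch below only approximates how the docstring is assembled: `NA_VALUES` is a stand-in copied from the values listed in the docs rather than the real `STR_NA_VALUES`, and the four-space `subsequent_indent` is an assumption.

```python
from textwrap import fill

# Stand-in for STR_NA_VALUES, copied from the values listed in the docs.
NA_VALUES = {
    "", "#N/A", "#N/A N/A", "#NA", "-1.#IND", "-1.#QNAN", "-NaN", "-nan",
    "1.#IND", "1.#QNAN", "<NA>", "N/A", "NA", "NULL", "NaN", "None",
    "n/a", "nan", "null",
}

# Joining with '", "' leaves the outermost quotes to the surrounding text.
joined = fill('", "'.join(sorted(NA_VALUES)), 70, subsequent_indent="    ")

# Current behaviour: a space sits between each outer quote and the list.
buggy = 'NaN: " ' + joined + ' ".'

# Suggested fix: attach the outer quotes with no padding.
fixed = "NaN: " + '"' + joined + '".'

print(buggy)
print(fixed)
```

Because the first value is the empty string, the padding space ends up between its two quotes, which is why only the first and last entries look wrong.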

@johnmreynolds johnmreynolds added Docs Needs Triage Issue that has not been reviewed by a pandas team member labels Nov 2, 2023
@johnmreynolds
Author

Or indeed quote each value before joining, so that the opening and closing quotes no longer need special handling.

@steve-mavens

NaN: \"""" should also work. Note there is a corresponding error at the end of the list of values: the final one should be "null" but is shown as "null ".
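A quick check of the escaped form, as a sketch only. It assumes the surrounding literal is an ordinary (non-raw) triple-quoted string; I have not confirmed which kind the pandas docstring uses, and in a raw string the backslash would stay in the output.

```python
# A backslash lets a literal quote sit directly before the closing
# triple quote, so no padding space is needed.
escaped = """NaN: \""""
print(repr(escaped))

# In a raw string the backslash still protects the quote from ending
# the literal, but it is kept in the resulting text.
raw = r"""NaN: \""""
print(repr(raw))
```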

@johnmreynolds
Author

Also, although it doesn't show with the default font used, the closing quote in the " " is wrong.

The html source appears as:

: “ “, “#N/A”, “#N/A N/A”, “#NA”, “-1.#IND”, “-1.#QNAN”, “-NaN”, “-nan”,
“1.#IND”, “1.#QNAN”, “<NA>”, “N/A”, “NA”, “NULL”, “NaN”, “None”,
“n/a”, “nan”, “null “.

which again doesn't show in the GitHub font, but if you copy and paste it, you can see that the fifth character is an opening quote where a closing quote belongs, as is the last character but one.

This is probably an issue with whatever is generating the smart quotes. Hopefully this will go away if the spurious spaces are fixed.
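One way to confirm which quote characters are present without relying on the font is to print their Unicode names. This is a minimal sketch: `head` and `tail` are the start and end of the HTML text quoted above, written with escapes, and `quote_names` is a hypothetical helper.

```python
import unicodedata

# Start and end of the HTML text quoted above, written with escapes so
# the code points are unambiguous.
head = ": \u201c \u201c, \u201c#N/A\u201d"
tail = "\u201cnull \u201c."


def quote_names(text):
    """Return (index, Unicode name) for every curly double quote in text."""
    return [
        (i, unicodedata.name(ch))
        for i, ch in enumerate(text)
        if ch in "\u201c\u201d"
    ]


print(quote_names(head))
print(quote_names(tail))
```

Index 4 in `head` (the fifth character) and the second-to-last character of `tail` both come out as LEFT DOUBLE QUOTATION MARK, whereas a correctly closed value such as “#N/A” ends with RIGHT DOUBLE QUOTATION MARK.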

@johnmreynolds
Author

In reference to the previous comment, something like:

fill(', '.join(f'"{value}"' for value in sorted(STR_NA_VALUES)), 70, subsequent_indent=" ")

should quote the values correctly.
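As a runnable sketch of that idea: `sample_na_values` is a short illustrative subset rather than the real `STR_NA_VALUES`, and the four-space indent is an assumption.

```python
from textwrap import fill

# Short illustrative subset; the real set is STR_NA_VALUES in pandas.
sample_na_values = {"", "#N/A", "NA", "None", "null"}

# Each value carries its own quotes, so the surrounding text adds none.
quoted = fill(
    ", ".join(f'"{value}"' for value in sorted(sample_na_values)),
    70,
    subsequent_indent="    ",
)
print("NaN: " + quoted + ".")
```

With this approach the empty string renders as `""` and the last value as `"null"`, with no stray spaces.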

@johnmreynolds
Author

And a note that while this is a fairly trivial bug, STR_NA_VALUES as far as I can tell isn't documented, so if you want to remove a value from the default na_values list, the only documented way to do this is to copy the default values from the documentation.

We've run into this because we're implementing a workaround for the non-backwards-compatible change in pandas 2.0 that added 'None' to the na_values default.
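For reference, the kind of workaround I mean looks roughly like the sketch below. `DOCUMENTED_DEFAULTS` is copied by hand from the read_csv documentation, and `read_csv_keeping_none` is a hypothetical helper name; `keep_default_na` and `na_values` are the existing `read_csv` parameters.

```python
# Default NA strings, copied by hand from the read_csv documentation.
DOCUMENTED_DEFAULTS = [
    "", "#N/A", "#N/A N/A", "#NA", "-1.#IND", "-1.#QNAN", "-NaN", "-nan",
    "1.#IND", "1.#QNAN", "<NA>", "N/A", "NA", "NULL", "NaN", "None",
    "n/a", "nan", "null",
]

# Drop "None" so that it is read back as an ordinary string.
custom_na_values = [value for value in DOCUMENTED_DEFAULTS if value != "None"]


def read_csv_keeping_none(filepath_or_buffer, **kwargs):
    """Hypothetical helper: read a CSV without treating "None" as missing."""
    import pandas as pd  # imported lazily so the list above works on its own

    return pd.read_csv(
        filepath_or_buffer,
        keep_default_na=False,       # ignore the built-in defaults...
        na_values=custom_na_values,  # ...and use the trimmed list instead
        **kwargs,
    )
```

The hand-copied list is exactly why the rendering of the docs matters here: a stray space in the first or last entry silently changes which strings are treated as missing.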
