ENH: allow lists/sets in fillna #21329
Comments
@h-vetinari first class list/set support would require an extension type from the community. as written these are non-idiomatic and non-performant, as well as a headache for indexing. you are much better off NOT using things like this. what you are asking for is for pandas to infer even more than it already does. this is not likely to happen. it's already way too magical. |
@jreback How is |
FWIW, when I refactored that code to share it w/ categorical, I couldn't really figure out why we were limiting things to dicts. My best guess is it's because a potential ambiguity between whether the sequence should be "elementwise" (use |
Thanks for the input. I think this ambiguity is not so serious. First off Furthermore, since lists throw errors currently, allowing this would not break any existing code. I think it would be very useful... |
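To make the two readings of a list argument concrete, here is a minimal sketch. The data and variable names are hypothetical and not taken from the thread. With a list as long as the Series, an elementwise reading fills each NaN from the matching position, while a scalar-like reading places the whole list into every missing cell:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan])
candidate = [10.0, 20.0, 30.0]  # hypothetical fill argument

# Reading 1 (elementwise): the NaN at position i is filled with candidate[i].
# fillna already supports this when the values are wrapped in an aligned Series.
elementwise = s.fillna(pd.Series(candidate, index=s.index))
print(elementwise.tolist())  # [1.0, 20.0, 30.0]

# Reading 2 (list as a single value): every NaN receives the whole list.
# Done by hand here, since this is the behaviour the issue asks for.
as_value = pd.Series(
    [list(candidate) if pd.isna(v) else v for v in s],
    dtype=object,
)
print(as_value.tolist())  # [1.0, [10.0, 20.0, 30.0], [10.0, 20.0, 30.0]]
```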
Another issue for us is that it is unclear how to implement this efficiently with a custom function. So supporting filling with collections (lists, sets, dicts) would also resolve a performance issue here. Otherwise we are using a function like this, which is likely to be slower than a potential standard implementation:
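The commenter's own function is not preserved in this copy of the thread. As a rough illustration of what such a workaround tends to look like, here is a sketch with a hypothetical helper name, `fill_with_collection`. It loops in Python, which is the kind of overhead the comment is concerned about:

```python
import copy

import numpy as np
import pandas as pd


def fill_with_collection(series, value):
    """Return a copy of `series` with missing entries replaced by `value`.

    Hypothetical helper, not part of pandas. Each missing cell gets its own
    shallow copy of `value`, so later in-place edits do not leak between rows.
    """
    filled = [
        copy.copy(value) if is_missing else item
        for item, is_missing in zip(series, series.isna())
    ]
    # dtype=object keeps lists, sets and dicts as single cell values.
    return pd.Series(filled, index=series.index, dtype=object, name=series.name)


s = pd.Series([["x"], np.nan, None], dtype=object)
out = fill_with_collection(s, [])
print(out.tolist())  # [['x'], [], []]
```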
We have so many cases when we needed to do |
Slightly different from OP's use-case, but I wanted to |
I often have to deal with data where a table column contains a string with comma-separated values or NAs. So having a fix for this issue would be highly appreciated! I painfully realized that
throws an error and
returns a list with an empty string but not an empty list. :( So I ended up with
which is not pretty but does the job. I don't know if this has decent performance - and I don't really care, because all I need is to declare several data transformations in a chained fashion where each transformation can be easily expressed on a single line (without defining several other helper functions or whatever). |
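The snippets from this comment are missing here, so the following is only a guess at the situation described, using made-up data. It shows why filling with an empty string before splitting gives `['']` rather than `[]`, and one single-line way to get real empty lists:

```python
import numpy as np
import pandas as pd

# Hypothetical column of comma-separated values with a missing entry.
tags = pd.Series(["a,b", np.nan, "c"], dtype=object)

# Filling with "" first: splitting an empty string yields [''], not [].
via_empty_string = tags.fillna("").str.split(",")
print(via_empty_string.tolist())  # [['a', 'b'], [''], ['c']]

# Splitting first leaves NaN in place; anything that is not a list is then
# swapped for a fresh empty list.
via_apply = tags.str.split(",").apply(lambda v: v if isinstance(v, list) else [])
print(via_apply.tolist())  # [['a', 'b'], [], ['c']]
```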
Here is my ugly but performant hack:

col_name = 'follower'
df.loc[df[col_name].isna(), col_name] = np.empty((df[col_name].isna().sum(), 0)).tolist()

Or as a function:

def fill_nan_with_empty_list(df, col):
    df.loc[df[col].isna(), col] = np.empty((df[col].isna().sum(), 0)).tolist()

fill_nan_with_empty_list(users, 'follower')
fill_nan_with_empty_list(users, 'following')

though it doesn't always work... |
The docs for the value parameter of fillna say (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.fillna.html#pandas.DataFrame.fillna)
Frankly, I do not understand this limitation, especially because I see no way to interpret it ambiguously. There are several use cases (that I keep encountering) for filling in lists - especially empty lists.
One of the main ones is using .str.split(..., expand=False) or str.extract and wanting to keep processing the lists - e.g. turn them into sets:
It's tedious to always do this by hand (esp. for DataFrame), like
This is also somewhat related to #19266.
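The original examples did not survive in this copy of the issue. As a hedged sketch of the "by hand" DataFrame case described above, with hypothetical column names and data, each column is split and the NaN left behind by the split has to be handled explicitly before converting to sets:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: space-separated tokens with missing entries.
df = pd.DataFrame(
    {"tags": ["a b a", np.nan], "labels": [np.nan, "x y"]},
    dtype=object,
)


def to_set(value):
    # .str.split leaves NaN where the input was missing, so anything that
    # is not a list is treated as "no items".
    return set(value) if isinstance(value, list) else set()


# The manual step has to be repeated for every column.
for col in ["tags", "labels"]:
    df[col] = df[col].str.split(" ", expand=False).apply(to_set)

print(df["tags"].tolist())    # [{'a', 'b'}, set()]
print(df["labels"].tolist())  # [set(), {'x', 'y'}]
```

If fillna accepted an empty list, the `isinstance` check in `to_set` would presumably become unnecessary, which is the convenience the issue is asking for.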