Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use default delimiter to flatten columns #330

Merged
merged 3 commits into from
Aug 21, 2024
Merged

use default delimiter to flatten columns #330

merged 3 commits into from
Aug 21, 2024

Conversation

shcheklein
Copy link
Member

@shcheklein shcheklein commented Aug 20, 2024

Fixes #328

to_pandas(flatten=True) now returns columns in the same way as to_records and in the same way we store them in the DB (to fix the bug and for consistency)

changing this use .. It is simpler for end users and we don't want them to see DB details (at least it's not how our API operates atm).

@shcheklein shcheklein self-assigned this Aug 20, 2024
@shcheklein shcheklein added the bug Something isn't working label Aug 20, 2024
Copy link

cloudflare-workers-and-pages bot commented Aug 20, 2024

Deploying datachain-documentation with  Cloudflare Pages  Cloudflare Pages

Latest commit: 7164839
Status: ✅  Deploy successful!
Preview URL: https://71fe28a1.datachain-documentation.pages.dev
Branch Preview URL: https://fix-328.datachain-documentation.pages.dev

View logs

@shcheklein
Copy link
Member Author

Probably not the right way to do this, tbh. I'll change it back (and replace __ with .). @iterative/datachain what was the reason to use __ instead of a . on the DB level? We now have two ways of interacting with names (even within public API - e.g. to_records).

Copy link

codecov bot commented Aug 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 86.88%. Comparing base (06fdd8c) to head (7164839).
Report is 5 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #330      +/-   ##
==========================================
- Coverage   86.94%   86.88%   -0.07%     
==========================================
  Files          90       90              
  Lines        9898     9896       -2     
  Branches     1995     1994       -1     
==========================================
- Hits         8606     8598       -8     
- Misses        944      949       +5     
- Partials      348      349       +1     
Flag Coverage Δ
datachain 86.81% <100.00%> (-0.07%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

columns = []
if headers:
columns = [".".join(filter(None, header)) for header in headers]
return pd.DataFrame.from_records(self.to_records(), columns=columns)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the problem was this line (passing data as a dict and then assigning different column names)? Is the rest refactoring? Just want to make sure I understand the bug and the fix.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes!

@shcheklein shcheklein merged commit 35bef8a into main Aug 21, 2024
38 checks passed
@shcheklein shcheklein deleted the fix-328 branch August 21, 2024 15:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Method .to_pandas(flatten=True) returns DataFrame of NaN values
3 participants