ENH: Include df.attrs in to_csv output #53577

Open
1 of 3 tasks
canthonyscott opened this issue Jun 9, 2023 · 5 comments
Labels
Enhancement IO CSV read_csv, to_csv

Comments

@canthonyscott

canthonyscott commented Jun 9, 2023

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

There are many use cases (especially in the scientific community) where the best or only course of action is to embed configuration parameters and/or other metadata at the beginning of the CSV file itself. These lines are typically prefixed with a comment marker such as #. This maintains human readability while attaching the metadata to the generated file itself.

Pandas' read_csv method already implements a feature to read such files and ignore these lines when parsing the data into a DataFrame. This new feature would implement the complement: it allows users to write these metadata and/or comment lines in their CSV outputs as well.
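For reference, the existing read_csv behaviour described above can be sketched as follows. This is a minimal illustration, not part of the proposal; the sample text and the metadata keys (`instrument`, `run`) are made up for the example.

```python
import io

import pandas as pd

# Made-up CSV text with two metadata lines prefixed by the comment marker.
raw = "# instrument: demo\n# run: 1\na,b\n1,2\n3,4\n"

# comment="#" tells read_csv to skip everything after the marker,
# so lines that start with "#" are ignored entirely.
df = pd.read_csv(io.StringIO(raw), comment="#")
print(df)
```

The metadata lines are dropped during parsing, so nothing from them survives in the resulting DataFrame.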

This could be accomplished with file handles (thanks @twoertwein):

with open("test.csv", mode="wt") as handle:
    handle.write(comments)
    dataframe.to_csv(handle)

However, adding a comment parameter to to_csv would better match the read_csv method.
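A self-contained version of the file-handle workaround might look like the sketch below. It uses an in-memory buffer instead of a file on disk, and the contents of `comments` and `dataframe` are assumptions chosen only to make the example runnable.

```python
import io

import pandas as pd

# Illustrative data; the issue does not specify what these hold.
dataframe = pd.DataFrame({"a": [1, 3], "b": [2, 4]})
comments = "# instrument: demo\n# run: 1\n"

# Write the comment lines first, then let to_csv append to the same handle.
buffer = io.StringIO()
buffer.write(comments)
dataframe.to_csv(buffer, index=False)
text = buffer.getvalue()

# Reading back with comment="#" skips the metadata lines again.
round_trip = pd.read_csv(io.StringIO(text), comment="#")
```

The same pattern works with a handle from `open("test.csv", mode="wt")`; the buffer is used here only so the example leaves no file behind.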

Feature Description

A new function would be implemented to write comment lines using the csv writer:

def _save_comment_lines(self) -> None:
    if self.comment_lines:
        for line in self.comment_lines:
            self.writer.writerow([f"{self.comment}" + line])

This could then be called in the _save method

def _save(self) -> None:
    if self.comment:  # Addition here
        self._save_comment_lines()  # Addition here
    if self._need_to_save_header:
        self._save_header()
    self._save_body()

Alternative Solutions

Technically, the file handle method mentioned above would satisfy this feature request. However, the feature could be easier for users to discover if it mirrored the read_csv API.

An alternative, more complex but perhaps more flexible, solution could be to store the comment lines in the DataFrame object itself, with a flag to automatically write those comment lines when to_csv is called. This would guarantee that the comments are written even in systems where the mechanism that writes the DataFrame to disk is abstracted away from the user's code, for example when the pandas/Python code is run by a job submission or scheduling system.

Additional Context

I was a little excited and already created a PR for this feature: #53569

Apologies! I should have started here first. I am happy to close or modify it as needed.

@canthonyscott added the Enhancement and Needs Triage labels Jun 9, 2023
@topper-123 topper-123 added the IO CSV read_csv, to_csv label Jun 11, 2023
@topper-123
Contributor

Can this be done by writing the DataFrame.attrs values to the CSV file? There is an issue for that, but for JSON, in #51012.

@topper-123 added the Needs Discussion label and removed the Needs Triage label Jun 11, 2023
@canthonyscott
Author

I believe adding this metadata and storing it in DataFrame.attrs would totally work. I like that this would add the flexibility to write the data out to any other formats that are supported (json for example in the linked issue).

It sounds like using DataFrame.attrs is pretty much what I was dancing around in my alternative solution but without knowing exactly what it was called.
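As a rough sketch of the attrs-based idea, the helper below writes each `DataFrame.attrs` entry as a comment line before the CSV body. `to_csv_with_attrs` and the `# key: value` line format are hypothetical and are not part of pandas or of the linked PR; the sketch also assumes keys and values contain no newlines.

```python
import io

import pandas as pd


def to_csv_with_attrs(df, handle, comment="#"):
    """Hypothetical helper (not a pandas API): write df.attrs as comment lines."""
    for key, value in df.attrs.items():
        # Assumed format: one "<comment> key: value" line per attrs entry.
        handle.write(f"{comment} {key}: {value}\n")
    df.to_csv(handle, index=False)


df = pd.DataFrame({"a": [1, 3], "b": [2, 4]})
df.attrs = {"instrument": "demo", "run": 1}

buffer = io.StringIO()
to_csv_with_attrs(df, buffer)
text = buffer.getvalue()
```

Because the metadata lives on the DataFrame, code that only receives the object (for example inside a job scheduler) could still emit it without knowing about the comments in advance.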

@topper-123
Contributor

topper-123 commented Jun 12, 2023

Great. IMO it would make sense to have functionality that can read attrs in the readers where it makes sense.
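One way the reading side could look is sketched below. `read_csv_with_attrs` is a hypothetical helper, not an existing pandas function, and it assumes the metadata was written as leading `# key: value` lines. Values come back as strings, since the comment lines carry no type information.

```python
import io

import pandas as pd


def read_csv_with_attrs(text, comment="#"):
    """Hypothetical helper: parse leading comment lines into DataFrame.attrs."""
    attrs = {}
    for line in text.splitlines():
        if not line.startswith(comment):
            break  # metadata is assumed to appear only at the top
        key, sep, value = line[len(comment):].partition(":")
        if sep:  # ignore comment lines that are not "key: value"
            attrs[key.strip()] = value.strip()
    df = pd.read_csv(io.StringIO(text), comment=comment)
    df.attrs.update(attrs)
    return df


raw = "# instrument: demo\n# run: 1\na,b\n1,2\n3,4\n"
df = read_csv_with_attrs(raw)
print(df.attrs)
```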

EDIT: I've changed the issue title to reflect that this issue has been changed to be about writing attrs metadata to csv.

@topper-123 removed the Needs Discussion label Jun 12, 2023
@topper-123 changed the title from "ENH: Allow to_csv to write file comments at the top, mirroring the read_csv comment API" to "ENH: Include df.attrs in to_csv output" Jun 12, 2023
@hamdav

hamdav commented Nov 17, 2023

Looks like this was pretty much completed but never merged? What needs to be done to make it happen? I would love to have this feature!

@canthonyscott
Author

I would be happy to update and re-open my PR if there is interest in having it merged.

3 participants