-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pandas simplification v2 #826
base: main
Are you sure you want to change the base?
Conversation
From previous PR: This PR looks to improve the handling for converting a topology to a dataframe. This currently lives as a method for topology. It is now being moved to a convert_dataframe.py module. A few different formats are available which give some nice default ways to view a topology. Notably, we have the formats: There is also an added function that allows you to generate dataframes that cover the parameters for a set of topologies. Finally, there will be some function that prints the dataframes with the rdkit mols which are labeled to match the dataframes. TODO Checklist:
|
From discussion with @Vtsoch, there are a few more general use cases that would be nice to have working in the arguments for the main function, to_dataframeDict. dfDict = to_dataframeDict(ptop, parameters="sites", columns=["name", "atom_type.name", "atom_type.parameters", "charge", "molecule.name"], format="remove_duplicates") should be put as an example, since it could be hard to find the molecule info or parameters info if you don't know how the parsing works of these attributes. I would even consider these attributes to be in the dfDict = to_dataframeDict(ptop, parameters=["sites", "bonds"], columns=["name"], format="specific_columns") This will fail currently. However, I think it should just go through the columns and only grab attributes that exist, skipping the others. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #826 +/- ##
==========================================
- Coverage 94.07% 93.31% -0.77%
==========================================
Files 65 66 +1
Lines 6953 7088 +135
==========================================
+ Hits 6541 6614 +73
- Misses 412 474 +62 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a couple comments.
format: str = "default", | ||
columns: list[str] = None, | ||
handle_unyts: str = "to_headers", | ||
) -> pd.DataFrame: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be dict
correct?
``` | ||
|
||
|
||
>>> gmso.external.convert_dataframe.to_dataframeDict(ptop, parameters='sites', columns=['charge'], handle_unyts="to_headers") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Zeerakkhan47 I think you could expand the information passed into the columns
parameter here to include some of the things @CalCraven mention in his comment on June 16th.
This is a replacement PR for the features requested in #814, which has some commits in the merge history that deviated when switching to ruff linting.