ENH: Add support to read and write Amazon ION files #55725

Open
1 task done
anna-geller opened this issue Oct 27, 2023 · 1 comment
Labels
Enhancement, IO Data

Comments


anna-geller commented Oct 27, 2023

Feature Type

  • Adding new functionality to pandas

Problem Description

We rely heavily on the Amazon ION file format. Currently, reading ION files into pandas DataFrames requires workarounds.

Feature Description

It would be great to add support for ION in pandas using read_ion and write_ion methods.

Alternative Solutions

Here is a reproducer of a workaround we use for now:

import amazon.ion.simpleion as ion
from amazon.ion.simple_types import IonPyNull
import pandas as pd
import requests


def convert_ion_nulls(value):
    # Map Ion's null wrapper to None so pandas treats it as a missing value
    return None if isinstance(value, IonPyNull) else value


url = "https://huggingface.co/datasets/kestra/datasets/resolve/main/ion/employees.ion"
response = requests.get(url)
response.raise_for_status()

# single_value=False parses the payload as a stream of top-level Ion values
ion_data = ion.loads(response.content, single_value=False)

# Copy each record into a plain dict, replacing top-level Ion nulls with None
list_of_dicts = [
    {k: convert_ion_nulls(v) for k, v in dict(record).items()} for record in ion_data
]
df = pd.DataFrame(list_of_dicts)

For writing files:

import amazon.ion.simpleion as ion

# One dict per DataFrame row
list_of_values = df.to_dict("records")


def save_as_ion(dict_or_list, file_name):
    # ion.dump writes binary Ion by default, hence the "wb" mode
    with open(file_name, "wb") as f:
        ion.dump(dict_or_list, f)


save_as_ion(list_of_values, "mydata.ion")
@anna-geller added the Enhancement and Needs Triage labels Oct 27, 2023
@mroeschke
Member

The code snippet seems small enough that it doesn't need to be maintained directly in pandas, so I would be -1 on this proposal. If you or someone else developed a third-party library to wrap that code snippet, we'd happily include it in our ecosystem docs.
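
For anyone picking up the third-party route, here is a rough sketch of what such a wrapper could look like. It only repackages the workaround from the issue description. The names read_ion, write_ion and plain_records are hypothetical (none of them exist in pandas or in the amazon.ion package), and the use of sequence_as_stream=True is an assumption about what makes the written file readable again with single_value=False; it has not been checked against every amazon.ion release.

```python
from typing import Any, Iterable, Mapping


def plain_records(values: Iterable[Mapping[str, Any]], null_types: tuple) -> list:
    """Copy each record into a plain dict, mapping null wrappers to None.

    Only top-level fields are checked, as in the workaround from the issue.
    """
    return [
        {k: (None if isinstance(v, null_types) else v) for k, v in dict(rec).items()}
        for rec in values
    ]


def read_ion(path):
    """Hypothetical helper: load a stream of Ion structs into a DataFrame."""
    # Third-party imports are kept local so the pure-Python helper above
    # stays importable without amazon.ion or pandas installed.
    import amazon.ion.simpleion as ion
    from amazon.ion.simple_types import IonPyNull
    import pandas as pd

    with open(path, "rb") as f:
        values = ion.load(f, single_value=False)
    return pd.DataFrame(plain_records(values, (IonPyNull,)))


def write_ion(df, path):
    """Hypothetical helper: write one Ion struct per DataFrame row."""
    import amazon.ion.simpleion as ion

    with open(path, "wb") as f:
        # Assumption: sequence_as_stream=True emits each row as its own
        # top-level value instead of wrapping all rows in a single Ion list,
        # which is the shape read_ion expects.
        ion.dump(df.to_dict("records"), f, sequence_as_stream=True)
```

With both libraries installed, write_ion(df, "mydata.ion") followed by read_ion("mydata.ion") would be the intended round trip. Nested nulls, Ion annotations and dtype preservation are not handled in this sketch.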

@jbrockmendel added the IO Data label Nov 1, 2023
@lithomas1 removed the Needs Triage label Jan 11, 2024