PyArrow backend cannot handle MySQL set type #2089

Open
karakanb opened this issue Nov 23, 2024 · 0 comments · May be fixed by #2090

dlt version

1.4.0

Describe the problem

When ingesting MySQL tables that contain SET columns, dlt fails to convert the rows to Arrow, because PyArrow does not support Python set objects.

  File "/path/dlt/common/libs/pyarrow.py", line 685, in row_tuples_to_arrow
    return pa.Table.from_pydict(columnar_known_types, schema=arrow_schema)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/table.pxi", line 1920, in pyarrow.lib._Tabular.from_pydict
  File "pyarrow/table.pxi", line 6153, in pyarrow.lib._from_pydict
  File "pyarrow/array.pxi", line 398, in pyarrow.lib.asarray
  File "pyarrow/array.pxi", line 358, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 85, in pyarrow.lib._ndarray_to_array
  File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
pyarrow.lib.ArrowTypeError: Expected bytes, got a 'set' object

The same problem exists for lists and dicts, but the corresponding code handles those by casting them to strings. Sets appear to have been missed.
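To illustrate the kind of cast described above, here is a minimal sketch that serializes set, list, and dict cells to JSON strings before the columns are handed to Arrow. The helper name `stringify_nested` and the sample data are hypothetical; this is not dlt's actual code, only an assumption about what treating sets like lists and dicts could look like.

```python
import json
from typing import Any, Dict, List


def stringify_nested(values: List[Any]) -> List[Any]:
    """Hypothetical helper: turn set/list/dict cells into JSON strings."""
    out: List[Any] = []
    for value in values:
        if isinstance(value, (set, frozenset)):
            # Sets are unordered and not JSON serializable, so sort them
            # into a list first. Assumes the members are mutually
            # comparable, e.g. the strings of a MySQL SET column.
            out.append(json.dumps(sorted(value)))
        elif isinstance(value, (list, dict)):
            out.append(json.dumps(value))
        else:
            # Scalars and None pass through unchanged.
            out.append(value)
    return out


# Sample columnar data shaped like the table in this report (made up).
columns: Dict[str, List[Any]] = {
    "order_id": [1, 2],
    "col1": [{"0", "1"}, {"0"}],
}
cleaned = {name: stringify_nested(vals) for name, vals in columns.items()}
```

After this step every cell in `col1` is a plain string, which is the shape the existing list and dict handling already produces.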

Expected behavior

MySQL set fields are correctly ingested.

Steps to reproduce

Try ingesting a table that contains a SET column using the pyarrow backend:

create table test.some_table
(
    order_id int auto_increment primary key,
    col1     set ('0', '1') default '0' not null,
    col2     set ('1', '2') default '2' not null
)

Operating system

Linux, macOS, Windows

Runtime environment

Local

Python version

3.11

dlt data source

sql_table

dlt destination

Google BigQuery, DuckDB, Filesystem & buckets, Postgres, Amazon Redshift, Snowflake

Other deployment details

No response

Additional information

The only workaround at the moment is not to use the pyarrow backend. I am submitting a fix for this.
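For reference, a rough sketch of that workaround, assuming the `sql_table` resource from `dlt.sources.sql_database` and its `backend` argument. The connection string, pipeline name, dataset name, and the function name `load_without_pyarrow` are placeholders, and DuckDB is picked only as an example destination.

```python
def load_without_pyarrow(connection_url: str) -> None:
    """Sketch: load test.some_table with the sqlalchemy backend instead of pyarrow."""
    # Imported here so the sketch can be read without dlt installed.
    import dlt
    from dlt.sources.sql_database import sql_table

    # backend="sqlalchemy" yields plain Python rows and skips the
    # Arrow conversion that raises ArrowTypeError on set values.
    table = sql_table(
        credentials=connection_url,
        schema="test",
        table="some_table",
        backend="sqlalchemy",
    )

    pipeline = dlt.pipeline(
        pipeline_name="mysql_set_workaround",  # placeholder name
        destination="duckdb",                  # example destination
        dataset_name="mysql_data",             # placeholder name
    )
    pipeline.run(table)


# Example call (placeholder credentials):
# load_without_pyarrow("mysql+pymysql://user:password@localhost:3306/test")
```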
