Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade adbc_ingest parameter temporary to prefixes #2343

Open
Diadochokinetic opened this issue Dec 2, 2024 · 13 comments
Open

Upgrade adbc_ingest parameter temporary to prefixes #2343

Diadochokinetic opened this issue Dec 2, 2024 · 13 comments
Labels
Type: enhancement New feature or request

Comments

@Diadochokinetic
Copy link

Diadochokinetic commented Dec 2, 2024

What feature or improvement would you like to see?

Describe the use case

Currently adbc_ingest supports a parameter temporary to execute CREATE TEMPORARY TABLE statements. sqlalchemy supports this in a more abstract way, by accepting a list of table prefixes. I suggest to upgrade the API by implementing a prefixes parameter like sqlalchemy.Table.

Databases / Backends / Drivers targeted

Some DBMS support the use of multiple prefixes. E.g.
ibm db2: CREATE GLOBAL TEMPORARY TABLE
postgres: CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE
snowflake: CREATE [ { [ { LOCAL | GLOBAL } ] TEMP | TEMPORARY | VOLATILE | TRANSIENT } ] TABLE

Example Use

Create a global temporary table from some data.

with con.cursor() as cur:
    total_inserted = cur.adbc_ingest(
        table_name="TEST",
        data=some_data,
        mode="create",
        db_schema_name="SESSION",
        temporary=["GLOBAL", "TEMPORARY"],
    )

con.commit()

User can then use this "empty shell" to insert and further process data within their respective session.

Additional context

I'm currently working on adding a prefixes parameter to pandas to_sql() method, see pandas-dev/pandas#60409 for reference. Since to_sql() supports both sqlalchmy and adbc it would be beneficial for the ecosystem to have feature parity. The current state of progress:

  • I fully implemented prefixes for sqlalchemy drivers
  • I implemented a way to only use the TEMPORARY keyword from prefixes for adbc drivers
@lidavidm
Copy link
Member

lidavidm commented Dec 2, 2024

I added this to the spec milestone but I think we can also just add out-of-band parameter definitions as we have in the past to handle this

@lidavidm
Copy link
Member

lidavidm commented Dec 2, 2024

Is there anything else you think may be missing that we can tackle while we're at it?

@WillAyd
Copy link
Contributor

WillAyd commented Dec 3, 2024

@lidavidm what do you mean by adding an out-of-band parameter? In the case of the OP just adding a global= parameter?

@lidavidm
Copy link
Member

lidavidm commented Dec 3, 2024

Oh, I mean just adding a parameter that the drivers agree on by convention without going through the full vote (and we can roll it up into the next vote)

@WillAyd
Copy link
Contributor

WillAyd commented Dec 3, 2024

Ah I see. @Diadochokinetic open to any contribution you'd like to make here. I should have some time to explore that in more detail later this week if needed

@lidavidm
Copy link
Member

lidavidm commented Dec 3, 2024

That said, looking at the original issue: for ADBC, the idea would be that temporary=True would automatically (in a hypothetical DB2 driver) generate CREATE GLOBAL TEMPORARY TABLE, instead of requiring the user to know about this. Are there prefixes where this needs to be under the user's control?

@Diadochokinetic
Copy link
Author

db2 offers both: CREATE TEMPORARY TABLE and CREATE GLOBAL TEMPORARY TABLE. The first one only exists within a session, the second one exists forever (or until dropped), but the data is only available within a session. Depending on the use case, you need to pass either ["TEMPORARY"] or ["GLOBAL", "TEMPORARY"].

@lidavidm
Copy link
Member

lidavidm commented Dec 3, 2024

Ah, ok. I suppose having the option can be useful, but I'm hesitant to actually do so until we support databases that need it (I don't think any of our current backends do?)

@lidavidm
Copy link
Member

lidavidm commented Dec 3, 2024

Partly I'm hesitant to effectively bake in a SQLAlchemy option, though having this functionality in the first place opens us up to that discussion, so I'm not necessarily against it, I suppose I'd just like to see other examples, or debate whether it needs to be an option fully in the spec vs. an option that only a few drivers support (I suppose the latter complicates use on your side, though)

@Diadochokinetic
Copy link
Author

I checked the docs of some of the currently supported databases and there are a couple, e.g. postgres and snowflake, that support multiple prefixes. I edited my initial post and added some links to the respective docs.

@lidavidm
Copy link
Member

lidavidm commented Dec 4, 2024

Got it, thanks. Hmm, it seems for DB2 we would just treat temporary as equal to GLOBAL TEMPORARY by default, since it doesn't seem to have any other option; for Postgres and Snowflake global/local/volatile are just aliases to ease porting from other systems; unlogged/transient do actually do something.

@paleolimbot @zeroshade any thoughts?

@zeroshade
Copy link
Member

I agree with @lidavidm for the most part here, I'm hesitant to bake the SQLAlchemy part of the option in unless it is a more widely utilized feature. Until we see more systems leveraging something like this it might be better to simply have GLOBAL vs not GLOBAL be a separate, DB2 specific option. If we see in the future more systems supporting this, then it can make sense to make a more formal change at that point to formalize the PREFIXES style then rather than making this change now. That's my two cents at least.

@paleolimbot
Copy link
Member

One of the ways that I've worked around limitations of the ingest in the past is to do something like:

  • Execute CREATE WHATEVER WHATEVER TABLE XXX
  • Ingest using append mode (maybe this is still a problem if we need to resolve the name of the table in some special way)

...but I don't know what kind of limitations that incurs with respect to drivers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

5 participants