New Source Connector: DuckDB 🦆 #31

Open
aaronsteers opened this issue Jun 14, 2024 · 8 comments

@aaronsteers

aaronsteers commented Jun 14, 2024

Overview

We do not yet have a DuckDB source connector. Normally, DuckDB databases are local files and not very useful as sources, but they can now also be remote (e.g. MotherDuck), and they can act as a pass-through for other data sources (e.g. #30 and the Hugging Face Datasets).

Technical spec

You would write a new source connector which can connect to a (remote) DuckDB dataset or database, and emit records from DuckDB, allowing Airbyte users to send these to any Airbyte destination.
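
As a rough illustration of "emit records from DuckDB", here is a minimal sketch that turns the rows of one table into Airbyte-style `RECORD` messages. It is not a CDK implementation: `read_records` and `quote_ident` are hypothetical helper names, and the connection is assumed to behave like the object returned by `duckdb.connect(...)`, where `execute()` returns something exposing `.description` and `.fetchmany()`.

```python
import time
from typing import Any, Dict, Iterator


def quote_ident(name: str) -> str:
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def read_records(con: Any, table: str, batch_size: int = 1000) -> Iterator[Dict[str, Any]]:
    """Yield one Airbyte-style RECORD message per row of `table`.

    Assumption: `con.execute(sql)` returns an object with `.description`
    and `.fetchmany(n)`, as a DuckDB Python connection does.
    """
    cursor = con.execute(f"SELECT * FROM {quote_ident(table)}")
    columns = [col[0] for col in cursor.description]
    while True:
        # Fetch in batches so large tables are not loaded into memory at once.
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        for row in rows:
            yield {
                "type": "RECORD",
                "record": {
                    "stream": table,
                    "data": dict(zip(columns, row)),
                    "emitted_at": int(time.time() * 1000),  # epoch milliseconds
                },
            }
```

With the `duckdb` package installed, `con` could come from `duckdb.connect("local.duckdb")` or, for MotherDuck, `duckdb.connect("md:my_db")`, where both names are placeholders.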

Notes:

  • We do have a DuckDB Destination and a PyAirbyte Cache and SQLProcessor.
  • It is not obvious how (or if) incremental processing should be handled for DuckDB sources. Whoever picks up this task should plan to propose a path forward for this during development.
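
One possible starting point for the incremental question, offered only as a sketch and not as a decided approach: treat a user-chosen column as a cursor, read only rows past the last saved value, and then advance the saved state. The function names, the state shape (`{cursor_field: value}`), and the assumption that the table has a monotonically increasing column are all hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple


def _quote(name: str) -> str:
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_incremental_query(
    table: str, cursor_field: str, state: Optional[Dict[str, Any]]
) -> Tuple[str, List[Any]]:
    """Return (sql, params) selecting only rows newer than the saved cursor value."""
    sql = f"SELECT * FROM {_quote(table)}"
    params: List[Any] = []
    last_seen = (state or {}).get(cursor_field)
    if last_seen is not None:
        # `?` is a positional placeholder, bound by the driver at execution time.
        sql += f" WHERE {_quote(cursor_field)} > ?"
        params.append(last_seen)
    sql += f" ORDER BY {_quote(cursor_field)}"
    return sql, params


def advance_state(
    state: Optional[Dict[str, Any]], cursor_field: str, rows: List[Dict[str, Any]]
) -> Dict[str, Any]:
    """Return a new state holding the largest cursor value seen so far."""
    new_state = dict(state or {})
    values = [r[cursor_field] for r in rows if r.get(cursor_field) is not None]
    if values:
        candidate = max(values)
        current = new_state.get(cursor_field)
        new_state[cursor_field] = candidate if current is None else max(current, candidate)
    return new_state
```

This only covers appended or updated rows that carry a usable cursor column. Tables without one, and deleted rows, would still need full refresh or some other strategy.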

Definition of Done

  • You would build a new "DuckDB" source in Python (reusing code if helpful).
  • If primary keys exist, they should be registered in the catalog.
  • You should use the CDK as much as possible.
  • The connector should pass integration tests and acceptance tests.
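
To make the primary-key requirement concrete, the sketch below builds catalog stream entries from column and constraint metadata and registers primary keys under `source_defined_primary_key`. It uses plain dictionaries instead of CDK classes. `build_streams`, the type mapping, and the two query strings are illustrative assumptions: the queries rely on DuckDB's `information_schema.columns` view and `duckdb_constraints()` function and should be checked against the DuckDB version in use.

```python
from typing import Any, Dict, Iterable, List, Sequence, Tuple

# Illustrative, incomplete mapping from DuckDB type names to JSON Schema types.
DUCKDB_TO_JSON_TYPE = {
    "BOOLEAN": "boolean",
    "INTEGER": "integer",
    "BIGINT": "integer",
    "FLOAT": "number",
    "DOUBLE": "number",
    "VARCHAR": "string",
}

# Queries a connector might run to gather the inputs for build_streams()
# (assumption: default 'main' schema; verify against your DuckDB version).
COLUMNS_SQL = (
    "SELECT table_name, column_name, data_type "
    "FROM information_schema.columns WHERE table_schema = 'main' "
    "ORDER BY table_name, ordinal_position"
)
PRIMARY_KEYS_SQL = (
    "SELECT table_name, constraint_column_names "
    "FROM duckdb_constraints() WHERE constraint_type = 'PRIMARY KEY'"
)


def build_streams(
    columns: Iterable[Tuple[str, str, str]],
    primary_keys: Iterable[Tuple[str, Sequence[str]]],
) -> List[Dict[str, Any]]:
    """Build one catalog stream entry per table, with primary keys where known."""
    streams: Dict[str, Dict[str, Any]] = {}
    for table, column, data_type in columns:
        stream = streams.setdefault(table, {
            "name": table,
            "json_schema": {"type": "object", "properties": {}},
            "supported_sync_modes": ["full_refresh"],
        })
        # Unknown types fall back to string in this sketch.
        json_type = DUCKDB_TO_JSON_TYPE.get(data_type.upper(), "string")
        stream["json_schema"]["properties"][column] = {"type": ["null", json_type]}
    for table, key_columns in primary_keys:
        if table in streams and key_columns:
            # Each key column is a path (list of strings) into the record.
            streams[table]["source_defined_primary_key"] = [[c] for c in key_columns]
    return list(streams.values())
```

Because one entry is produced per discovered table, the set of streams is determined at discovery time and not hard-coded.
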
@aaronsteers
Author

Assigning myself in order to reserve/hold for @ombhardwajj, who has the related #30.

@ombhardwajj

Hey @aaronsteers, you can assign it to me now!

@marcosmarxm
Member

@ombhardwajj what is the status of this issue?

@ombhardwajj

@marcosmarxm It's already been a week that I've been working on the Hugging Face Datasets connector. Given the time constraints of this hackathon, I don't think I'll be able to build this DuckDB connector, so I am un-assigning myself.

@ombhardwajj ombhardwajj removed their assignment Jun 26, 2024
@bala-ceg

Issue #30 is related to issue #31. Can you please assign this to me as well?

@bala-ceg

bala-ceg commented Jul 1, 2024

@marcosmarxm @aaronsteers can you please let me know which connector development method I should follow: the Python CDK or the low-code CDK?

@marcosmarxm
Member

@bala-ceg You'll probably need to use the Python CDK, as the streams are going to be dynamically created.

@bala-ceg

bala-ceg commented Jul 1, 2024

@marcosmarxm is there an existing DB-based connector written with the Python CDK? I would like to use it as a reference.
