
Unable to configure a connection with duplicate streams from different namespaces #105

Open
Grintas opened this issue May 20, 2024 · 6 comments


@Grintas

Grintas commented May 20, 2024

I'm configuring a connection with Terraform from Postgres to S3. The database has multiple identical schemas (namespaces), so the table names (stream_names) are not unique:

- postgres_db
    - schema_1
        - table_1
        - table_2
    - schema_2
        - table_1
        - table_2

In the destination config I have s3_path_format set as ${STREAM_NAME}/${NAMESPACE}/${YEAR}_${MONTH}_${DAY}_, so the data in S3 would be partitioned by the source schema.
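To make the intent concrete, here is a minimal sketch of how that path format should expand, assuming Airbyte's `${VAR}` placeholders behave like standard template variables (`expand_path` is my own illustrative helper, not Airbyte's actual code):

```python
# Sketch of ${VAR} substitution for an s3_path_format string.
# This is an approximation of the behavior, not Airbyte's implementation.
from string import Template

def expand_path(fmt: str, stream: str, namespace: str,
                year: int, month: int, day: int) -> str:
    # Airbyte-style ${VAR} placeholders map directly onto string.Template.
    return Template(fmt).substitute(
        STREAM_NAME=stream,
        NAMESPACE=namespace,
        YEAR=f"{year:04d}",
        MONTH=f"{month:02d}",
        DAY=f"{day:02d}",
    )

fmt = "${STREAM_NAME}/${NAMESPACE}/${YEAR}_${MONTH}_${DAY}_"
print(expand_path(fmt, "table_1", "schema_1", 2024, 5, 20))
# -> table_1/schema_1/2024_05_20_
```

So the two `table_1` streams would land under `table_1/schema_1/...` and `table_1/schema_2/...`, which only works if both namespaced streams can be configured in the first place.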
However, I cannot configure the connection resource properly since the configurations.streams block does not have a schema/namespace attribute. If I include just one entry for table_1 like this:

configurations = {
  streams = [
    {
      name         = "table_1"
      cursor_field = ["_ab_cdc_lsn"]
      sync_mode    = "incremental_append"
    }
  ]
}

then, once deployed, the connection has just one stream, from one of the source schemas.
When I enabled two streams for table_1 in the UI and ran the apply again, the plan showed me this:

Terraform will perform the following actions:

  # airbyte_connection.rds_to_s3 will be updated in-place
  ~ resource "airbyte_connection" "rds_to_s3" {
      ~ configurations                       = {
          ~ streams = [
              - {
                  - cursor_field = [
                      - "_ab_cdc_lsn",
                    ] -> null
                  - name         = "table_1" -> null
                  - primary_key  = [
                      - [
                          - "id",
                        ],
                    ] -> null
                  - sync_mode    = "incremental_append" -> null
                },
                # (1 unchanged element hidden)
            ]
        }
        # (9 unchanged attributes hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

So I added a duplicate stream entry for table_1; the plan then passed, but the apply failed with an error:

Status 400 {"detail":"The body of the request contains an invalid connection configuration. Duplicate stream found in configuration for: table_1.","type":"https://reference.airbyte.com/reference/errors","title":"bad-request","status":400}

How can I address this? Did I miss any configuration in the docs or is this not supported?

@Grintas Grintas changed the title The body of the request contains an invalid connection configuration. Duplicate stream found in configuration for <stream_name> Unable to configure a connection with duplicate streams from different namespaces May 20, 2024
@lugonthier

I had the exact same problem with Snowflake to BigQuery where the snowflake schema is the namespace.

In my case, the workaround I found was to create one source per namespace by specifying the schema param in airbyte_source_snowflake, and then create one connection per source.

It makes the provider mostly unusable at scale.
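For the record, that workaround can be stamped out with for_each rather than hand-written per schema. A sketch in HCL, where the exact resource and attribute names (e.g. `configuration.schema`, `airbyte_destination_bigquery.main`) are my assumptions about the provider's schema, not verified against it:

```hcl
locals {
  schemas = ["schema_1", "schema_2"] # one entry per namespace to replicate
}

# One source per schema -- the schema param scopes discovery to a single namespace.
resource "airbyte_source_snowflake" "per_schema" {
  for_each = toset(local.schemas)
  name     = "snowflake-${each.key}"
  configuration = {
    schema = each.key
    # ... credentials etc. ...
  }
}

# One connection per source, so stream names are unique within each connection.
resource "airbyte_connection" "per_schema" {
  for_each       = airbyte_source_snowflake.per_schema
  source_id      = each.value.source_id
  destination_id = airbyte_destination_bigquery.main.destination_id
  configurations = {
    streams = [
      { name = "table_1", sync_mode = "incremental_append" },
      { name = "table_2", sync_mode = "incremental_append" },
    ]
  }
}
```

This keeps the config DRY, but it does nothing about the per-connection overhead described below.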

@Grintas

Grintas commented Jun 12, 2024

I had the exact same problem with Snowflake to BigQuery where the snowflake schema is the namespace.

In my case, the workaround I found was to create one source per namespace by specifying the schema param in airbyte_source_snowflake, and then create one connection per source.

It makes the provider mostly unusable at scale.

Yep, we came to the same conclusion. But some of our source databases have 100+ schemas, and if I'm not mistaken, one connection per schema would require 100+ replication slots, which would essentially kill the source cluster.

@Grintas Grintas closed this as completed Jun 12, 2024
@Grintas Grintas reopened this Jun 12, 2024
@jasonmaddernstudylink

I’ve hit the exact same problem as I have circa 50 identical schemas in Postgres to sync :(

@jmaddern-fw

I would like to add that this was doable through Octavia (I had many identical Postgres streams all being synced through the one pipeline). Yes, the YAML file had some repetition, but it was workable.

Are there any plans to introduce this functionality?

@gingeard

gingeard commented Oct 19, 2024

It seems this is a limitation of the public API. While the deprecated Configuration API (server-api) includes a "namespace" field in the "stream" object:

https://airbyte-public-api-docs.s3.us-east-2.amazonaws.com/rapidoc-api-docs.html#post-/v1/connections/create

{
  ...
  "syncCatalog": {
    "streams": [
      {
        "stream": {
          "name": "...",
          "jsonSchema": {...},
          ...,
          "namespace": "string"
        },
        "config": {
          ...
        }
      }
    ]
  },
  ...
}

and you can send it to the backend. For example, in the browser I can see that the POST payload to

http://AIRBYTE_WEBAPP/api/v1/web_backend/connections/create

looks like this:

[screenshot: the POST payload, with "namespace" set on each stream]

At the same time, the public API has no equivalent parameter:
https://reference.airbyte.com/reference/createconnection

The "configurations[].streams[]" object has only the "name", "syncMode", "cursorField", "primaryKey" and "selectedFields" parameters. I have no idea why the public API is cut down compared with the server API.

This should be raised as an issue against the Airbyte Platform and its public API functionality.
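In other words, automation that needs namespaced streams currently has to target the Configuration API's catalog shape itself. A minimal Python sketch of building such a syncCatalog, where `build_sync_catalog` and the "config" field subset are my own illustration; only the name/namespace nesting mirrors the API shape quoted above:

```python
# Sketch: build a Configuration-API-style syncCatalog in which the same
# stream name may appear under different namespaces. The helper and the
# chosen "config" fields are illustrative, not the full API schema.

def build_sync_catalog(streams):
    """streams: iterable of (namespace, name) tuples."""
    return {
        "streams": [
            {
                "stream": {"name": name, "namespace": namespace},
                "config": {"selected": True, "syncMode": "incremental"},
            }
            for namespace, name in streams
        ]
    }

catalog = build_sync_catalog([("schema_1", "table_1"), ("schema_2", "table_1")])

# Two entries share the stream name but differ by namespace -- exactly the
# combination the public API's configurations.streams cannot express today.
pairs = [(s["stream"]["namespace"], s["stream"]["name"]) for s in catalog["streams"]]
print(pairs)  # [('schema_1', 'table_1'), ('schema_2', 'table_1')]
```

A payload like this would then be POSTed to the deprecated connections/create endpoint, which is workable but means bypassing both Terraform and the supported public API.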

@jmaddern-fw

Thanks so much for your help, and clear response, @gingeard

Ticket raised with Airbyte: airbytehq/airbyte#47140


5 participants