Source Intercom: Fix incremental sync state issues #49936

Open
wants to merge 29 commits into
base: master
from

Conversation

btkcodedev
Collaborator

What

Source Intercom: Migrate to manifest only format with components

@btkcodedev btkcodedev self-assigned this Dec 19, 2024

vercel bot commented Dec 19, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

| Name | Status | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| airbyte-docs | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | Dec 19, 2024 7:22pm |

@octavia-squidington-iv octavia-squidington-iv requested a review from a team December 19, 2024 06:01
@btkcodedev btkcodedev changed the title Source Intercom: Migrate to manifest only format with components Source Intercom: Fix incremental sync state issues Dec 19, 2024
@btkcodedev
Collaborator Author

One update: I tried using the older custom substream slicer, but the connection check fails with it, which is a bummer.

@btkcodedev
Collaborator Author

I'll try different combinations and base versions

@btkcodedev
Collaborator Author

Update: I tweaked the parent's partition_id, and it more or less worked when I tested locally.
image

@btkcodedev
Collaborator Author

btkcodedev commented Dec 19, 2024

@ChristoGrab Can you say more about those state messages?
A Docker read doesn't emit any state messages, right? Or am I missing something?
docker run --rm -v $(pwd)/secrets:/secrets -v $(pwd)/integration_tests:/integration_tests airbyte/source-intercom:dev read --config /secrets/config.json --catalog /integration_tests/incremental_catalog.json

@ChristoGrab
Contributor

ChristoGrab commented Dec 19, 2024

@btkcodedev State messages do get logged when running with Docker; they show up at the end of each stream's record output in your terminal. An easy way to find them is to search for "type":"STATE" in the connector's output. For conversation_parts, we get:

{
  "type":"STATE",
  "state":{
    "type":"STREAM",
    "stream":{
      "stream_descriptor":{
        "name":"conversation_parts"
      },
      "stream_state":{
        "states":[
          {"partition":{"id":"1","parent_slice":{}},"cursor":{"updated_at":7626086649}},
          {"partition":{"id":"60","parent_slice":{}},"cursor":{"updated_at":7626086649}},
          {"partition":{"id":"61","parent_slice":{}},"cursor":{"updated_at":7626086649}},
          {"partition":{"id":"63","parent_slice":{}},"cursor":{"updated_at":7626086649}},
          {"partition":{"id":"64","parent_slice":{}},"cursor":{"updated_at":7626086649}},
          {"partition":{"id":"65","parent_slice":{}},"cursor":{"updated_at":7626086649}},
          {"partition":{"id":"66","parent_slice":{}},"cursor":{"updated_at":7626086649}},
          {"partition":{"id":"67","parent_slice":{}},"cursor":{"updated_at":7626086649}},
          {"partition":{"id":"68","parent_slice":{}},"cursor":{"updated_at":7626086649}},
          {"partition":{"id":"69","parent_slice":{}},"cursor":{"updated_at":7626086649}},
          {"partition":{"id":"59","parent_slice":{}},"cursor":{"updated_at":7626086649}}
        ]
      }
    }
  }
}

What's interesting here is that conversation_parts does use per-partition states (as in, a separate cursor value is tracked for every partition/parent slice). Haven't dug into the code, but I presume this was added here: #46658.
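For anyone who wants to script that search instead of scanning the terminal, here is a minimal sketch. It assumes the connector output is available as an iterable of lines with one JSON message per line, mixed with plain log lines. `extract_state_messages` and `partition_cursors` are hypothetical helper names for illustration, not part of the Airbyte CDK, and the state shape is taken only from the message shown above.

```python
import json
from typing import Iterable, Iterator


def extract_state_messages(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed messages whose top-level "type" is "STATE".

    Hypothetical helper: assumes one JSON message per line and
    silently skips anything that is not a JSON object.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # plain log output, not a JSON message
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated or malformed line
        if isinstance(message, dict) and message.get("type") == "STATE":
            yield message


def partition_cursors(message: dict) -> dict:
    """Map partition id -> cursor for a per-partition STATE message.

    Assumes the shape shown above:
    state.stream.stream_state.states[*].partition.id / .cursor
    """
    stream_state = message["state"]["stream"]["stream_state"]
    return {
        entry["partition"]["id"]: entry["cursor"]
        for entry in stream_state.get("states", [])
    }
```

Saving the `docker run ... read` output to a file and passing the open file object as `lines` should work, since a file iterates line by line.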

However, the existing state object on our sandbox connector looks closer to the format used in our abnormal_state test, where we track the parent state and a single value for the whole of conversation_parts:

  {
    "type": "STREAM",
    "stream": {
      "stream_descriptor": {
        "name": "conversation_parts",
        "namespace": null
      },
      "stream_state": {
        "updated_at": 1689067774,
        "prior_state": {
          "updated_at": 1689067774,
          "prior_state": {
            "updated_at": 1689067774,
            "conversations": {
              "updated_at": 1689068230
            }
          },
          "conversations": {
            "updated_at": 1689068230
          }
        },
        "conversations": {
          "updated_at": 1689068230
        }
      }
    }
  }

This is likely due to the custom component altering the way state is tracked. What confuses me is how per-partition state interacts with the custom component, and why the live connection I tested with progressive rollout regressed even though the CAT tests passed. I'm running another round of tests on your latest changes and will let you know as soon as I have more insights.
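To make the difference between the two state shapes above easier to check mechanically, here is a rough sketch that classifies a `stream_state` object. The function names and the returned labels are hypothetical, and the only assumptions about the formats are what is visible in the two examples: a `states` list for per-partition state, and a top-level `updated_at` with optionally nested `prior_state` objects for the older shape.

```python
def classify_stream_state(stream_state: dict) -> str:
    """Return a label for the shape of a conversation_parts stream_state.

    Labels are illustrative only:
      "per_partition" -> has a "states" list (one cursor per partition)
      "single_cursor" -> has a top-level "updated_at" cursor value
      "unknown"       -> neither shape is recognised
    """
    if isinstance(stream_state.get("states"), list):
        return "per_partition"
    if "updated_at" in stream_state:
        return "single_cursor"
    return "unknown"


def prior_state_depth(stream_state: dict) -> int:
    """Count how many levels of "prior_state" are nested in the state."""
    depth = 0
    current = stream_state
    while isinstance(current.get("prior_state"), dict):
        depth += 1
        current = current["prior_state"]
    return depth
```

Running both helpers on a saved state before and after a sync is one way to see whether a connection has switched formats or keeps nesting `prior_state` deeper.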

Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation connectors/source/intercom
3 participants