[source-google-ads] Timeout issue with large datasets in shopping_performance_view #49254

Open

alenoir opened this issue Dec 12, 2024 · 3 comments

alenoir commented Dec 12, 2024

Connector Name

source-google-ads

Connector Version

3.7.9

What step the error happened?

During the sync

Relevant information

The job fails due to a timeout in the request_records_job method, which is limited to 5 minutes via the @detached(timeout_minutes=5) decorator. Despite sufficient resources allocated to the pod, the operation does not complete within the timeout, likely due to the large volume of data or slow API responses.
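
For context, here is a rough sketch of how a timeout decorator along the lines of @detached can behave. This is illustrative only, not the connector's actual implementation; it just shows where a hardcoded default like 5 minutes ends up baked in:

    # Illustrative sketch only, not the connector's real @detached decorator:
    # run the wrapped method in a worker thread and fail once the deadline passes.
    from concurrent.futures import ThreadPoolExecutor
    from concurrent.futures import TimeoutError as FutureTimeoutError
    from functools import wraps


    def detached(timeout_minutes: float = 5):
        def decorator(func):
            @wraps(func)
            def wrapper(*args, **kwargs):
                executor = ThreadPoolExecutor(max_workers=1)
                future = executor.submit(func, *args, **kwargs)
                try:
                    return future.result(timeout=timeout_minutes * 60)
                except FutureTimeoutError:
                    raise TimeoutError(
                        f"Method '{func.__name__}' timed out after {timeout_minutes} minutes."
                    )
                finally:
                    # Don't block on a still-running call; the worker thread is abandoned.
                    executor.shutdown(wait=False)
            return wrapper
        return decorator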

Steps to Reproduce:

  1. Configure a Google Ads source with access to the shopping_performance_view stream.
  2. Set up a sync to any destination (e.g., BigQuery).
  3. Run the sync with a large dataset (e.g., multiple months of data).
  4. Observe the timeout after 5 minutes.

Error Message:

TimeoutError: request_records_job exceeded timeout of 5 minutes.

Pod Metrics:

POD                               NAME           CPU(cores)   MEMORY(bytes)   
replication-job-12831-attempt-0   destination    7m           389Mi           
replication-job-12831-attempt-0   orchestrator   6m           482Mi           
replication-job-12831-attempt-0   source         14m          103Mi     

Additional Information:

  • Airbyte Version: 1.2.0
  • Deployment Method: Kubernetes
  • Google Ads API Stream: shopping_performance_view
  • Environment: GCP Kubernetes Engine

Feature Request/Question:

  1. Is it possible to make the timeout_minutes parameter for the @detached decorator configurable?
  2. Are there any known strategies or optimizations to handle large datasets with this connector?
  3. Could retries or chunked processing be implemented for long-running operations? (A rough sketch of the pattern I have in mind follows this list.)
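
To make question 3 concrete, here is a rough sketch of the retry-plus-chunking pattern, using the backoff library; run_gaql_query is a hypothetical stand-in for the real Google Ads call, not connector code:

    import backoff


    def run_gaql_query(date: str) -> list[dict]:
        # Hypothetical stand-in for the real Google Ads API request for one day.
        return [{"segments.date": date, "metrics.clicks": 0}]


    @backoff.on_exception(backoff.expo, TimeoutError, max_tries=5)
    def fetch_day(date: str) -> list[dict]:
        # A TimeoutError raised here is retried with exponential backoff.
        return run_gaql_query(date)


    def fetch_range(dates: list[str]) -> list[dict]:
        # Chunked processing: one short request per day instead of one long job.
        rows: list[dict] = []
        for date in dates:
            rows.extend(fetch_day(date))
        return rows


    print(fetch_range(["2024-11-01", "2024-11-02"]))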

Relevant log output

2024-12-12 09:58:57 source > Caught retryable error Method 'request_records_job' timed out after 5.0 minutes after 1 tries. Waiting 1 seconds then retrying...

Contribute

  • Yes, I want to contribute
marcosmarxm (Member) commented

@alenoir would it be possible for you to build the connector locally with an increased timeout value and check whether you're able to get the data? If that turns out to be the fix, we can work later on adding a parameter to configure the timeout.

alenoir commented Dec 17, 2024

Thanks for the suggestion!

I performed a few tests locally to address the timeout issue:

  1. Increased the timeout to 10 minutes:
    Unfortunately, this did not resolve the issue. The query still failed when fetching data from the shopping_performance_view stream.

  2. Adjusted slice_duration in the connector:
    I modified the following line in streams.py:

    slice_duration = pendulum.duration(days=0)

    This change allowed the connector to fetch the data correctly without timing out.

It seems that by setting slice_duration to 0 days, the data is retrieved in smaller, more manageable slices, avoiding the timeout problem.
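
To illustrate why this helps, here is a rough sketch of how date-based slicing behaves (not the connector's exact code; the 14-day default below is just a placeholder):

    import pendulum


    def generate_slices(start_date: str, end_date: str, slice_days: int = 14):
        # With slice_days=0 every slice spans a single day, so a month of data
        # becomes ~30 small queries instead of a few large ones that risk the timeout.
        slice_duration = pendulum.duration(days=slice_days)
        start = pendulum.parse(start_date)
        end = pendulum.parse(end_date)
        while start <= end:
            chunk_end = min(start + slice_duration, end)
            yield {"start_date": start.to_date_string(), "end_date": chunk_end.to_date_string()}
            start = chunk_end + pendulum.duration(days=1)


    for s in generate_slices("2024-11-01", "2024-11-05", slice_days=0):
        print(s)  # five one-day slices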


Next Steps:

Would it be possible to add a configurable parameter for slice_duration in the connector? This would allow users to fine-tune the slicing behavior based on their dataset size and avoid hardcoded values.
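
For illustration, the wiring could be as small as reading an optional config key; the slice_duration_days key name and the 14-day fallback below are placeholders, not existing connector options:

    import pendulum


    def resolve_slice_duration(config: dict) -> pendulum.Duration:
        # Hypothetical: fall back to a default when the user doesn't set the option.
        return pendulum.duration(days=int(config.get("slice_duration_days", 14)))


    print(resolve_slice_duration({"slice_duration_days": 0}))  # 0 days -> daily slices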

Let me know how I can assist further!

alenoir commented Dec 17, 2024

I tested with a 30-minute timeout, and it works — the data is fetched without issues.

Would it be possible to increase this timeout in the connector or make it configurable for users?

Let me know your thoughts!
