[Bug]: Python sdk harness failed: TypeError: can only concatenate str (not "NoneType") to str #28131

Closed
2 of 15 tasks
dummy-work-account opened this issue Aug 23, 2023 · 5 comments
Labels: bug, dataflow, done & done, P3, python

Comments

@dummy-work-account

What happened?

I'm trying to send custom metrics to Datadog from a DoFn, but the Python SDK test harness is failing with an error that I don't know how to interpret:

 Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker_main.py", line 181, in main
    sdk_harness.run()
  File "/usr/local/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker.py", line 256, in run
    getattr(self, SdkHarness.REQUEST_METHOD_PREFIX + request_type)(
TypeError: can only concatenate str (not "NoneType") to str

I'm running the streaming pipeline as a Flex Template on GCP Dataflow, using the Python requests module for POST calls.


  • apache-beam[gcp]==2.49.0
  • gcr.io/dataflow-templates-base/python310-template-launcher-base

Issue Priority

Priority: 3 (minor)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
@dummy-work-account
Author

When I comment out the WindowInto line, the error goes away, but the pipeline still doesn't function as expected -- which might be an issue with my custom DoFn:

    with beam.Pipeline(options=options) as pipeline:
        messages = (
            pipeline
            | f"Read from input topic {subscription_id}" >>
            beam.io.ReadFromPubSub(subscription=subscription_id,
                                   with_attributes=False)
            | f"Deserialize Avro {subscription_id}" >> beam.ParDo(
                ConfluentAvroReader(schema_registry_conf)).with_outputs(
                    "record", "error"))

        records = messages["record"]
        errors = messages["error"]

        (records
         | 'Aggregate msgs in fixed window' >> beam.WindowInto(beam.window.FixedWindows(15))
         | 'Send hardcoded value to datadog' >> beam.ParDo(SendToDatadog())
         | 'Print results' >> beam.Map(print)
        )
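
A note on the snippet above: `beam.WindowInto` only assigns each element to a window; nothing is aggregated until a grouping or combining step follows, so the DoFn after it still receives elements one at a time. The plain-Python sketch below illustrates that distinction. It is not Beam code: the helper names are hypothetical, and it assumes 15-second windows aligned to offset 0, as `FixedWindows(15)` would be by default. It is not offered as an explanation of the harness error.

```python
from collections import defaultdict

WINDOW_SIZE = 15  # seconds, mirroring FixedWindows(15) in the pipeline above


def assign_fixed_window(timestamp, size=WINDOW_SIZE):
    """Return the [start, end) window containing the timestamp (offset 0 assumed)."""
    start = timestamp - (timestamp % size)
    return (start, start + size)


def window_into(timestamped_values, size=WINDOW_SIZE):
    """Label each element with its window; the element count is unchanged."""
    return [(assign_fixed_window(ts, size), value)
            for ts, value in timestamped_values]


def count_per_window(windowed_values):
    """A separate combining step: this is where aggregation actually happens."""
    counts = defaultdict(int)
    for window, _value in windowed_values:
        counts[window] += 1
    return dict(counts)


# Hypothetical (timestamp_seconds, value) pairs.
events = [(1, "a"), (14, "b"), (15, "c"), (31, "d")]

windowed = window_into(events)       # still four separate elements
counts = count_per_window(windowed)  # one entry per window
```

In a Beam pipeline the combining role would be played by a transform such as a GroupByKey or a Combine placed after the WindowInto step.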

@tvalentyn
Contributor

The error happens in the SDK internals and is rather strange. It sounds as though the runner is sending a malformed request to the SDK. I would suggest trying again; if the issue still persists, please provide a minimal pipeline that reproduces it so we can try it out, or work with Dataflow customer support.
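
For readers following along, here is a minimal stand-alone sketch of how the failing line in the traceback can raise this exact TypeError. It does not import Beam. It assumes the request type is obtained from a protobuf-style `WhichOneof` call, which returns `None` when no field of the oneof is set. The class names and the prefix value are hypothetical stand-ins, not the harness's actual code.

```python
REQUEST_METHOD_PREFIX = "_request_"  # assumed value, for illustration only


class FakeControlRequest:
    """Hypothetical stand-in for a control request message."""

    def __init__(self, request_type):
        self._request_type = request_type

    def WhichOneof(self, _oneof_name):
        # Protobuf's WhichOneof returns None when no oneof field is set.
        return self._request_type


class Handler:
    """Hypothetical handler with one method per known request type."""

    def _request_register(self, request):
        return "handled register"


def dispatch(handler, request):
    # Same shape as the line in the traceback: prefix + request type.
    request_type = request.WhichOneof("request")
    return getattr(handler, REQUEST_METHOD_PREFIX + request_type)(request)


# A well-formed request dispatches normally.
ok_result = dispatch(Handler(), FakeControlRequest("register"))

# A request with no type set fails before any handler is looked up.
try:
    dispatch(Handler(), FakeControlRequest(None))
    error_message = None
except TypeError as exc:
    error_message = str(exc)
```

Under these assumptions the failure happens in the string concatenation itself, before any handler runs, which is consistent with the request arriving without a recognizable type rather than with a bug in user code.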

@dummy-work-account
Author

Thank you for the feedback! The issue has persisted -- I'm working on a minimal pipeline that can reproduce the error now

@tvalentyn
Contributor

Hi, any news about the repro? Thanks!

@dummy-work-account
Author

Thanks for the follow-up! I ended up going a different route: I removed the Datadog module from the pipeline and set up a separate container to relay messages from Pub/Sub to Datadog. It feels like Beam/Dataflow is structured around data transformation -- trying to coerce it to do external API calls seems like a dumb idea in hindsight (especially on an unpeered VPC).

@github-actions github-actions bot added this to the 2.52.0 Release milestone Sep 29, 2023
@tvalentyn tvalentyn added done & done Issue has been reviewed after it was closed for verification, followups, etc. and removed awaiting triage labels Sep 29, 2023
@tvalentyn tvalentyn removed this from the 2.52.0 Release milestone Sep 29, 2023
tvalentyn added a commit to tvalentyn/beam that referenced this issue Jan 19, 2024