[Feature Request]: OpenTelemetry Support #33176
This is a good idea. We do plan to investigate it. cc @Abacn
@liferoad @Abacn I was able to do what I want by manually creating a wrapper:

    from typing import Any, TypeVar

    import apache_beam as beam
    import logfire
    from apache_beam import Pipeline

    InputT = TypeVar("InputT")
    OutputT = TypeVar("OutputT")


    class AutoSpanDoFn(beam.DoFn):
        def __init__(self, transform: beam.PTransform[InputT, OutputT]):
            super().__init__()
            self.transform = transform

        def process(self, element: Any, *args, **kwargs):
            # Open a span named after the wrapped transform for each element.
            label = self.transform.label if hasattr(self.transform, "label") else "UnnamedTransform"
            with logfire.span(label):
                yield element


    class AutoSpanTransform(beam.PTransform[InputT, OutputT]):
        def __init__(self, transform: beam.PTransform[InputT, OutputT]):
            super().__init__()
            self.transform = transform

        def expand(self, input_or_inputs: InputT) -> OutputT:
            # Pass each element through AutoSpanDoFn, then apply the wrapped transform.
            return input_or_inputs | beam.ParDo(AutoSpanDoFn(self.transform)) | self.transform

And then I can use it like this:

    with logfire.span("main"):
        with Pipeline() as pipeline:
            text = [
                "To be, or not to be: that is the question: ",
                "Whether 'tis nobler in the mind to suffer ",
                "The slings and arrows of outrageous fortune, ",
                "Or to take arms against a sea of troubles, ",
            ]
            # Split and logfire_print come from the full script linked below.
            _ = (
                pipeline
                | "Create" >> beam.Create(text)
                | "Split" >> AutoSpanTransform(beam.ParDo(Split()))
                | "Filter" >> AutoSpanTransform(beam.Filter(lambda x: x != "the"))
                | "Print" >> AutoSpanTransform(beam.Map(logfire_print))
            )

You can see the full code here: https://github.com/Kludex/logfire-apache-beam/blob/main/main.py

You can run it with

This is what I see in Logfire (the observability platform we are developing):

Now I need to be able to do this automatically. I tried some patching, like this:

    _original_pipeline_apply = Pipeline.apply


    def patched_pipeline_apply(self, transform, *args, **kwargs):
        if (
            isinstance(transform, beam.PTransform)
            and not isinstance(transform, AutoSpanTransform)
            and not getattr(transform, "_instrumented", False)
        ):
            # In this version, AutoSpanTransform also sets `_instrumented = True`.
            transform = AutoSpanTransform(transform)
        return _original_pipeline_apply(self, transform, *args, **kwargs)


    Pipeline.apply = patched_pipeline_apply

But I keep getting a recursion error; I'm still debugging it. I would appreciate help with two things:
This looks interesting. I am wondering how this would work with remote runners such as Dataflow or Flink.
cc @Abacn PTAL.
BTW, I checked on Dataflow. I made it work, but not as smoothly as I wanted, mainly because the context and the configured exporter don't survive pickling. I can provide the code in the next few days.
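A minimal sketch of a pickling-friendly pattern, under stated assumptions: keep live exporter objects out of the pickled state and rebuild them in setup(), the DoFn lifecycle method Beam calls on the worker, and carry the trace context as a plain string carrier (for example a W3C traceparent header). TracedDoFn and ship_to_worker are hypothetical names; the class does not subclass beam.DoFn so that it runs without Beam, and the "exporter" is a placeholder dict.

```python
import pickle
import threading
from typing import Any, Dict, Iterator, Optional


class TracedDoFn:
    """Library-free stand-in for a beam.DoFn subclass (hypothetical).

    Only plain data is pickled: the exporter settings and a serialized
    parent trace context. The live exporter is rebuilt in setup().
    """

    def __init__(self, exporter_endpoint: str, parent_carrier: Dict[str, str]):
        self.exporter_endpoint = exporter_endpoint
        # A carrier such as {"traceparent": "..."} captured on the launcher
        # (with OpenTelemetry, filled in by opentelemetry.propagate.inject).
        self.parent_carrier = dict(parent_carrier)
        self._exporter: Optional[Dict[str, Any]] = None

    def __getstate__(self) -> Dict[str, Any]:
        state = self.__dict__.copy()
        state["_exporter"] = None  # never ship live exporter objects to workers
        return state

    def setup(self) -> None:
        # Placeholder for building a real exporter / tracer provider on the worker;
        # the lock stands in for the unpicklable parts of a real exporter.
        self._exporter = {"endpoint": self.exporter_endpoint, "lock": threading.Lock()}

    def process(self, element: Any) -> Iterator[Any]:
        if self._exporter is None:
            raise RuntimeError("setup() has not run on this worker")
        # With OpenTelemetry, opentelemetry.propagate.extract(self.parent_carrier)
        # would provide the parent context for a per-element span here.
        yield element


def ship_to_worker(fn: TracedDoFn) -> TracedDoFn:
    """Roughly what a runner does: serialize on the launcher, then
    deserialize and call setup() on the worker."""
    clone = pickle.loads(pickle.dumps(fn))
    clone.setup()
    return clone
```

The design choice is that nothing process-local (exporters, locks, sockets, the current context object) crosses the serialization boundary; only configuration and a string-encoded context do.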
Is it possible to redirect regular logs to Logfire?
No, only OpenTelemetry data.
What would you like to happen?
This is more a question than a feature request.
I was wondering why I don't see any OpenTelemetry-related issues for Apache Beam. Is it because the runners already provide observability for their jobs?
I'm more interested in the Python side for now, but if there's no reason not to, would it make sense to create an `opentelemetry-instrumentation-apache-beam` package?

Issue Priority
Priority: 3 (nice-to-have improvement)
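If such a package were created, its core would likely need patching that can be undone and is safe to apply twice; OpenTelemetry instrumentation packages conventionally expose this as instrument()/uninstrument() by subclassing BaseInstrumentor from opentelemetry-instrumentation. A minimal, library-free model of just that patching part, with hypothetical names:

```python
import functools
from typing import Any, Callable, Optional

WrapperFactory = Callable[[Callable[..., Any]], Callable[..., Any]]


class MethodInstrumentor:
    """Hypothetical model of a reversible, idempotent method patch.

    Assumes the target is a regular (non-static, non-class) method.
    """

    def __init__(self, owner: type, method_name: str, wrapper_factory: WrapperFactory):
        self.owner = owner
        self.method_name = method_name
        self.wrapper_factory = wrapper_factory
        self._original: Optional[Callable[..., Any]] = None

    @property
    def is_instrumented(self) -> bool:
        return self._original is not None

    def instrument(self) -> None:
        if self.is_instrumented:
            return  # idempotent: a second call must not wrap the wrapper
        original = getattr(self.owner, self.method_name)
        wrapped = functools.wraps(original)(self.wrapper_factory(original))
        setattr(self.owner, self.method_name, wrapped)
        self._original = original

    def uninstrument(self) -> None:
        if self._original is None:
            return
        # Restore the untouched original so the class behaves as before.
        setattr(self.owner, self.method_name, self._original)
        self._original = None
```

Against Beam this might look like `MethodInstrumentor(Pipeline, "apply", some_wrapper_factory)`, where the wrapper factory is hypothetical and would still need to avoid the recursion error described in the thread.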