[Bug]: Prism runner gets stuck trying to read a file on GCS #30347

Closed
2 of 16 tasks
jlewi opened this issue Feb 19, 2024 · 2 comments

Comments

@jlewi
Contributor

jlewi commented Feb 19, 2024

What happened?

There's a full reproduction here:
https://github.com/jlewi/beambugs/tree/main/prismgcs

I created a simple program that reads from TextIO and writes to TextIO. If the input is a local file, this works fine. However, if the input is a file on GCS, the Prism runner appears to get stuck trying to read the input over and over again.

Here's an example of the message it keeps printing:

2024/02/19 15:17:19 INFO Reading from gs://<BUCKET>/hackedlogs.json source=/Users/jlewi/go/pkg/mod/github.com/apache/beam/sdks/v2@v2.54.0/go/pkg/beam/io/textio/textio.go:226 time=2024-02-19T23:17:19.971Z worker.ID=job-001[go-job-1-1708384631797458000]_go worker.endpoint=localhost:65326

This is using the Go SDK v2.54.0.

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
@Dal-Papa
Contributor

Dal-Papa commented Mar 4, 2024

This problem seems to have started in 2.50 (the Prism rollout) and persists all the way through 2.54. I tried every version down to 2.49, which is where I saw it working.

@lostluck
Contributor

Thank you for the report!

It shouldn't matter to prism that the file is in GCS. I could see perhaps an issue when running docker workers, but prism usually runs Go SDK pipelines in loopback mode.

Most likely the additional latency to GCS is causing splits to go haywire.

We did have #29968 which reduced the split aggression, but that made it into 2.54.0, so something else is at play (or the initial latency hit is very significant).

https://github.com/apache/beam/commits/release-2.54.0/sdks/go/pkg/beam/runners/prism/internal/stage.go

This pipeline is roughly equivalent to the Wordcount example (beam/sdks/go/examples/wordcount/wordcount.go), which does work.

But the repro pipeline instead has an anonymous function as a DoFn. Those don't work on portable runners, and are not meaningfully supportable. If the original tests were on the Go Direct Runner (which would be the case in 2.54.0), then they'd have worked, but only because the Direct runner doesn't serialize anything.

I suspect that's the root cause of the issue in this case, and I validated it: the pipeline works once I move the inlined function to a registered, named function, as in the following snippet:

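// updateLine is a named, top-level DoFn, so a portable runner can refer to it by name.
func updateLine(line string, emit func(string)) {
	output := fmt.Sprintf("line had length %d", len(line))
	fmt.Println(output)
	emit(output)
}

func init() {
	// Function2x0: two parameters (line, emit), no return values.
	register.Function2x0(updateLine)
}

func main() {
	...
	spans := beam.ParDo(s, updateLine, all)
	...
}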

Unfortunately for us, it's not presently possible in Go to detect if a function is an anonymous function or not, outside of doing static analysis on the source code. Were I to redesign the SDK, I wouldn't permit simple functions like that as DoFn, as they cause more issues than they solve, or I'd provide a more closure/anonymous safe way of building pipelines to enable them, and avoid registrations entirely.

@lostluck closed this as not planned on Jul 24, 2024
@github-actions bot added this to the 2.59.0 Release milestone on Jul 24, 2024