sync with open source how #118
Draft
lesterhaynes wants to merge 5,604 commits into linkedin:li_trunk from apache:master
Conversation
…3089)
* Change dead partition detection to only look at the current topic, instead of looking at all topics
* spotless
* fix test, simplify existence check
* adding disableAutoCommit flag to ReadFn (see the sketch below)
---------
Co-authored-by: Chris Ashcraft <[email protected]>
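The auto-commit behavior this commit touches has a Python-side analogue worth illustrating. A minimal sketch, assuming a broker at localhost:9092 and a hypothetical `events` topic: with auto-commit left disabled in the consumer config, offsets come from the runner's checkpoint (the KafkaCheckpointMark) rather than from Kafka.

```python
import apache_beam as beam
from apache_beam.io.kafka import ReadFromKafka

with beam.Pipeline() as p:
    _ = (
        p
        | ReadFromKafka(
            consumer_config={
                'bootstrap.servers': 'localhost:9092',  # placeholder broker
                'enable.auto.commit': 'false',  # offsets tracked by the checkpoint, not Kafka
            },
            topics=['events'])  # hypothetical topic
        | beam.Map(print))
```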
) Bumps [cloud.google.com/go/bigquery](https://github.com/googleapis/google-cloud-go) from 1.63.1 to 1.64.0.
- [Release notes](https://github.com/googleapis/google-cloud-go/releases)
- [Changelog](https://github.com/googleapis/google-cloud-go/blob/main/CHANGES.md)
- [Commits](googleapis/google-cloud-go@bigquery/v1.63.1...spanner/v1.64.0)
---
updated-dependencies:
- dependency-name: cloud.google.com/go/bigquery
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Lineage support for JdbcIO (example below)
* Report the tables the pipeline reads from and writes to
* add logs and documentation
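For context, a hedged sketch of the kind of cross-language JDBC read whose source table the lineage change would report; the connection details and table name below are placeholders.

```python
import apache_beam as beam
from apache_beam.io.jdbc import ReadFromJdbc

with beam.Pipeline() as p:
    rows = p | ReadFromJdbc(
        table_name='orders',  # the table lineage reporting would surface
        driver_class_name='org.postgresql.Driver',
        jdbc_url='jdbc:postgresql://localhost:5432/shop',
        username='user',
        password='secret')
```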
…eadFromKafkaDoFn (#32921)
* Revert "Set backlog in gauge metric (#31137)"
* Revert "Add Backlog Metrics to Kafka Splittable DoFn Implementation (#31281)" This reverts commit fd4368f.
* Call reportBacklog in nextBatch to report split metrics more often (gauge sketch below)
* Report SDF metrics for the active split/partition after processing a record batch
* Use KafkaSourceDescriptor as cache key and log entry
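Illustrative only: the backlog reporting above goes through Beam's metrics API. A sketch of a gauge-based backlog report from a DoFn, with the namespace, metric name, and backlog arithmetic all assumed for the example:

```python
import apache_beam as beam
from apache_beam.metrics import Metrics

class ReportBacklog(beam.DoFn):
    """Re-emits records while publishing an assumed backlog gauge."""

    def __init__(self):
        self.backlog = Metrics.gauge('kafka', 'backlog_elements')  # assumed names

    def process(self, element):
        latest_offset, current_offset, record = element
        # Set after every element, mirroring "report split metrics more often".
        self.backlog.set(latest_offset - current_offset)
        yield record
```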
* This is a follow-up PR to #31953, and part of the issue #31905. This PR adds the actual writer functionality and some additional testing, including integration testing. This should be the final PR for the SolaceIO write connector to be complete.
* Use static imports for Preconditions
* Remove unused method
* Logging has builtin formatting support
* Use TypeDescriptors to check the type used as input
* Fix parameter name
* Use interface + utils class for MessageProducer
* Use null instead of optional
* Avoid using ByteString just to create an empty byte array.
* Fix documentation, we are not using ByteString now.
* Not needed anymore, we are not using ByteString
* Defer transforming latency from nanos to millis. The transform into millis is done at the presentation moment, when the metric is reported to Beam.
* Avoid using top-level classes with a single inner class. A couple of DoFns are moved to their own files too, as the abstract class for the UnboundedSolaceWriter was in practice a "package". This commit addresses a few comments about the structure of UnboundedSolaceWriter and some base classes of that abstract class.
* Remove a state variable; there is already a timer. This DoFn is a stateful DoFn to force a shuffling with a given input key set cardinality.
* Properties must always be set. The warnings are only shown if the user decided to set the properties that are overridden by the connector. This was changed in one of the previous commits but it is actually a bug. I am reverting that change and changing this to a switch block, to make it clearer that the properties must always be set by the connector.
* Add a new custom mode so no JCSMP property is overridden. This lets the user fully control all the properties used by the connector, instead of making sensible choices on its behalf. This also adds some logging to be more explicit about what the connector is doing. This does not add too much logging pressure; it only adds logging at the producer creation moment.
* Add some more documentation about the new custom submission mode.
* Fix bug introduced with the refactoring of code for this PR. I forgot to pass the submission mode when the write session is created, and I called the wrong method in the base class because it was defined as public. This makes sure that the submission mode is passed to the session when the session is created for writing messages.
* Remove unnecessary Serializable annotation.
* Make the PublishResult class for handling callbacks non-static to handle pipelines with multiple write transforms.
* Rename maxNumOfUsedWorkers to numShards
* Use round-robin assignment of producers to process bundles (see the sketch below)
* Output results in a GlobalWindow
* Add ErrorHandler
* Fix docs
* Remove PublishResultHandler class that was just a wrapper around a Queue
* small refactors
* Revert CsvIO docs fix
* Add withErrorHandler docs
* fix var scope
---------
Co-authored-by: Bartosz Zablocki <[email protected]>
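A minimal sketch of the round-robin sharding idea from the bullets above, not the SolaceIO implementation: key elements onto a fixed number of shard keys (the renamed numShards) and group, so at most that many bundles write concurrently.

```python
import itertools
import apache_beam as beam

NUM_SHARDS = 4  # analogous to the renamed numShards parameter

class RoundRobinKey(beam.DoFn):
    def setup(self):
        self.counter = itertools.count()

    def process(self, element):
        # Round-robin assignment of elements onto a bounded key space.
        yield (next(self.counter) % NUM_SHARDS, element)

with beam.Pipeline() as p:
    sharded = (
        p
        | beam.Create(['m1', 'm2', 'm3'])
        | beam.ParDo(RoundRobinKey())
        | beam.GroupByKey())  # the forced shuffle; downstream writers see NUM_SHARDS keys
```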
* managed bigqueryio
* spotless
* move managed dependency to test only
* cleanup after merging snake_case PR
* choose write method based on boundedness and pipeline options
* rename bigquery write config class
* spotless
* change read output tag to 'output'
* spotless
* revert logic that depends on DataflowServiceOptions; switching BQ methods can instead be done on the Dataflow service side
* spotless
* fix typo
* separate BQ write config to a new class
* fix doc
* resolve after syncing to HEAD
* spotless
* fork on batch/streaming
* cleanup
* spotless
* portable bigquery destinations
* move forking logic to BQ schematransform side
* add file loads translation and tests; add test checks that the correct transform is chosen
* set top-level wrapper to be the underlying managed BQ transform urn; change tests to verify underlying transform name
* move unit tests to respective schematransform test classes
* expose to Python SDK as well (sketch below)
* cleanup
* address comment
* set enable_streaming_engine option; add to CHANGES
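A hedged sketch of what the Python-side managed BigQuery write might look like; the `managed.BIGQUERY` identifier and the `table` config key are assumptions to verify against your SDK version.

```python
import apache_beam as beam
from apache_beam.transforms import managed

with beam.Pipeline() as p:
    _ = (
        p
        | beam.Create([beam.Row(name='a', value=1)])
        | managed.Write(
            managed.BIGQUERY,  # assumed identifier for the managed BQ sink
            config={'table': 'my-project.my_dataset.my_table'}))  # assumed config key
```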
Co-authored-by: Claude <[email protected]>
Beam Yaml's error handling framework returns per-record errors as a schema'd PCollection with associated error metadata (e.g. error messages, tracebacks). Currently there is no way to "unnest" the nested records (except field by field) back to the top level if one wants to re-process these records (or otherwise ignore the metadata). Even if there were a way to do this "up-one-level" unnesting, it's not clear that users would find it easily. Worse, various forms of error handling are not consistent in what the "bad records" schema is, or even where the original record is found (though we do have a caveat in the docs that this is still not set in stone). This adds a simple, easy-to-identify transform that abstracts all of these complexities away for the basic use case.
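For illustration, the "up-one-level" unnesting amounts to projecting the original record back out of the error row. A sketch with a hypothetical bad-record schema (`element`, `error_message`, `traceback`):

```python
import apache_beam as beam

with beam.Pipeline() as p:
    errors = p | beam.Create([
        beam.Row(
            element=beam.Row(id=1),  # the original record, nested
            error_message='boom',    # metadata to discard
            traceback='...')])
    # "Unnest" back to the top level so the records can be re-processed.
    recovered = errors | beam.Map(lambda bad: bad.element)
```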
* set enable_streaming_engine option
* trigger test
* trigger test
* revert test trigger
Update dataframes to PEP 585 typing
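The gist of a PEP 585 migration like this one: builtin generics replace their `typing` aliases (available since Python 3.9).

```python
from typing import Dict, List  # pre-PEP 585 aliases

def histogram_old(xs: List[str]) -> Dict[str, int]:  # before
    ...

def histogram_new(xs: list[str]) -> dict[str, int]:  # after: builtin generics
    ...
```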
* Make AvroUtils compatible with older versions of Avro
* Create beam_PostCommit_Java_Avro_Versions.json
* Update AvroUtils.java
* Fix nullness
* fix JDBC providers
* fix test failures
* fix typo
---------
Signed-off-by: Jeffrey Kinard <[email protected]>
Co-authored-by: Naireen <[email protected]>
Update container image to pick up recent changes.
…er and runner APIs.
* Add TF MNIST classification cost benchmark
* linting
* Generalize to single workflow file for cost benchmarks
* fix incorrect UTC time in comment
* move wordcount to same workflow
* update workflow job name
…33402)
Committing consumer offsets to Kafka is not critical for KafkaIO because it relies on the offsets stored in KafkaCheckpointMark, but throwing an exception makes Dataflow retry the same work item unnecessarily.
* Add missing to_type_hint to WindowedValueCoder (toy illustration below)
* Add type ignore to make mypy happy.
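A toy coder (not the actual WindowedValueCoder change) showing what `to_type_hint` is for: it lets Beam's type inference recover an element type from a coder instead of falling back to `Any`.

```python
import apache_beam as beam

class UInt32Coder(beam.coders.Coder):
    """Toy coder for non-negative ints that fit in four bytes."""

    def encode(self, value):
        return value.to_bytes(4, 'big')

    def decode(self, encoded):
        return int.from_bytes(encoded, 'big')

    def is_deterministic(self):
        return True

    def to_type_hint(self):
        return int  # without this, downstream inference falls back to Any
```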
… releases from snapshot repo
* [YAML] Better docs for Filter and MapToFields.
* Remove redundant optional indicators.
* Update sdks/python/apache_beam/yaml/yaml_mapping.py
---------
Co-authored-by: Jeff Kinard <[email protected]>
* Fix env variable loading in Cost Benchmark workflow
* fix output file for tf mnist
* add load test requirements file arg
* update mnist args
* revert how args are passed
* assign result correctly
Revert three commits related to supporting custom coder in reshuffle
* improve python multi-lang examples
* minor adjustments
Bumps [github.com/docker/docker](https://github.com/docker/docker) from 27.3.1+incompatible to 27.4.1+incompatible.
- [Release notes](https://github.com/docker/docker/releases)
- [Commits](moby/moby@v27.3.1...v27.4.1)
---
updated-dependencies:
- dependency-name: github.com/docker/docker
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [google.golang.org/api](https://github.com/googleapis/google-api-go-client) from 0.212.0 to 0.214.0.
- [Release notes](https://github.com/googleapis/google-api-go-client/releases)
- [Changelog](https://github.com/googleapis/google-api-go-client/blob/main/CHANGES.md)
- [Commits](googleapis/google-api-go-client@v0.212.0...v0.214.0)
---
updated-dependencies:
- dependency-name: google.golang.org/api
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Please add a meaningful description for your change here
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
- Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
- Update CHANGES.md with noteworthy changes.

See the Contributor Guide for more tips on how to make the review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI.