POM files with direct reference to maven repositories #21096

Closed
damccorm opened this issue Jun 4, 2022 · 2 comments
Labels
bug, build, dependencies, io, java, kafka, P3

Comments

Contributor

damccorm commented Jun 4, 2022

Issue:
I work at a company that only allows private Maven repositories (such as internal Nexus proxies).
I'm starting to see a public repository being called during my builds:



 [ERROR] Failed to execute goal on project pipeline-app: Could not resolve dependencies for project
com.equifax.dfds.platform:pipeline-app:jar:20210618.2.ae73898: Failed to collect dependencies at org.apache.beam:beam-runners-google-cloud-dataflow-java:jar:2.30.0
-> org.apache.beam:beam-sdks-java-io-kafka:jar:2.30.0 -> io.confluent:kafka-avro-serializer:jar:5.3.2:
Failed to read artifact descriptor for io.confluent:kafka-avro-serializer:jar:5.3.2: Could not transfer
artifact io.confluent:kafka-avro-serializer:pom:5.3.2 from/to io.confluent (https://packages.confluent.io/maven/):
Connection reset -> [Help 1]
 org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute
goal on project pipeline-app: Could not resolve dependencies for project com.equifax.dfds.platform:pipeline-app:jar:20210618.2.ae73898:
Failed to collect dependencies at org.apache.beam:beam-runners-google-cloud-dataflow-java:jar:2.30.0
-> org.apache.beam:beam-sdks-java-io-kafka:jar:2.30.0 -> io.confluent:kafka-avro-serializer:jar:5.3.2



As part of the ticket linked below, a repository was added directly to the generated POM.
This repository should be configured by users who want to compile the code, not pre-configured or forced in the POM file.

https://issues.apache.org/jira/browse/BEAM-9292

This element now exists in the POM:


<repositories>
  <repository>
    <id>io.confluent</id>
    <url>https://packages.confluent.io/maven/</url>
  </repository>
</repositories>

https://repo1.maven.org/maven2/org/apache/beam/beam-sdks-java-io-kafka/2.30.0/beam-sdks-java-io-kafka-2.30.0.pom
vs.
https://repo1.maven.org/maven2/org/apache/beam/beam-sdks-java-io-kafka/2.19.0/beam-sdks-java-io-kafka-2.19.0.pom
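
One way to check whether a published POM declares its own repositories is to parse it and list any <repository> URLs. The following is only a minimal sketch, assuming the POM text has already been downloaded; the class PomRepositoryLister and its method repositoryUrls are hypothetical names used for illustration and are not part of Beam or Maven.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of Beam or Maven.
public class PomRepositoryLister {

    // Returns the <url> text of every <repository> element in the POM.
    // Note: this also matches a <repository> nested under
    // <distributionManagement>, which a stricter check would filter out.
    static List<String> repositoryUrls(String pomXml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(pomXml)));
            NodeList repos = doc.getElementsByTagName("repository");
            List<String> urls = new ArrayList<>();
            for (int i = 0; i < repos.getLength(); i++) {
                Element repo = (Element) repos.item(i);
                NodeList urlNodes = repo.getElementsByTagName("url");
                if (urlNodes.getLength() > 0) {
                    urls.add(urlNodes.item(0).getTextContent().trim());
                }
            }
            return urls;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse POM", e);
        }
    }

    public static void main(String[] args) {
        // Inline sample mirroring the element quoted above.
        String pom = "<project><repositories><repository>"
            + "<id>io.confluent</id>"
            + "<url>https://packages.confluent.io/maven/</url>"
            + "</repository></repositories></project>";
        System.out.println(repositoryUrls(pom));
    }
}
```

Running the same method over each of the two POM files above would show which version declares a repository and which does not.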

 

The dependencies were added in commit f2cc926.

Imported from Jira BEAM-12510. Original Jira may contain additional context.
Reported by: abelmatos.

damccorm added the bug, build-system, dependencies, and P3 labels on Jun 4, 2022

zezutom commented Aug 26, 2022

Hi, I am facing the exact same problem. Our company only allows dependencies from Maven Central.

What makes matters worse is that excluding the dependencies renders the Kafka SDK unusable:

implementation("org.apache.beam:beam-sdks-java-io-kafka:2.40.0") {
    // These artifacts are not in Maven Central! The SDK cannot be used without them :(
    exclude group: "io.confluent", module: "kafka-avro-serializer"
    exclude group: "io.confluent", module: "kafka-schema-registry-client"
}

Because the Kafka SDK is unusable without the Confluent dependencies, you also have to exclude it from the Dataflow runner:

runtimeOnly("org.apache.beam:beam-runners-google-cloud-dataflow-java:$beamVersion") {
    exclude module: "beam-sdks-java-io-kafka"
}

This is not a problem when running batch pipelines in Dataflow. However, the moment you try running a streaming pipeline, it will fail with:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/beam/sdk/io/kafka/KafkaIO$Read

I saw this exception when trying to run a streaming pipeline attached to Pub/Sub, i.e. my pipeline had absolutely nothing to do with Kafka. This must be a bug in the Dataflow runner itself. The problem is discussed in this email thread.
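
To confirm locally whether the exclusion removed the class before submitting a job, the classpath can be probed by name. This is only a rough sketch: ClasspathProbe and isOnClasspath are hypothetical names, and the check only reflects the local launcher classpath, which is an assumption about where the error originates.

```java
// Hypothetical helper for illustration; not part of Beam.
public class ClasspathProbe {

    // Returns true if the named class can be loaded (without initializing it)
    // by the class loader that loaded this probe.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name taken from the NoClassDefFoundError above.
        String kafkaRead = "org.apache.beam.sdk.io.kafka.KafkaIO$Read";
        System.out.println(kafkaRead + " present: " + isOnClasspath(kafkaRead));
    }
}
```

If this reports the class as missing while a streaming job still fails with the error above, that would be consistent with the runner referencing KafkaIO on its own.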

Contributor

Abacn commented Feb 13, 2024

This is resolved by #30300. I tested that a Dataflow v1 streaming pipeline no longer has the io-kafka or io.confluent dependencies.
