[Bug]: Python JDBC IO Try To Connect RDB Before Deploying #23029
Comments
Could you please share the error message seen when deploying the pipeline to Dataflow? I did some local tests and saw the following error when the JDBC database could not be reached:
If this is also what you see, then what happens is that the external transform tries to infer the schema by connecting to the database at pipeline expansion time, which happens only in the external transform expansion service. I will investigate whether and how this can be avoided.
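For context, a minimal sketch of the kind of pipeline that hits this, with hypothetical connection details (the JDBC URL, credentials, table name, and Dataflow options are placeholders). Applying `ReadFromJdbc` triggers external transform expansion on the machine building the pipeline, and the expansion service opens a JDBC connection to infer the schema before anything reaches Dataflow:

```python
import apache_beam as beam
from apache_beam.io.jdbc import ReadFromJdbc
from apache_beam.options.pipeline_options import PipelineOptions

# Placeholder Dataflow options for illustration.
options = PipelineOptions(
    runner='DataflowRunner',
    project='my-project',
    region='us-central1',
    temp_location='gs://my-bucket/tmp',
)

with beam.Pipeline(options=options) as p:
    # Expansion (and thus the JDBC connection for schema inference)
    # happens here, on the submitting machine, not on the Dataflow workers.
    rows = p | 'ReadFromJdbc' >> ReadFromJdbc(
        table_name='example_table',
        driver_class_name='org.postgresql.Driver',
        jdbc_url='jdbc:postgresql://db.internal:5432/mydb',
        username='user',
        password='secret',
    )
```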
.remove-labels "awaiting triage"
Label "awaiting cannot be managed because it does not exist in the repo. Please check your spelling.
.remove-labels 'awaiting triage'
The expansion service tries to get the schema by connecting to the JDBC server. The Java SDK does not go through the expansion service, so it did not hit this. However, I agree that it would be reasonable to defer this process. CC: @robertwb There was some discussion of deferring the expansion service; it could benefit this use case if implemented. Or is there another solution?
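One possibility along those lines, sketched under assumptions rather than tested: `ReadFromJdbc` accepts an `expansion_service` argument, so the expansion (and hence the schema-inferring connection) could in principle be pointed at an expansion service running on a host that can reach the database. The address below is hypothetical:

```python
# Hypothetical address of an expansion service started on a host that
# can reach the database (e.g. a VM inside the same network as the DB).
rows = p | 'ReadFromJdbc' >> ReadFromJdbc(
    table_name='example_table',
    driver_class_name='org.postgresql.Driver',
    jdbc_url='jdbc:postgresql://db.internal:5432/mydb',
    username='user',
    password='secret',
    expansion_service='expansion-host.internal:8097',
)
```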
I'm facing the same issue, where it tries to infer the schema during pipeline submission from the local machine (which doesn't have access to the DB server).
Hi @case-k-git, any updates? Did you find a solution?
Hi Matar,
Hey @RhysGrimshaw, as of now the only option is to open the connection between the machine submitting the job (your local machine if you submit manually, or your Dataflow VMs' subnet) and the DB server. For example, in my case I opened the connection from our Dataflow subnet to the DB server.
Thank you for getting back to me. How do you go about running a Python script directly from the subnet? The only options I seem to have available are a template/builder or reusing a Dataflow job, but since my job fails before it gets to Dataflow (due to it trying to infer the schema over a local machine connection), I can't build a new job from this.
@RhysGrimshaw for testing purposes, I opened the connection from my local machine to the DB server. Production-wise, we submit our jobs automatically via Cloud Scheduler & Cloud Composer. So the point to remember is that your Dataflow VM needs to be able to access the DB server for the initial schema inference.
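A quick pre-flight check along these lines, as a minimal sketch with hypothetical host and port values: verify that the submitting machine can reach the database before launching the job, since the schema inference connects from there at expansion time.

```python
import socket

# Hypothetical database host and port; adjust for your setup.
DB_HOST, DB_PORT = 'db.internal', 5432

# Fail fast if the machine submitting the pipeline cannot reach the
# database, since schema inference connects from here at expansion time.
try:
    with socket.create_connection((DB_HOST, DB_PORT), timeout=5):
        print(f'{DB_HOST}:{DB_PORT} is reachable; safe to submit.')
except OSError as exc:
    raise SystemExit(f'Cannot reach {DB_HOST}:{DB_PORT}: {exc}')
```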
What happened?
When I tried to deploy a Python JDBC pipeline to Dataflow from my local environment, the deployment failed with a connection error. The Python JDBC IO seems to try to connect to the database from the local environment, not only from the Dataflow environment.
I checked the connection and found that the attempt was being made from my PC. The database only accepts connections from inside the Dataflow network, so I got a connection error.
I also checked the Java JDBC version, and it worked fine, so this behavior in the Python version must be a bug.
Issue Priority
Priority: 2
Issue Component
Component: cross-language