Hi,
Thanks for the amazing work!
I have two dataframes: A has about 200 million points and B has about 10 million points. For every point in A, I want to find its nearest neighbor in B, preferably in Python. How can I achieve this using this library?
Hi @hexiaoyupku ,
I looked at a lot of solutions and none of them worked for me. Try looking at Pandas UDFs and writing a custom UDF for your nearest-neighbour use case. Pandas UDFs are much more performant than regular Spark Python UDFs because they are vectorised and use Apache Arrow for optimised data transfer between Python and the JVM. They should give decent performance. If you still want further optimisation, you can write your UDF in Scala (if you are familiar with it, or if your tech stack allows it); otherwise, Pandas UDFs in PySpark should be fine.
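To make the suggestion concrete, here is a minimal sketch of what such a Pandas UDF could look like. It is not this library's API. It assumes that the points are planar coordinates stored in columns named `x` and `y` (hypothetical names), that Euclidean distance is what you want, and that B is small enough to collect into a NumPy array and ship to the executors. It uses `scipy.spatial.cKDTree` for the lookup; the helper names `make_nearest_fn` and `add_nearest_column` are made up for illustration.

```python
import numpy as np
import pandas as pd
from scipy.spatial import cKDTree


def make_nearest_fn(b_xy: np.ndarray):
    """Return a batch function mapping points of A to the row index of
    their nearest point in B. b_xy is an (n, 2) array of B's coordinates
    (assumed planar, Euclidean distance)."""
    tree = cKDTree(b_xy)  # built once, reused for every batch

    def nearest_b_index(x: pd.Series, y: pd.Series) -> pd.Series:
        # Query the whole batch at once; this is the vectorised part.
        pts = np.column_stack([x.to_numpy(), y.to_numpy()])
        _, idx = tree.query(pts, k=1)
        return pd.Series(idx, index=x.index, dtype="int64")

    return nearest_b_index


def add_nearest_column(df_a, b_xy: np.ndarray):
    """Wrap the batch function as a Pandas UDF and apply it to A.
    Assumes df_a is a Spark DataFrame with numeric columns 'x' and 'y'
    (hypothetical names). PySpark is imported here so the function above
    can be tried locally without Spark."""
    from pyspark.sql.functions import pandas_udf

    nearest_udf = pandas_udf(make_nearest_fn(b_xy), returnType="long")
    return df_a.withColumn("nearest_b_idx", nearest_udf("x", "y"))
```

The batch function can be checked locally on plain pandas Series before involving Spark, for example `make_nearest_fn(b_xy)(pd.Series([0.1]), pd.Series([0.2]))`. With 10 million points in B, capturing the tree in the UDF closure may be heavy; a Spark broadcast variable holding `b_xy` is probably a better fit. For latitude/longitude data, Euclidean distance is only an approximation.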