How to read a parquet file from multiple directories at once #96

Open
V-yg opened this issue Jul 20, 2020 · 5 comments

Comments

@V-yg

V-yg commented Jul 20, 2020

Hi, I want to read Parquet files from multiple directories at once, but I don't see an interface for it. Do I need to implement this myself, or is there an alternative?

@sadikovi
Member

I am not quite sure I understand what you mean. Can you do the same with just spark.read.parquet?

@V-yg
Author

V-yg commented Jul 22, 2020

Like this: spark.index.create.indexByAll().parquet(path1, path2, path3)

@V-yg
Author

V-yg commented Jul 22, 2020

One of my Parquet tables is stored under multiple paths, but they are not partition directories. To create an index for the table, I need to pass in all of the paths. I've already implemented this myself, thanks.

@sadikovi
Member

I don't quite understand your directory structure. Usually, a Parquet table is represented as a directory containing files or partition sub-directories. If all of those paths share a parent directory, you can try providing that instead. Right now, it is not possible to pass multiple paths; maybe it makes sense to add this functionality.
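
To illustrate the parent-directory suggestion, here is a minimal sketch of how one might find the deepest directory shared by several table paths. It is not part of this library: `CommonParent` and `commonParent` are hypothetical names, and the sketch assumes plain local-style paths handled by `java.nio.file` (URIs such as `hdfs://...` would need Hadoop's own path handling instead).

```scala
import java.nio.file.{Path, Paths}

// Hypothetical helper, not part of the library discussed in this issue.
object CommonParent {
  /** Deepest directory that contains every given path, if one exists. */
  def commonParent(paths: Seq[String]): Option[Path] = {
    // Resolve "." and ".." so that prefix checks compare like with like.
    val normalized = paths.map(p => Paths.get(p).toAbsolutePath.normalize)
    normalized.headOption.flatMap { first =>
      // Walk up from the first path until a candidate is a prefix of all paths.
      // Path.startsWith compares whole name elements, so /data/table1 is not
      // treated as a prefix of /data/table10.
      Iterator.iterate(first)(_.getParent)
        .takeWhile(_ != null)
        .find(candidate => normalized.forall(_.startsWith(candidate)))
    }
  }
}
```

The resulting directory could then be passed as the single path. Keep in mind that the shared parent may also contain files or directories that do not belong to the table, so this only helps when the parent holds nothing else.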

@V-yg
Author

V-yg commented Jul 22, 2020

OK, I understand. Thank you for your reply.
