Memory and parallelism tuning #230
Original question from @SysuJayce:

(1) It seems that memory issues cannot be avoided when there is a large amount of data.
(2) If the parallelism is 20, will the original data be copied 20 times?
(3) How should I balance memory and CPU to choose the optimal parameters?

Answer from @nalepae:

(1) Pandarallel basically doubles the amount of needed memory, as stated in the documentation.
(2) No, the original data is copied only once, whatever the parallelism.
(3) There is no coordination relationship between CPU and memory (cf. (2)).
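A minimal sketch of typical pandarallel usage to make the point above concrete; the column name and the process_row function are illustrative, not taken from the original thread:

```python
import numpy as np
import pandas as pd
from pandarallel import pandarallel

# nb_workers controls parallelism only; per the answer above, it does not
# multiply how many times the original dataframe is copied.
pandarallel.initialize(nb_workers=20, progress_bar=False)

df = pd.DataFrame({"value": np.random.rand(1_000_000)})

def process_row(x):
    # Placeholder per-value computation (hypothetical example function).
    return x ** 2

# The data is sent to the workers once, so expect roughly 2x the
# dataframe's memory footprint at peak, whatever nb_workers is set to.
result = df["value"].parallel_apply(process_row)
```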
Follow-up from @SysuJayce: Hi @nalepae, if the amount of data is quite large, how can we speed up the preparation phase before the parallel computation starts? If I have 100 GB of data read into memory, I have to wait a long time before the workers actually begin processing.

Pandaral·lel is looking for a maintainer!

Reply from @nalepae: @SysuJayce, what do you mean by "boosting the preparation"? If you are memory-bound, I would suggest breaking up your dataframe into smaller shards and applying your function to each shard. Do you have any other problems? If not, I would like to close this issue.
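A rough sketch of the sharding approach suggested above; apply_in_shards, n_shards, and process_row are hypothetical names introduced here for illustration, not part of pandarallel's API:

```python
import numpy as np
import pandas as pd
from pandarallel import pandarallel

pandarallel.initialize(progress_bar=False)

def process_row(row):
    # Placeholder per-row computation (hypothetical example function).
    return row["value"] * 2

def apply_in_shards(df, func, n_shards=10):
    # Split the dataframe into smaller pieces so that only one shard's
    # copy is sent to the workers at a time, keeping peak memory well
    # below 2x the full dataframe.
    shards = np.array_split(df, n_shards)
    results = [shard.parallel_apply(func, axis=1) for shard in shards]
    return pd.concat(results)

df = pd.DataFrame({"value": range(1_000_000)})
out = apply_in_shards(df, process_row, n_shards=10)
```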