Here's a template for a GitHub issue requesting the addition of resume functionality to pandarallel:
Feature Request: Add Resume Functionality
Feature Missing:
I would like to request the addition of resume functionality to pandarallel. Currently, if a process using pandarallel is interrupted or fails for any reason, there is no built-in way to resume the computation from where it left off. This feature would be highly beneficial for long-running computations, because users would not have to reprocess rows that were already completed.
Use Case:
When processing large datasets, interruptions can occur for various reasons, such as system crashes or timeout errors. If users could resume from the last completed state, they would save time and compute resources, making pandarallel even more efficient and user-friendly.
Proposed Solution:
1. Implement a mechanism to periodically save the state of the computation (e.g., the progress and results of processed rows).
2. Provide an optional argument in pandarallel functions that lets users choose whether to enable resuming.
3. Introduce functions to load the saved state and continue processing from the last checkpoint.
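The three steps above can be prototyped on top of the existing row-wise `apply` (or pandarallel's `parallel_apply`) without changing pandarallel itself. This is a minimal sketch, not pandarallel's API: `resumable_apply`, its `checkpoint_dir`, `chunk_size`, and `use_parallel` parameters, and the one-pickle-per-chunk layout are all hypothetical. Passing `checkpoint_dir=None` disables resuming, which mirrors the optional-argument idea.

```python
import os

import pandas as pd


def resumable_apply(df, func, checkpoint_dir=None, chunk_size=1000, use_parallel=False):
    """Row-wise apply that can resume from per-chunk checkpoints.

    Hypothetical helper, not part of pandarallel. Chunks are positional, so a
    resumed run must use the same DataFrame and chunk_size as the first run.
    """

    def run(frame):
        if use_parallel:
            # Requires pandarallel.initialize() to have been called beforehand
            return frame.parallel_apply(func, axis=1)
        return frame.apply(func, axis=1)

    if checkpoint_dir is None:
        # Resume disabled: behave like a plain (parallel_)apply
        return run(df)

    os.makedirs(checkpoint_dir, exist_ok=True)
    parts = []
    for n, start in enumerate(range(0, len(df), chunk_size)):
        path = os.path.join(checkpoint_dir, f"chunk_{n:05d}.pkl")
        if os.path.exists(path):
            # Finished in an earlier run: reuse the saved result
            parts.append(pd.read_pickle(path))
            continue
        result = run(df.iloc[start:start + chunk_size])
        # Write to a temporary file, then rename, so a crash never leaves a partial checkpoint
        tmp_path = path + ".tmp"
        pd.to_pickle(result, tmp_path)
        os.replace(tmp_path, path)
        parts.append(result)
    return pd.concat(parts) if parts else run(df)
```

Re-running the same call after a crash reloads the finished chunks from disk and recomputes only the missing ones. A smaller `chunk_size` loses less work per interruption at the cost of more checkpoint files.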
Example:
An example implementation could look like this:
```python
from pandarallel import pandarallel

# Initialize pandarallel
pandarallel.initialize()

# Function to process data (with state saving)
def process_data_with_resume(df):
    # load_progress, save_progress and process_function are placeholder helpers
    completed = load_progress()  # results saved by earlier runs (DataFrame with an 'id' column)
    remaining = df[~df['id'].isin(completed['id'])]
    if not remaining.empty:
        results = remaining.parallel_apply(process_function, axis=1)
        save_progress(results)  # append the new results to the saved progress

# Running this again after an interruption skips the rows already completed
process_data_with_resume(df)
```
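The example leaves `load_progress` and `save_progress` undefined. One possible sketch follows, assuming results are appended to a single CSV file (the `progress.csv` path is hypothetical) and that `process_function` returns rows that keep the `id` column, so the saved results can be matched against `df['id']`. Neither helper is part of pandarallel.

```python
import os

import pandas as pd

CHECKPOINT_PATH = "progress.csv"  # hypothetical checkpoint location


def load_progress(path=CHECKPOINT_PATH):
    """Return previously saved results, or an empty frame with an 'id' column."""
    if not os.path.exists(path):
        return pd.DataFrame({"id": []})
    return pd.read_csv(path)


def save_progress(results, path=CHECKPOINT_PATH):
    """Append new results to the checkpoint file, writing the header only once."""
    write_header = not os.path.exists(path)
    results.to_csv(path, mode="a", header=write_header, index=False)
```

Because every call appends, each batch must have the same columns. CSV also drops dtype information, so a typed format such as Parquet may suit real workloads better.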
Adding this capability would make pandarallel considerably more robust for long computations and improve the user experience.