Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[grid] rework the retry of http requests #14917 #14924

Merged
merged 3 commits into from
Dec 30, 2024
Merged

[grid] rework the retry of http requests #14917 #14924

merged 3 commits into from
Dec 30, 2024

Conversation

joerg1985
Copy link
Member

@joerg1985 joerg1985 commented Dec 20, 2024

User description

Description

Remove the use of dev.failsafe inside the RetryRequest with a simple loop.

There is also a functional change, Exceptions and error responses are delivered unmodified to the caller.
This will allow the caller to handle the original Exception / error response, so we do not loose data here.
The old mapping did also not return a valid W3C response, so the client should not have been able to decode it.

Motivation and Context

see #14917

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • I have read the contributing document.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

PR Type

Bug fix, Enhancement


Description

  • Replaced the dev.failsafe library with a custom retry implementation using a simple loop mechanism
  • Changed error handling behavior to preserve original exceptions and error responses instead of wrapping them
  • Implemented configurable retry attempts:
    • Up to 3 retries for connection exceptions
    • Up to 2 retries for server errors
  • Added comprehensive test coverage for the new implementation
  • Removed external dependency on dev.failsafe library
  • Improved logging of retry attempts with detailed error information

Changes walkthrough 📝

Relevant files
Enhancement
RetryRequest.java
Refactor HTTP request retry mechanism with custom implementation

java/src/org/openqa/selenium/remote/http/RetryRequest.java

  • Replaced dev.failsafe library with a simple retry loop implementation
  • Modified error handling to return original exceptions and error
    responses
  • Implemented configurable retry attempts for connection and server
    errors
  • Added detailed logging for retry attempts
  • +34/-69 
    Tests
    RetryRequestTest.java
    Expand test coverage for RetryRequest implementation         

    java/test/org/openqa/selenium/remote/http/RetryRequestTest.java

  • Added new test cases for unexpected exceptions and retry behavior
  • Enhanced multi-threaded test with increased iterations
  • Added tests for connection failures and server error handling
  • Updated assertions to match new error handling behavior
  • +90/-12 
    Dependencies
    BUILD.bazel
    Remove failsafe dependency from build configuration           

    java/src/org/openqa/selenium/remote/http/BUILD.bazel

    • Removed dependency on dev.failsafe library
    +0/-2     

    💡 PR-Agent usage: Comment /help "your question" on any pull request to receive relevant information

    Copy link
    Contributor

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
    🧪 PR contains tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Error Handling
    The new implementation throws IllegalStateException with "unreachable code" message, which could be confusing. Consider handling the case when all retries are exhausted more explicitly.

    Code Duplication
    The retry count variables and maxAttempts calculation could be moved to class constants to improve maintainability and avoid magic numbers.

    Error Logging
    Consider adding more detailed error information in logs, such as the actual error message from the server response, not just the status code.

    Copy link
    Contributor

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    CategorySuggestion                                                                                                                                    Score
    Possible issue
    Handle the final attempt properly instead of throwing an unreachable exception

    The throw statement in the final line is unreachable and incorrect. Instead, execute
    one final attempt and return its response when max retries are exhausted.

    java/src/org/openqa/selenium/remote/http/RetryRequest.java [66-67]

    -throw new IllegalStateException(
    -    "Effective unreachable code reached, please raise a bug ticket.");
    +return next.execute(req); // Final attempt after retries are exhausted
    • Apply this suggestion
    Suggestion importance[1-10]: 9

    Why: The current code throws an unreachable exception which is incorrect. The suggestion fixes a logical error by properly handling the final attempt, making the retry mechanism work as intended.

    9
    General
    Implement exponential backoff between retry attempts to improve reliability

    Add exponential backoff between retry attempts to prevent overwhelming the server
    and improve success rate of subsequent attempts.

    java/src/org/openqa/selenium/remote/http/RetryRequest.java [58-61]

     if (isServerError && i < retriesOnServerError) {
       LOG.log(LOG_LEVEL, "Retry #" + (i + 1) + " on ServerError: " + response.getStatus());
    +  Thread.sleep((long) Math.pow(2, i) * 1000); // Exponential backoff
       continue;
     }
    • Apply this suggestion
    Suggestion importance[1-10]: 7

    Why: Adding exponential backoff is a good practice that can significantly improve the success rate of retries by preventing server overload, though the current implementation works without it.

    7
    Make retry configuration externally configurable instead of using hard-coded values

    Make retry attempt counts configurable through constructor parameters instead of
    hardcoding them.

    java/src/org/openqa/selenium/remote/http/RetryRequest.java [36-37]

    -int retriesOnConnectException = 3;
    -int retriesOnServerError = 2;
    +private final int retriesOnConnectException;
    +private final int retriesOnServerError;
     
    +public RetryRequest(int retriesOnConnectException, int retriesOnServerError) {
    +  this.retriesOnConnectException = retriesOnConnectException;
    +  this.retriesOnServerError = retriesOnServerError;
    +}
    +
    • Apply this suggestion
    Suggestion importance[1-10]: 6

    Why: Making retry attempts configurable would improve flexibility and reusability of the code, allowing different retry strategies for different scenarios.

    6

    Copy link
    Contributor

    qodo-merge-pro bot commented Dec 20, 2024

    CI Failure Feedback 🧐

    (Checks updated until commit 47fc765)

    Action: Ruby / Remote Tests (edge, windows) / Remote Tests (edge, windows)

    Failed stage: [❌]

    Failure summary:

    The build was interrupted and failed due to:

  • The build process was cancelled (Ctrl+C handler triggered)
  • Build took an excessive amount of time (21533 seconds / ~6 hours)
  • All 31 tests were skipped due to the build interruption
  • The process got stuck while extracting npm package @apollo/[email protected]_1027978167 for over 21000
    seconds

  • Relevant error logs:
    1:  ##[group]Operating System
    2:  Microsoft Windows Server 2022
    ...
    
    636:  external/protobuf~/third_party/utf8_range/utf8_range.c(178): warning C4141: 'inline': used more than once
    637:  cl : Command line warning D9002 : ignoring unknown option '-std=c++14'
    638:  �[32mINFO: �[0mFrom Compiling absl/types/bad_optional_access.cc [for tool]:
    639:  cl : Command line warning D9002 : ignoring unknown option '-std=c++14'
    640:  �[32mINFO: �[0mFrom Compiling absl/log/internal/nullguard.cc [for tool]:
    641:  cl : Command line warning D9002 : ignoring unknown option '-std=c++14'
    642:  �[32mINFO: �[0mFrom Compiling absl/time/internal/cctz/src/civil_time_detail.cc [for tool]:
    643:  cl : Command line warning D9002 : ignoring unknown option '-std=c++14'
    644:  �[32mINFO: �[0mFrom Compiling absl/base/internal/strerror.cc [for tool]:
    ...
    
    1630:  �[32m[2,184 / 2,338]�[0m Extracting npm package @apollo/[email protected]_1027978167; 42s local, disk-cache ... (4 actions, 3 running)
    1631:  �[32m[2,187 / 2,338]�[0m Extracting npm package @apollo/[email protected]_1027978167; 43s local, disk-cache ... (4 actions, 3 running)
    1632:  �[32m[2,188 / 2,338]�[0m Extracting npm package @apollo/[email protected]_1027978167; 45s local, disk-cache ... (4 actions, 2 running)
    1633:  �[32m[2,188 / 2,338]�[0m Extracting npm package @apollo/[email protected]_1027978167; 47s local, disk-cache ... (4 actions, 3 running)
    1634:  �[32m[2,189 / 2,338]�[0m Extracting npm package @apollo/[email protected]_1027978167; 48s local, disk-cache ... (4 actions, 3 running)
    1635:  �[32m[2,190 / 2,338]�[0m Extracting npm package @apollo/[email protected]_1027978167; 50s local, disk-cache ... (4 actions, 2 running)
    1636:  �[32m[2,193 / 2,338]�[0m Extracting npm package @apollo/[email protected]_1027978167; 52s local, disk-cache ... (4 actions, 3 running)
    1637:  �[32mINFO: �[0mFrom Building java/src/org/openqa/selenium/remote/libapi-class.jar (71 source files):
    1638:  java\src\org\openqa\selenium\remote\ErrorHandler.java:46: warning: [removal] ErrorCodes in org.openqa.selenium.remote has been deprecated and marked for removal
    1639:  private final ErrorCodes errorCodes;
    1640:  ^
    1641:  java\src\org\openqa\selenium\remote\ErrorHandler.java:60: warning: [removal] ErrorCodes in org.openqa.selenium.remote has been deprecated and marked for removal
    1642:  this.errorCodes = new ErrorCodes();
    1643:  ^
    1644:  java\src\org\openqa\selenium\remote\ErrorHandler.java:68: warning: [removal] ErrorCodes in org.openqa.selenium.remote has been deprecated and marked for removal
    1645:  public ErrorHandler(ErrorCodes codes, boolean includeServerErrors) {
    1646:  ^
    1647:  java\src\org\openqa\selenium\remote\Response.java:97: warning: [removal] ErrorCodes in org.openqa.selenium.remote has been deprecated and marked for removal
    1648:  ErrorCodes errorCodes = new ErrorCodes();
    1649:  ^
    1650:  java\src\org\openqa\selenium\remote\Response.java:97: warning: [removal] ErrorCodes in org.openqa.selenium.remote has been deprecated and marked for removal
    1651:  ErrorCodes errorCodes = new ErrorCodes();
    1652:  ^
    1653:  java\src\org\openqa\selenium\remote\ProtocolHandshake.java:181: warning: [removal] ErrorCodes in org.openqa.selenium.remote has been deprecated and marked for removal
    1654:  response.setStatus(ErrorCodes.SUCCESS);
    1655:  ^
    1656:  java\src\org\openqa\selenium\remote\ProtocolHandshake.java:182: warning: [removal] ErrorCodes in org.openqa.selenium.remote has been deprecated and marked for removal
    1657:  response.setState(ErrorCodes.SUCCESS_STRING);
    1658:  ^
    1659:  java\src\org\openqa\selenium\remote\W3CHandshakeResponse.java:53: warning: [removal] ErrorCodes in org.openqa.selenium.remote has been deprecated and marked for removal
    1660:  new ErrorCodes().toStatus((String) rawError, Optional.of(tuple.getStatusCode())));
    1661:  ^
    1662:  java\src\org\openqa\selenium\remote\W3CHandshakeResponse.java:56: warning: [removal] ErrorCodes in org.openqa.selenium.remote has been deprecated and marked for removal
    1663:  new ErrorCodes().getExceptionType((String) rawError);
    1664:  ^
    1665:  java\src\org\openqa\selenium\remote\codec\AbstractHttpResponseCodec.java:44: warning: [removal] ErrorCodes in org.openqa.selenium.remote has been deprecated and marked for removal
    1666:  private final ErrorCodes errorCodes = new ErrorCodes();
    1667:  ^
    1668:  java\src\org\openqa\selenium\remote\codec\AbstractHttpResponseCodec.java:44: warning: [removal] ErrorCodes in org.openqa.selenium.remote has been deprecated and marked for removal
    1669:  private final ErrorCodes errorCodes = new ErrorCodes();
    1670:  ^
    1671:  java\src\org\openqa\selenium\remote\codec\AbstractHttpResponseCodec.java:55: warning: [removal] ErrorCodes in org.openqa.selenium.remote has been deprecated and marked for removal
    1672:  int status = response.getStatus() == ErrorCodes.SUCCESS ? HTTP_OK : HTTP_INTERNAL_ERROR;
    1673:  ^
    1674:  java\src\org\openqa\selenium\remote\codec\AbstractHttpResponseCodec.java:101: warning: [removal] ErrorCodes in org.openqa.selenium.remote has been deprecated and marked for removal
    1675:  response.setStatus(ErrorCodes.UNKNOWN_COMMAND);
    1676:  ^
    1677:  java\src\org\openqa\selenium\remote\codec\AbstractHttpResponseCodec.java:103: warning: [removal] ErrorCodes in org.openqa.selenium.remote has been deprecated and marked for removal
    1678:  response.setStatus(ErrorCodes.UNHANDLED_ERROR);
    1679:  ^
    1680:  java\src\org\openqa\selenium\remote\codec\AbstractHttpResponseCodec.java:117: warning: [removal] ErrorCodes in org.openqa.selenium.remote has been deprecated and marked for removal
    1681:  response.setStatus(ErrorCodes.SUCCESS);
    1682:  ^
    1683:  java\src\org\openqa\selenium\remote\codec\AbstractHttpResponseCodec.java:118: warning: [removal] ErrorCodes in org.openqa.selenium.remote has been deprecated and marked for removal
    1684:  response.setState(errorCodes.toState(ErrorCodes.SUCCESS));
    1685:  ^
    1686:  java\src\org\openqa\selenium\remote\codec\AbstractHttpResponseCodec.java:124: warning: [removal] ErrorCodes in org.openqa.selenium.remote has been deprecated and marked for removal
    1687:  response.setState(errorCodes.toState(ErrorCodes.SUCCESS));
    1688:  ^
    1689:  java\src\org\openqa\selenium\remote\codec\w3c\W3CHttpResponseCodec.java:70: warning: [removal] ErrorCodes in org.openqa.selenium.remote has been deprecated and marked for removal
    1690:  private final ErrorCodes errorCodes = new ErrorCodes();
    1691:  ^
    1692:  java\src\org\openqa\selenium\remote\codec\w3c\W3CHttpResponseCodec.java:70: warning: [removal] ErrorCodes in org.openqa.selenium.remote has been deprecated and marked for removal
    1693:  private final ErrorCodes errorCodes = new ErrorCodes();
    1694:  ^
    1695:  java\src\org\openqa\selenium\remote\codec\w3c\W3CHttpResponseCodec.java:93: warning: [removal] ErrorCodes in org.openqa.selenium.remote has been deprecated and marked for removal
    1696:  response.setStatus(ErrorCodes.UNKNOWN_COMMAND);
    1697:  ^
    1698:  java\src\org\openqa\selenium\remote\codec\w3c\W3CHttpResponseCodec.java:98: warning: [removal] ErrorCodes in org.openqa.selenium.remote has been deprecated and marked for removal
    1699:  response.setStatus(ErrorCodes.UNHANDLED_ERROR);
    1700:  ^
    1701:  java\src\org\openqa\selenium\remote\codec\w3c\W3CHttpResponseCodec.java:145: warning: [removal] ErrorCodes in org.openqa.selenium.remote has been deprecated and marked for removal
    1702:  response.setStatus(ErrorCodes.SUCCESS);
    ...
    
    2200:  �[32m[2,330 / 2,338]�[0m Extracting npm package @apollo/[email protected]_1027978167; 21053s local, disk-cache
    2201:  �[32m[2,330 / 2,338]�[0m Extracting npm package @apollo/[email protected]_1027978167; 21113s local, disk-cache
    2202:  �[32m[2,330 / 2,338]�[0m Extracting npm package @apollo/[email protected]_1027978167; 21173s local, disk-cache
    2203:  �[32m[2,330 / 2,338]�[0m Extracting npm package @apollo/[email protected]_1027978167; 21233s local, disk-cache
    2204:  �[32m[2,330 / 2,338]�[0m Extracting npm package @apollo/[email protected]_1027978167; 21294s local, disk-cache
    2205:  �[32m[2,330 / 2,338]�[0m Extracting npm package @apollo/[email protected]_1027978167; 21354s local, disk-cache
    2206:  Bazel Ctrl+C handler; shutting down.
    2207:  �[32m[2,331 / 2,338]�[0m checking cached actions
    2208:  �[31m�[1mERROR: �[0mbuild interrupted
    2209:  �[32mINFO: �[0mElapsed time: 21533.190s, Critical Path: 21395.10s
    2210:  �[32mINFO: �[0m2070 processes: 845 disk cache hit, 866 internal, 275 local, 84 worker.
    2211:  �[31m�[1mERROR: �[0mBuild did NOT complete successfully
    2212:  Executed 0 out of 31 tests: �[0m�[35m31 were skipped�[0m.
    2213:  �[0m
    2214:  ##[error]The operation was canceled.
    

    ✨ CI feedback usage guide:

    The CI feedback tool (/checks) automatically triggers when a PR has a failed check.
    The tool analyzes the failed checks and provides several feedbacks:

    • Failed stage
    • Failed test name
    • Failure summary
    • Relevant error logs

    In addition to being automatically triggered, the tool can also be invoked manually by commenting on a PR:

    /checks "https://github.com/{repo_name}/actions/runs/{run_number}/job/{job_number}"
    

    where {repo_name} is the name of the repository, {run_number} is the run number of the failed check, and {job_number} is the job number of the failed check.

    Configuration options

    • enable_auto_checks_feedback - if set to true, the tool will automatically provide feedback when a check is failed. Default is true.
    • excluded_checks_list - a list of checks to exclude from the feedback, for example: ["check1", "check2"]. Default is an empty list.
    • enable_help_text - if set to true, the tool will provide a help message with the feedback. Default is true.
    • persistent_comment - if set to true, the tool will overwrite a previous checks comment with the new feedback. Default is true.
    • final_update_message - if persistent_comment is true and updating a previous checks message, the tool will also create a new message: "Persistent checks updated to latest commit". Default is true.

    See more information about the checks tool in the docs.

    Copy link
    Member

    @diemol diemol left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Just a couple of comments, but this looks very close to what I had in mind when you raised the ticket.

    @diemol diemol linked an issue Dec 28, 2024 that may be closed by this pull request
    @joerg1985
    Copy link
    Member Author

    @diemol i have updated the PR with some inline comments.

    Copy link
    Member

    @diemol diemol left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Thank you, @joerg1985!

    @diemol diemol merged commit 93483c5 into trunk Dec 30, 2024
    3 checks passed
    @diemol diemol deleted the RetryRequest branch December 30, 2024 09:05
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Projects
    None yet
    Development

    Successfully merging this pull request may close these issues.

    [🐛 Bug]: serverErrorPolicy does retry on Exception / Error
    3 participants