[ECO-5122] Implement the "asynchronously" part of CHA-RL1h5 #174

Merged
merged 2 commits into from
Dec 9, 2024

@lawrence-forooghian lawrence-forooghian commented Dec 4, 2024

Note: This PR is based on top of #171; please review that one first.

Note that CHA-RL1h5, as currently written, does not respect operation atomicity. I’ve raised ably/specification#253 for this, and my implementation implements the changes that I’ve suggested there, introducing a new RUNDOWN room lifecycle operation. See commit messages and code comments for more details.

Resolves #119.

Summary by CodeRabbit

  • New Features

    • Introduced a RUNDOWN operation for improved room lifecycle management, allowing for better handling of failed contributors.
    • Enhanced status management with new operational states for more robust error handling during attach and detach processes.
  • Bug Fixes

    • Improved the retry logic for detaching contributors when a failure occurs.
  • Documentation

    • Updated comments and documentation for clarity on new operations and their integration within the lifecycle management.

coderabbitai bot commented Dec 4, 2024

📥 Commits

Reviewing files that changed from the base of the PR and between 9a392b6 and 77d0791.

Walkthrough

The changes introduce significant updates to the RoomLifecycleManager and its associated testing framework. New cases have been added to the Status enum to accommodate a RUNDOWN operation, which includes the implementation of a method to manage contributor detachment in failed states. The testing suite has been enhanced with new tests that validate the behavior of the RUNDOWN operation, ensuring it operates asynchronously as specified in the linked issues. Adjustments to existing tests reflect the updated lifecycle management logic and operational states.

Changes

File | Change Summary
Sources/AblyChat/RoomLifecycleManager.swift | Added methods: performRundownOperation, bodyOfRundownOperation; added enum cases: failedAndPerformingRundownOperation, failedAwaitingStartOfRundownOperation; updated existing methods to handle new statuses.
Tests/AblyChatTests/DefaultRoomLifecycleManagerTests.swift | Added tests: rundown_detachesAllNonFailedChannels, rundown_ifADetachFailsItIsRetriedUntilSuccess; renamed test: attach_whenAttachPutsChannelIntoFailedState_detachesAllNonFailedChannels to attach_whenAttachPutsChannelIntoFailedState_schedulesRundownOperation; updated comments for clarity.
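As a rough illustration of how the two new enum cases might relate to the room status that users observe, here is a minimal sketch. Only the case names `failedAwaitingStartOfRundownOperation` and `failedAndPerformingRundownOperation` come from the summary above; `PublicRoomStatus`, `InternalStatus`, the associated values, and the other cases are assumptions made for the example and may not match RoomLifecycleManager.swift.

```swift
import Foundation

// Hypothetical public-facing status; the real SDK type may differ.
enum PublicRoomStatus: Equatable {
    case attaching
    case attached
    case failed
}

// Hypothetical internal status. Only the two rundown-related case names are
// taken from the change summary; everything else is illustrative.
enum InternalStatus {
    case attachingDueToAttachOperation(attachOperationID: UUID)
    case attached
    case failed
    case failedAwaitingStartOfRundownOperation
    case failedAndPerformingRundownOperation(rundownOperationID: UUID)

    // Both rundown-related cases surface to users as plain FAILED.
    var toPublicStatus: PublicRoomStatus {
        switch self {
        case .attachingDueToAttachOperation:
            return .attaching
        case .attached:
            return .attached
        case .failed, .failedAwaitingStartOfRundownOperation, .failedAndPerformingRundownOperation:
            return .failed
        }
    }

    // The ID of the lifecycle operation currently in progress, if any.
    var operationID: UUID? {
        switch self {
        case let .attachingDueToAttachOperation(attachOperationID: id):
            return id
        case let .failedAndPerformingRundownOperation(rundownOperationID: id):
            return id
        case .attached, .failed, .failedAwaitingStartOfRundownOperation:
            return nil
        }
    }
}
```

Under these assumptions, an observer of the public status would see FAILED throughout, whether the rundown is still waiting to start or already in progress.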

Assessment against linked issues

Objective | Addressed | Explanation
Detaching all non-failed channels should happen asynchronously (#119, ECO-5122) | |
Ensure that the RUNDOWN operation retries detaching on failure (#119, ECO-5122) | |

Poem

🐰 In the room where we play,
A new operation leads the way,
RUNDOWN dances, swift and bright,
Detaching channels, a graceful flight.
With tests that cheer and validate,
Our lifecycle's strong, we celebrate! 🎉



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (2)
Sources/AblyChat/RoomLifecycleManager.swift (1)

1238-1247: Add delay and cancellation check in infinite retry loop

The infinite while true loop retries detaching contributors without any delay or exit condition. This may lead to excessive resource usage. Consider adding a delay between retries and checking for task cancellation to improve performance and responsiveness.

Apply this diff to add a delay and handle cancellation:

                while true {
                    do {
                        logger.log(message: "Detaching non-failed contributor \(contributor)", level: .info)
                        try await contributor.channel.detach()
                        break
                    } catch {
                        logger.log(message: "Failed to detach non-failed contributor \(contributor), error \(error). Retrying.", level: .info)
+                       // Add a small delay before retrying
+                       try await clock.sleep(timeInterval: 0.25)
+                       // Check for task cancellation
+                       try Task.checkCancellation()
                    }
                }
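The pattern suggested above, retrying with a pause and a cancellation check between attempts, can be sketched in isolation as follows. This is a hypothetical helper rather than code from the PR: it uses `Task.sleep` in place of the manager's `clock`, and `retryUntilSuccess`, `AttemptCounter`, and `SampleDetachError` are names invented for the example.

```swift
struct SampleDetachError: Error {}

// Counts attempts; an actor so that it can be used from a @Sendable closure.
actor AttemptCounter {
    private(set) var count = 0

    func increment() -> Int {
        count += 1
        return count
    }
}

// Hypothetical helper: runs `operation` until it succeeds, pausing between
// attempts and giving up only if the surrounding task is cancelled.
func retryUntilSuccess(
    delayNanoseconds: UInt64 = 250_000_000,
    _ operation: @Sendable () async throws -> Void
) async throws {
    while true {
        do {
            try await operation()
            return
        } catch {
            // Stop retrying if the surrounding task has been cancelled.
            try Task.checkCancellation()
            // Pause before the next attempt; this also throws on cancellation.
            try await Task.sleep(nanoseconds: delayNanoseconds)
        }
    }
}

// Usage: an operation that fails twice and then succeeds.
let counter = AttemptCounter()
try await retryUntilSuccess(delayNanoseconds: 1_000_000) {
    if await counter.increment() < 3 {
        throw SampleDetachError()
    }
}
let attempts = await counter.count
print("Succeeded after \(attempts) attempts")
```

Note that a helper like this has to be `throws`, since cancellation becomes a way out of the loop; whether that is desirable for an operation that is meant to retry until success is a separate design question.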
Tests/AblyChatTests/DefaultRoomLifecycleManagerTests.swift (1)

1388-1412: Consider adding delay and cancellation checks in retry logic

In the test rundown_ifADetachFailsItIsRetriedUntilSuccess, the retry logic does not include any delay, potentially causing tight loops during testing.

Apply this diff to simulate delay and enhance test realism:

            let detachResult = { @Sendable (callCount: Int) async -> MockRoomLifecycleContributorChannel.AttachOrDetachBehavior in
                if callCount == 1 {
                    return .failure(.create(withCode: 123, message: ""))
                } else {
                    return .success
                }
            }

+           let sleepOperation = SignallableSleepOperation()
+           let clock = MockSimpleClock(sleepBehavior: sleepOperation.behavior)

            let contributors = [
                createContributor(
                    detachBehavior: .fromFunction(detachResult)
                ),
            ]

+           let manager = await createManager(contributors: contributors, clock: clock)

            // When: `performRundownOperation()` is called on the lifecycle manager
            await manager.performRundownOperation(errorForFailedStatus: .createUnknownError() /* arbitrary */ )

+           // Allow the sleep to complete
+           sleepOperation.complete()

            // Then: the lifecycle manager calls `detach` twice on the contributor (i.e. it retries the failed detach)
            #expect(await contributors[0].channel.detachCallCount == 2)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 90f9c55 and 9a392b6.

📒 Files selected for processing (2)
  • Sources/AblyChat/RoomLifecycleManager.swift (11 hunks)
  • Tests/AblyChatTests/DefaultRoomLifecycleManagerTests.swift (6 hunks)
🔇 Additional comments (2)
Sources/AblyChat/RoomLifecycleManager.swift (1)

854-862: Ensure proper error propagation and state transition

When scheduling the RUNDOWN operation after a contributor fails to attach, make sure that the error is propagated correctly and the state transitions are handled appropriately.

Run the following script to verify that all callers handle the thrown error correctly:

Ensure that all calls are properly using try and handling possible errors.

✅ Verification successful

Based on the search results, I can now provide a final response:

Error handling is properly implemented for performAttachOperation

The verification shows that:

  • performAttachOperation() is correctly marked as throws and is always called with proper error handling
  • All calls to this method in Room.swift and tests use try await
  • The implementation properly propagates errors through the _performAttachOperation and bodyOfAttachOperation methods
  • The test suite includes comprehensive error handling scenarios, including:
    • Room release state errors
    • Attachment failures
    • State transition errors
    • Contributor failures
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Find all calls to `performAttachOperation` and ensure errors are handled.

# Search for calls to `performAttachOperation` not within a `try` block.
rg -U 'performAttachOperation\(\)(?!\s*(try|await))' -A 5

Length of output: 333


Script:

#!/bin/bash
# Description: Find all calls to `performAttachOperation` with proper context
rg -U "performAttachOperation\(" -A 5

Length of output: 15880


Script:

#!/bin/bash
# Description: Use ast-grep to find function calls to performAttachOperation
ast-grep --pattern 'performAttachOperation()'

Length of output: 47


Script:

#!/bin/bash
# Description: Find the implementation of performAttachOperation to understand its error handling
rg -U "func performAttachOperation" -A 20

Length of output: 5976

Tests/AblyChatTests/DefaultRoomLifecycleManagerTests.swift (1)

1350-1384: New tests for RUNDOWN operation are comprehensive

The added tests effectively verify the behavior of the RUNDOWN operation, ensuring non-failed contributors are detached and retries happen correctly.

@lawrence-forooghian
Collaborator Author

Tests are failing due to #169.

@lawrence-forooghian lawrence-forooghian force-pushed the 152-implement-async-room-get-spec branch from 90f9c55 to e7612c9 Compare December 4, 2024 18:16
Base automatically changed from 152-implement-async-room-get-spec to main December 4, 2024 18:24
We don’t use this “will” language elsewhere.
This implementation reflects my suggested spec changes [1] which aim to
preserve lifecycle operation atomicity by introducing a new RUNDOWN
operation:

> The `ATTACH` operation ends in CHA-RL1h4, implying that the CHA-RL1h5
> asynchronous detach happens _outside of any room lifecycle operation_.
> This means that another room lifecycle operation could run at the same
> time as this detach operation, which doesn't seem intentional.
>
> I looked at the JS implementation [2] and it seems that it keeps the
> lifecycle manager’s mutex locked during this "rundown" (as it calls the
> CHA-RL1h5 detach operation). But this is not implied in the spec. I
> think that to translate this behaviour to the spec, which implements
> mutual exclusion through lifecycle operations, we should do something
> analogous to the `RETRY` operation; that is, define a new internal-only
> room lifecycle operation (I’ll call it `RUNDOWN` for want of a better
> term), which is scheduled by CHA-RL1h5 and which:
>
> - performs the detach-all-non-`FAILED`-contributors behaviour of CHA-RL1h5
> - implements the retry behaviour of CHA-RL1h6

Resolves #119.

[1] ably/specification#253
[2] https://github.com/ably/ably-chat-js/blob/e8380583424a83f7151405cc0716e01302295eb6/src/core/room-lifecycle-manager.ts#L506-L509
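The idea in the commit message above, that the detach runs as its own lifecycle operation instead of outside any operation, can be sketched roughly as follows. This is a heavily simplified, hypothetical model and not the SDK's implementation: `LifecycleManagerSketch`, its hand-rolled operation lock, the `log` array, and `attachShouldFail` are all assumptions made for the example, and the actual detach and retry logic is omitted.

```swift
struct AttachError: Error {}

// Hypothetical, heavily simplified lifecycle manager; not the SDK's implementation.
actor LifecycleManagerSketch {
    enum Status: Equatable { case initialized, attaching, attached, failed }

    private(set) var status: Status = .initialized
    private(set) var log: [String] = []
    private(set) var rundownTask: Task<Void, Never>?

    private let attachShouldFail: Bool
    private var operationInProgress = false
    private var waiters: [CheckedContinuation<Void, Never>] = []

    init(attachShouldFail: Bool) {
        self.attachShouldFail = attachShouldFail
    }

    // Suspends until no other lifecycle operation is running, then claims the slot.
    private func beginOperation() async {
        if operationInProgress {
            // The finishing operation hands the slot over when it resumes us.
            await withCheckedContinuation { (continuation: CheckedContinuation<Void, Never>) in
                waiters.append(continuation)
            }
        } else {
            operationInProgress = true
        }
    }

    // Releases the slot, or passes it directly to the next waiting operation.
    private func endOperation() {
        if waiters.isEmpty {
            operationInProgress = false
        } else {
            waiters.removeFirst().resume()
        }
    }

    func performAttachOperation() async throws {
        await beginOperation()
        defer { endOperation() }

        status = .attaching
        log.append("ATTACH started")
        guard !attachShouldFail else {
            status = .failed
            log.append("ATTACH failed, RUNDOWN scheduled")
            // Schedule RUNDOWN without awaiting it, so ATTACH can complete first.
            rundownTask = Task { await self.performRundownOperation() }
            throw AttachError()
        }
        status = .attached
        log.append("ATTACH succeeded")
    }

    func performRundownOperation() async {
        await beginOperation()
        defer { endOperation() }

        log.append("RUNDOWN started")
        // Detaching the non-FAILED contributors, with retries, would go here.
        log.append("RUNDOWN finished")
    }
}

// Usage: ATTACH throws and returns; RUNDOWN then runs as a separate operation.
let manager = LifecycleManagerSketch(attachShouldFail: true)
do {
    try await manager.performAttachOperation()
} catch {
    print("ATTACH completed with error: \(error)")
}
_ = await manager.rundownTask?.value
let operationLog = await manager.log
print(operationLog)
```

Because RUNDOWN claims the same slot as every other operation in this sketch, any operation requested while it is running would wait for it to finish, which is the mutual exclusion the commit message is after.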
@lawrence-forooghian lawrence-forooghian force-pushed the 119-CHA-RL1h5-asynchronously branch from 9a392b6 to 77d0791 Compare December 4, 2024 18:25
@maratal
Collaborator

maratal commented Dec 5, 2024

What does this mean @lawrence-forooghian:

When the room enters the FAILED status as a result of CHA-RL1h4, asynchronously with respect to CHA-RL1h4 ...

@lawrence-forooghian
Collaborator Author

Yeah, that spec point isn't very clear; I hope that we can also make it clearer when we address ably/specification#253. My interpretation is that it means something like:

When the room enters the FAILED status as a result of CHA-RL1h4, then the room will detach all channels that are not in the FAILED state. It will perform this detach in a manner that does not block the completion of the ATTACH operation.

@maratal
Collaborator

maratal commented Dec 5, 2024

Yeah, that spec point isn't very clear; I hope that we can also make it clearer when we address ably/specification#253. My interpretation is that it means something like:

When the room enters the FAILED status as a result of CHA-RL1h4, then the room will detach all channels that are not in the FAILED state. It will perform this detach in a manner that does not block the completion of the ATTACH operation.

But they're all async anyway. Also, the attach operation has failed at this point, and all you need to do is detach all the channels. So I still don't get it.

@lawrence-forooghian
Collaborator Author

But they're all async anyway.

What do you mean?

Also, the attach operation has failed at this point

You mean because CHA-RL1h4 says that the ATTACH operation should throw an error and enter FAILED? Yeah, but before this "asynchronous" part was added to CHA-RL1h5, it wasn't clear at what moment the CHA-RL1h5 detach was meant to happen — e.g. maybe it was between the FAILED status change and the completion of the ATTACH operation

@maratal
Collaborator

maratal commented Dec 5, 2024

But they're all async anyway.

What do you mean?

That was about "in a manner that does not block", but it's nonsense anyway; ignore it.

Also, the attach operation has failed at this point

You mean because CHA-RL1h4 says that the ATTACH operation should throw an error and enter FAILED? Yeah, but before this "asynchronous" part was added to CHA-RL1h5, it wasn't clear at what moment the CHA-RL1h5 detach was meant to happen — e.g. maybe it was between the FAILED status change and the completion of the ATTACH operation

So the attach operation might complete (as failed) before the channels are detached? What state is the room in between going to FAILED and all channels being detached?

@lawrence-forooghian
Collaborator Author

So the attach operation might complete (as failed) before the channels are detached?

Yep.

What state is the room in between going to FAILED and all channels being detached?

It remains FAILED.

Collaborator

@maratal maratal left a comment


LGTM

@lawrence-forooghian lawrence-forooghian merged commit d48dba5 into main Dec 9, 2024
12 checks passed
@lawrence-forooghian lawrence-forooghian deleted the 119-CHA-RL1h5-asynchronously branch December 9, 2024 13:40
Development

Successfully merging this pull request may close these issues.

Implement "asynchronously" part of CHA-RL1h5
2 participants