feat(recovery): add crash recovery implementation #3491
base: satp-dev
I will review this PR
@Yogesh01000100 please rebase with satp-dev (there should be no conflicts).
@Yogesh01000100 please include documentation and tests, and update the description, as discussed.
@Yogesh01000100 could you please squash the commits and rebase onto the latest version of satp-dev before merging?
Generally looks very good, but there are some changes to be done prior to merging.
Summarizing my comments:
- Add other authors to the commit
- Incorporate feedback from the logging process (namely un-hardcoding logs and adding more information)
- Implement RollbackState (for example, it should state, at any moment, how many more steps remain to be rolled back, what has already been rolled back, the estimated time to completion, etc.)
- Please add tests that support the new feature
- Please add comprehensive documentation on this feature. Example: the README of SATP should have a section on how to run the Docker Compose setup, with several example configurations.
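To make the RollbackState request concrete, here is a minimal sketch of what such a state object could track. Every name in it (`RollbackState`, `RollbackLogEntry`, `startRollback`, `completeStep`, `estimateRemainingMs`, and all fields) is hypothetical and chosen for illustration; none of it is taken from the PR.

```typescript
// Hypothetical shape of a rollback progress record (not the PR's actual type).
interface RollbackLogEntry {
  step: string; // name of the step that was rolled back
  completedAt: number; // epoch milliseconds
}

interface RollbackState {
  sessionId: string;
  stage: number; // SATP stage being rolled back
  pendingSteps: string[]; // steps still to be rolled back, in order
  rolledBack: RollbackLogEntry[]; // what was rolled back already
  startedAt: number; // epoch milliseconds
}

function startRollback(
  sessionId: string,
  stage: number,
  steps: string[],
  now: number = Date.now(),
): RollbackState {
  return { sessionId, stage, pendingSteps: [...steps], rolledBack: [], startedAt: now };
}

// Marks the next pending step as rolled back and returns a new state object.
function completeStep(state: RollbackState, now: number = Date.now()): RollbackState {
  const [step, ...rest] = state.pendingSteps;
  if (step === undefined) return state; // nothing left to roll back
  return {
    ...state,
    pendingSteps: rest,
    rolledBack: [...state.rolledBack, { step, completedAt: now }],
  };
}

// Naive estimate: average time per completed step times the steps remaining.
// Returns undefined until at least one step has completed.
function estimateRemainingMs(state: RollbackState): number | undefined {
  const last = state.rolledBack[state.rolledBack.length - 1];
  if (last === undefined) return undefined;
  const avgPerStep = (last.completedAt - state.startedAt) / state.rolledBack.length;
  return avgPerStep * state.pendingSteps.length;
}
```

The estimate is deliberately simple; rollback steps that touch a ledger can vary widely in duration, so a real implementation would likely want per-step timing data.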
packages/cactus-plugin-satp-hermes/src/main/typescript/core/recovery/crash-manager.ts
packages/cactus-plugin-satp-hermes/src/main/typescript/core/recovery/crash-recovery-handler.ts
packages/cactus-plugin-satp-hermes/src/main/typescript/core/recovery/crash-recovery-handler.ts
...us-plugin-satp-hermes/src/main/typescript/core/recovery/rollback/stage0-rollback-strategy.ts
...us-plugin-satp-hermes/src/main/typescript/core/recovery/rollback/stage1-rollback-strategy.ts
...us-plugin-satp-hermes/src/main/typescript/core/recovery/rollback/stage2-rollback-strategy.ts
...us-plugin-satp-hermes/src/main/typescript/core/recovery/rollback/stage3-rollback-strategy.ts
...us-plugin-satp-hermes/src/main/typescript/generated/proto/cacti/satp/v02/common/health_pb.ts
Review how sessionData is being used, and take a look at the Stage 3 question.
Please document the new code as well. The rest is being documented in this PR:
https://github.com/hyperledger/cacti/pull/3619
private async checkCrash(session: SATPSession): Promise<CrashStatus> {
  const fnTag = `${this.className}#checkCrash()`;
  const sessionData = session.hasClientSessionData()
Why does this prioritize the client session?
In this implementation the gateway can be a client and a server at the same time, so when that is the case we may fail to detect some crashes.
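One way to address this concern is to inspect both sides of the session instead of picking the client side whenever it exists. The sketch below is an assumption-laden illustration: `SessionLike` is a stand-in for `SATPSession`, `hasServerSessionData()` and `getServerSessionData()` are assumed counterparts of the client accessors quoted above, and `collectSessionData` is a hypothetical helper.

```typescript
// Stand-in types; only the members used in this sketch are modelled.
interface SessionData {
  id: string;
}

interface SessionLike {
  hasClientSessionData(): boolean;
  getClientSessionData(): SessionData;
  hasServerSessionData(): boolean; // assumed counterpart of the client accessor
  getServerSessionData(): SessionData; // assumed counterpart of the client accessor
}

type Side = "client" | "server";

interface SidedSessionData {
  side: Side;
  data: SessionData;
}

// Returns the session data for every role the gateway plays in this session,
// so a crash check can run once per side instead of on the client side only.
function collectSessionData(session: SessionLike): SidedSessionData[] {
  const result: SidedSessionData[] = [];
  if (session.hasClientSessionData()) {
    result.push({ side: "client", data: session.getClientSessionData() });
  }
  if (session.hasServerSessionData()) {
    result.push({ side: "server", data: session.getServerSessionData() });
  }
  if (result.length === 0) {
    throw new Error("session data is not correctly initialized");
  }
  return result;
}
```

A caller such as `checkCrash` could then loop over the returned entries and evaluate each side independently, which keeps the client and server roles clearly separated.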
public async checkAndResolveCrash(session: SATPSession): Promise<void> {
  const fnTag = `${this.className}#checkAndResolveCrash()`;

  const sessionData = session.hasClientSessionData()
Same here
public async handleRecovery(session: SATPSession): Promise<boolean> {
  const fnTag = `${this.className}#handleRecovery()`;

  const sessionData = session.hasClientSessionData()
Same here.
  throw new Error(`${fnTag}, session data is not correctly initialized`);
}
const sessionData = session.hasClientSessionData()
  ? session.getClientSessionData()
Here too
this.log.info(`${fnTag} Asset Id: ${assetId} amount: ${amount}`);

await bridgeManager.burnAsset(assetId, Number(amount));
In the Stage 3 rollback, the rollback is only feasible if it occurs before the asset is minted on the receiver chain. Once minting happens, if the gateway encounters an issue and fails to assign the minted amount to the recipient, a rollback can no longer be initiated. Is this the reason why a rollback isn’t considered in such cases?
Wouldn't it make sense, then, for the minted amount on the receiver chain to be burned and re-minted on the source chain too?
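One possible reading of this suggestion can be sketched as a small planning function that decides which compensating actions a Stage 3 rollback would need. This is only an illustration of the question being asked, not the PR's behavior: the `Stage3Progress` flags, the `RollbackAction` type, and `planStage3Rollback` are all hypothetical, and treating an already assigned asset as non-reversible is an assumption. An executor built on top of it might call the `burnAsset` method seen in the snippet above for burn actions, and some minting counterpart for mint actions.

```typescript
// Hypothetical progress flags for Stage 3; names are illustrative only.
interface Stage3Progress {
  mintedOnReceiver: boolean; // asset was minted on the receiver chain
  burnedOnSource: boolean; // asset was burned on the source chain
  assignedToRecipient: boolean; // minted asset was assigned to the recipient
}

interface RollbackAction {
  chain: "receiver" | "source";
  op: "burn" | "mint";
}

// Plans compensating actions in the order they would be executed.
// Assumption: once the asset is assigned to the recipient, the transfer is
// treated as final and no rollback is planned.
function planStage3Rollback(progress: Stage3Progress): RollbackAction[] {
  const actions: RollbackAction[] = [];
  if (progress.assignedToRecipient) {
    return actions;
  }
  if (progress.mintedOnReceiver) {
    // Undo the mint first so the asset never exists on both chains.
    actions.push({ chain: "receiver", op: "burn" });
  }
  if (progress.burnedOnSource) {
    // Restore the original asset on the source chain.
    actions.push({ chain: "source", op: "mint" });
  }
  return actions;
}
```

Burning on the receiver before re-minting on the source keeps the plan conservative: a crash between the two actions leaves the asset temporarily missing rather than duplicated.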
  client: true,
});

const sessionData = mockSession.hasClientSessionData()
We need to consider this carefully. If sessionData remains as it is, we must handle it with care and clearly differentiate between the client and server sides of the gateway. I designed the sessionData this way to ensure that a gateway can act as both a client and server to itself.
  expect(result).toBe(true);
});

/*it("intitiate rollback function test", async () => {
Is there a reason for this to be commented out?
Signed-off-by: Yogesh01000100 <[email protected]>

chore(satp-hermes): improve DB management
Signed-off-by: Rafael Belchior <[email protected]>

chore(satp-hermes): crash recovery architecture
Signed-off-by: Rafael Belchior <[email protected]>

fix(recovery): enhance crash recovery and rollback implementation
Signed-off-by: Yogesh01000100 <[email protected]>

refactor(recovery): consolidate logic and improve SATP message handling
Signed-off-by: Yogesh01000100 <[email protected]>

feat(recovery): add rollback implementations
Signed-off-by: Yogesh01000100 <[email protected]>

fix: correct return types and inits
Signed-off-by: Yogesh01000100 <[email protected]>

Co-authored-by: Rafael Belchior <[email protected]>
This PR addresses issue #3114
Changes
- crash_recovery.proto and related ts files; added core recovery logic (created functions not yet implemented).
- knexfile.ts and knexfile-remote.ts; added Docker Compose for production.