[NOTES] Restaking infrastructure v2 #732

Open · Tracked by #783
drewstone opened this issue Aug 6, 2024 · 0 comments

Pallet services and assets

  • Get active services for a specific blueprint (i.e. Blueprint 1 w/ 5 service instances)
  • Ensure operators can set instance limits per blueprint
  • If an operator wants to exit completely, it needs to
    • Stop all active services and wait until none are active
    • Deregister from its service blueprints (is it still an operator in the system at that point?)
    • Shut down the operator and let stakers reclaim their stake (i.e. stakers/delegators unstake; the pallet does not withdraw their stake for them).
  • At the bare minimum, operators just run the shell manager; alternatively, they run the validator client/full node client, which contains it.

Operator states

  • Active - accepting services and participating in them
  • Leaving - no longer accepting services and beginning to leave the operator set
  • Inactive - not accepting new services, but possibly still participating in existing ones
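
A minimal sketch of how these states could be modeled in Rust; the names OperatorStatus, can_accept_services, and can_transition_to, as well as the exact transition set, are hypothetical assumptions, not decided design:

    /// Lifecycle states for an operator, as listed above.
    #[derive(Clone, Copy, PartialEq, Eq, Debug)]
    pub enum OperatorStatus {
        /// Accepting new services and participating in existing ones.
        Active,
        /// No longer accepting services; winding down to exit the operator set.
        Leaving,
        /// Not accepting new services, but possibly still serving existing ones.
        Inactive,
    }

    impl OperatorStatus {
        /// Only Active operators may be matched with new service instances.
        pub fn can_accept_services(&self) -> bool {
            matches!(self, OperatorStatus::Active)
        }

        /// Illustrative transition rules: Active and Inactive can toggle,
        /// either can begin Leaving, and Leaving is terminal.
        pub fn can_transition_to(&self, next: OperatorStatus) -> bool {
            use OperatorStatus::*;
            matches!(
                (*self, next),
                (Active, Inactive) | (Inactive, Active) | (Active, Leaving) | (Inactive, Leaving)
            )
        }
    }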

Staking pallet

  • Revert to the original (OG) staking pallet

Tangle liquid staking pallet

  • Liquid stake/nominate TNT on validators and receive tgTNT_{validator} in return.
  • The TNT is held by the pallet and locked for a period of time.
  • If a validator leaves the set, what happens to the corresponding tgTNT_{validator}?
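
A minimal sketch of share-based accounting for tgTNT in Rust, assuming it behaves like a standard exchange-rate LST; LiquidPool, total_pooled, and total_shares are illustrative names, not the pallet's actual storage. Under this model, a validator leaving the set does not change a holder's share count, only what the pool re-nominates:

    /// Illustrative share-based accounting for a tgTNT pool.
    pub struct LiquidPool {
        pub total_pooled: u128, // TNT held and locked by the pallet
        pub total_shares: u128, // tgTNT supply for this pool
    }

    impl LiquidPool {
        /// tgTNT minted for a deposit of `amount` TNT at the current rate.
        pub fn shares_for_deposit(&self, amount: u128) -> u128 {
            if self.total_pooled == 0 || self.total_shares == 0 {
                amount // bootstrap at 1:1
            } else {
                amount.saturating_mul(self.total_shares) / self.total_pooled
            }
        }

        /// TNT redeemable for `shares` tgTNT (subject to the lock period).
        pub fn amount_for_shares(&self, shares: u128) -> u128 {
            if self.total_shares == 0 {
                0
            } else {
                shares.saturating_mul(self.total_pooled) / self.total_shares
            }
        }
    }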

Service pallet

  • Where does an operator set its limit on how many active service instances it will execute per blueprint?
    • Ensure we enforce this and test it. (NEEDS TASK) See the enforcement sketch after the code example below.
  • Should there be an expiration for instance requests? Is there a limit on the number of requests an operator can have pending?
    • Instance requests expire after 1000 blocks?
  • Service requesters should be able to modify their requests while they are not yet finalized (e.g., after an operator rejects the request).
  • Should there be a bond on requests that require approvals?
  • Tests
    • All error cases tested
    • Test exhausting parts of the system (create 1000 service requests, exhaust operator limits to hit errors from operator side)
  • Requesting a service should indicate which assets are being used to secure the service. This is important for slashing and rewards.
  • Submissions
    • Who can submit jobs
    • Who can submit job results
    • Tests to ensure it's sufficiently protected, and tests demonstrating that submitting a job result for a different job fails.
  • Report submission
    • Dealing with job misbehaviors vs quality of service misbehaviors.
    function verifyReport(
        uint64 serviceId,
        uint8 jobIndex,
        uint64 jobCallId,
        bytes calldata participant,
        bytes calldata inputs,
        bytes calldata outputs
    ) public view override onlyRuntime {
        if (jobCallId == 1) {
            verifyReportForJob1();
        } else {
            // Potential quality-of-service reporting.
            // TODO: How to identify which quality-of-service report it is.
        }
    }
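
Regarding the per-blueprint instance limit question above, a minimal enforcement sketch in Rust; the map-based storage and the names can_accept_instance, limits, and active_instances are hypothetical stand-ins for the pallet's actual layout:

    use std::collections::HashMap;

    type OperatorId = u64;
    type BlueprintId = u64;

    /// Illustrative check run when an operator is asked to join a new service
    /// instance: reject if it would exceed the operator's self-set limit for
    /// that blueprint. Both maps stand in for pallet storage.
    pub fn can_accept_instance(
        limits: &HashMap<(OperatorId, BlueprintId), u32>,
        active_instances: &HashMap<(OperatorId, BlueprintId), u32>,
        operator: OperatorId,
        blueprint: BlueprintId,
    ) -> bool {
        let limit = match limits.get(&(operator, blueprint)) {
            Some(l) => *l,
            None => return false, // no limit set => not registered for this blueprint
        };
        let active = active_instances.get(&(operator, blueprint)).copied().unwrap_or(0);
        active < limit
    }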

Slashing

  • Slashing works like validator slashing: there is a global list of pending slashes, and the governance system can veto them. Slashes should have a 1-2 week review period before they are applied.
  • A submit-report extrinsic (similar to submitting a job result) that files a report if the Blueprint has reporting specified.
  • Research Symbiotic and EigenLayer.
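
A minimal sketch of the deferred-slash queue with a veto window in Rust; PendingSlash, REVIEW_PERIOD, and the vetoed flag are illustrative assumptions, not settled design:

    type BlockNumber = u64;
    type Balance = u128;
    type OperatorId = u64;

    /// Roughly two weeks at 6-second blocks (illustrative value).
    const REVIEW_PERIOD: BlockNumber = 14 * 24 * 600;

    /// A slash waits in a global queue during its review window; governance
    /// sets `vetoed` to cancel it before it is applied.
    #[derive(Clone)]
    pub struct PendingSlash {
        pub operator: OperatorId,
        pub amount: Balance,
        pub reported_at: BlockNumber,
        pub vetoed: bool,
    }

    /// Drain and return slashes whose review window has passed and that were
    /// not vetoed; vetoed ones are simply dropped from the queue.
    pub fn ready_slashes(queue: &mut Vec<PendingSlash>, now: BlockNumber) -> Vec<PendingSlash> {
        let mut ready = Vec::new();
        queue.retain(|s| {
            if now >= s.reported_at + REVIEW_PERIOD {
                if !s.vetoed {
                    ready.push(s.clone());
                }
                false // leave the queue either way
            } else {
                true
            }
        });
        ready
    }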

Incentives and rewards

  • Do we have incentivized blueprints? activate_blueprint_for_rewards and deactivate_blueprint_from_rewards should live in the service pallet or a separate one (perhaps a separate pallet for rewards and slashes).
  • Deposit APYs for delegation pallets.
  • Rewarding operators of an incentivized blueprint (some logic lives in the runtime, and some is specific to the Blueprint)
    • Build distributions of successful and failed job submissions and reports (counter metric)
      • [op_1, op_2, ..., op_10] - successful_submissions([10, 2, 7, 6, 0, 3, ..., 9]).
      • [op_1, op_2, ..., op_10] - failed_submissions([10, 2, 7, 6, 0, 3, ..., 9]).
    • Build distributions of quality of service metrics
      • [op_1, op_2, ..., op_10] - heartbeats_in_last_era([99, 100, 99, 99, 90, ..., 100]).
    • Point system built by the blueprint developer
      • Maybe each job contains t signatures and so all t signers get a point.
      • Maybe each job submission is a race, and whoever submits the most jobs gets the most points.
      • Maybe everyone gets a point for posting a heartbeat every 100 blocks.
  • Rewards issued each session (or era)
  • Rewards need to pay out to the operator and all of its delegators in proportion to the stake each provided (paid to the stash initially; eventually a delegator can specify an LST strategy so TNT rewards are auto-liquid-staked and restaked). See the pro-rata sketch after this list.
  • How to distribute rewards to delegators?
    • USD oracle for assets
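
For the proportional payout question above, a minimal pro-rata distribution sketch in Rust; crediting the integer-division dust to the operator is an illustrative policy choice, not a decision:

    type AccountId = u64;
    type Balance = u128;

    /// Split `total` rewards across the operator and its delegators in
    /// proportion to stake. Integer division leaves dust; here it goes to
    /// the operator (an illustrative assumption).
    pub fn distribute_rewards(
        total: Balance,
        operator: (AccountId, Balance),
        delegators: &[(AccountId, Balance)],
    ) -> Vec<(AccountId, Balance)> {
        let total_stake: Balance =
            operator.1 + delegators.iter().map(|(_, s)| *s).sum::<Balance>();
        if total_stake == 0 {
            return vec![(operator.0, total)];
        }
        let mut payouts = Vec::with_capacity(delegators.len() + 1);
        let mut paid: Balance = 0;
        for (who, stake) in delegators {
            let share = total * stake / total_stake;
            paid += share;
            payouts.push((*who, share));
        }
        // Operator receives its own share plus any rounding dust.
        payouts.push((operator.0, total - paid));
        payouts
    }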

Exploration

Response Time and Latency:

  • Operators report response times, but clients can also submit signed timestamps.
  • Discrepancies between operator-reported and client-reported times can trigger audits or slashing.

Uptime and Availability:

  • Implement a challenge-response system where random challenges are sent to services.
  • Failure to respond within a set timeframe results in penalties.
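
A minimal sketch of the challenge-response idea in Rust; Challenge, its nonce-echo response format, and the deadline logic are illustrative assumptions:

    type BlockNumber = u64;
    type ServiceId = u64;

    /// A random challenge issued to a service; the service must echo the
    /// nonce before `deadline` or it is penalized.
    pub struct Challenge {
        pub service: ServiceId,
        pub nonce: [u8; 32],
        pub deadline: BlockNumber,
        pub answered: bool,
    }

    impl Challenge {
        /// Accept a response only if it echoes the nonce before the deadline.
        pub fn respond(&mut self, response: &[u8; 32], now: BlockNumber) -> bool {
            if now <= self.deadline && response == &self.nonce {
                self.answered = true;
            }
            self.answered
        }

        /// True once the deadline has passed without a valid response.
        pub fn is_failed(&self, now: BlockNumber) -> bool {
            now > self.deadline && !self.answered
        }
    }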

Throughput and Request Count:

  • Clients submit signed receipts for each request.
  • Operators' reported throughput can be cross-checked against these receipts.
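
A minimal sketch of receipt cross-checking in Rust, assuming the ed25519-dalek crate for signatures; Receipt and its field encoding are illustrative, not a specified wire format:

    use ed25519_dalek::{Signature, Verifier, VerifyingKey};

    /// A client-signed receipt for one served request. Operator throughput
    /// claims are cross-checked by counting valid receipts.
    pub struct Receipt {
        pub request_id: u64,
        pub timestamp: u64,
        pub signature: Signature,
    }

    impl Receipt {
        /// Canonical bytes the client signed (illustrative encoding).
        fn message(&self) -> Vec<u8> {
            let mut m = Vec::with_capacity(16);
            m.extend_from_slice(&self.request_id.to_le_bytes());
            m.extend_from_slice(&self.timestamp.to_le_bytes());
            m
        }
    }

    /// Count how many receipts verify against the client's public key.
    pub fn verified_request_count(client_key: &VerifyingKey, receipts: &[Receipt]) -> usize {
        receipts
            .iter()
            .filter(|r| client_key.verify(&r.message(), &r.signature).is_ok())
            .count()
    }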

Error Rates:

  • Clients submit signed error reports.
  • Large discrepancies between reported and client-submitted errors trigger penalties.

Data Integrity:

  • Use Merkle trees or other cryptographic proofs to verify data hasn't been tampered with.
  • Operators provide proofs along with their reports.
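
A minimal Merkle-proof check in Rust, assuming the sha2 crate; the sibling-ordering convention encoded in the boolean flag is one illustrative choice among several:

    use sha2::{Digest, Sha256};

    type Hash = [u8; 32];

    fn hash_pair(left: &Hash, right: &Hash) -> Hash {
        let mut h = Sha256::new();
        h.update(left);
        h.update(right);
        h.finalize().into()
    }

    /// Verify that `leaf` is included under `root`, given the sibling hashes
    /// along the path. Each step carries a flag for whether the sibling sits
    /// on the left (a production tree should fix and domain-separate this).
    pub fn verify_merkle_proof(root: &Hash, leaf: &Hash, proof: &[(Hash, bool)]) -> bool {
        let mut acc = *leaf;
        for (sibling, sibling_is_left) in proof {
            acc = if *sibling_is_left {
                hash_pair(sibling, &acc)
            } else {
                hash_pair(&acc, sibling)
            };
        }
        &acc == root
    }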

Computation Correctness:

  • Implement zero-knowledge proofs or verifiable computation techniques for complex tasks.
  • Operators provide proofs of correct computation along with results.

Resource Usage:

  • Implement trusted execution environments (TEEs) or secure enclaves to report actual resource usage.
  • Cross-check reported usage against expected usage based on tasks performed.

Network Traffic:

  • Implement packet signing at network boundaries.
  • Operators provide proofs of traffic volume, which can be verified against client-side data.

Storage Proofs:

  • Use techniques like Proof-of-Retrievability or Proof-of-Space to verify data is being stored correctly.

Smart Contract Interactions:

  • Track on-chain interactions initiated by the service.
  • Operators report these, which can be directly verified against blockchain data.

Consensus Participation (for blockchain-related services):

  • Verify participation in consensus mechanisms through on-chain data.

API Usage Metrics:

  • Implement API key usage tracking on-chain.
  • Cross-check operator reports against this on-chain data.

Service Level Agreement (SLA) Compliance:

  • Define SLAs in smart contracts.
  • Automatically calculate compliance based on verifiable metrics.

Security Incident Reporting:

  • Require cryptographic proofs or third-party audits for reported security incidents.
  • Implement bug bounties to incentivize external reporting of undisclosed incidents.

Version and Patch Management:

  • Require signed code hashes for deployed versions.
  • Verify reported versions against these hashes.

Load Balancing Effectiveness:

  • Implement client-side load reporting.
  • Cross-check against operator-reported load distribution.

Data Processing Metrics:

  • For batch jobs, require input and output hashes.
  • For stream processing, implement checkpointing with cryptographic proofs.

Scalability Metrics:

  • Implement challenge-based load testing.
  • Verify reported scalability against performance under these controlled tests.

Cost Accrual:

  • Implement fine-grained, on-chain cost tracking.
  • Operators report usage, which is verified against this on-chain data.

Compliance and Audit Logs:

  • Require tamper-evident logging (e.g., using append-only data structures with frequent commitments on-chain).
  • Allow for zero-knowledge proofs of log properties without revealing sensitive data.

Slashing Conditions:

  • Consistent Misreporting: If an operator's reports consistently deviate from verifiable data.
  • Failure to Provide Proofs: If an operator fails to provide required cryptographic proofs.
  • Missed Challenges: If an operator repeatedly fails to respond to uptime or performance challenges.
  • SLA Violations: If an operator fails to meet SLAs beyond a certain threshold.
  • Security Breaches: If an operator fails to report or address critical security issues.
  • Resource Misuse: If verified resource usage significantly exceeds reported usage.

Implementation Considerations:

  • Develop a robust challenge-response protocol for real-time verification of critical metrics.
  • Implement a reputation system that factors in the accuracy of reported metrics over time.
  • Create a decentralized oracle network for third-party verification of certain metrics.
  • Use threshold signatures or multi-party computation for sensitive operations to prevent single points of failure.