Unlock reducing the replication factor and increase the overall safety of stored data objects.
Remark
The purpose of the archival node is to allow cutting costs by substituting more centralisation for redundancy in a way that is deemed safe. Specifically, it means no longer using storage nodes as the primary and only long-term glacial storage solution by way of a high replication factor. That is overkill, as true archival storage does not need low-latency availability. Instead, storage nodes are here meant to serve as origin servers for distributors, which requires far less redundancy; perhaps 2x is fine so long as archival nodes are operating.
Now, the name archival node is perhaps a misnomer. This does not need to be a standalone node with a public API that other nodes can connect to for any kind of service or interaction. The bare minimum is just a script, ideally stateless, that simply stores every single data object ever uploaded successfully. It does not even need to respond to deletion events; that is overkill. It can use its local file system as the state for which data objects it has fully downloaded, e.g. naming files by data object id and using the on-chain size indicator to know whether a download is complete. If a download fails, it just abandons it and retries at a later interval. Any data object that actually needs to be recovered can be manually downloaded from the host using `scp` or some other really simple mechanism.
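To make this concrete, here is a minimal sketch of what such a stateless script could look like. All endpoints, URLs, and response shapes below are illustrative assumptions (not the actual chain or storage node APIs); the point is only to show the loop: list objects, compare on-chain size against the local file, download what is missing, and let the next sweep retry anything that failed.

```ts
// Minimal archival script sketch: stores every data object ever uploaded,
// using the local filesystem as its only state. All endpoints and shapes
// here are hypothetical placeholders, not real APIs.

import { promises as fs } from "fs";
import * as path from "path";

const ARCHIVE_DIR = "/var/archive/data-objects";              // assumed local storage root
const QUERY_NODE = "https://query.example.org/objects";       // hypothetical listing endpoint
const STORAGE_NODE = "https://storage.example.org/api/files"; // hypothetical origin server
const RETRY_INTERVAL_MS = 10 * 60 * 1000;                     // re-scan every 10 minutes

interface DataObject {
  id: string;   // on-chain data object id, doubles as the file name
  size: number; // on-chain size, doubles as the completeness indicator
}

// An object is fully archived when a file named by its id exists and its
// size matches the on-chain size; partial downloads use a temp name.
async function isComplete(obj: DataObject): Promise<boolean> {
  try {
    const stat = await fs.stat(path.join(ARCHIVE_DIR, obj.id));
    return stat.size === obj.size;
  } catch {
    return false; // missing file => not archived yet
  }
}

async function download(obj: DataObject): Promise<void> {
  const tmp = path.join(ARCHIVE_DIR, `${obj.id}.partial`);
  const dest = path.join(ARCHIVE_DIR, obj.id);
  const res = await fetch(`${STORAGE_NODE}/${obj.id}`);
  if (!res.ok) throw new Error(`HTTP ${res.status} for object ${obj.id}`);
  // Buffering in memory keeps the sketch short; a real script would stream.
  await fs.writeFile(tmp, Buffer.from(await res.arrayBuffer()));
  await fs.rename(tmp, dest); // atomic rename marks the object as complete
}

async function sweep(): Promise<void> {
  // Fetch the full list of accepted data objects (response shape assumed).
  const objects: DataObject[] = await (await fetch(QUERY_NODE)).json();
  for (const obj of objects) {
    if (await isComplete(obj)) continue;
    try {
      await download(obj);
      console.log(`archived ${obj.id} (${obj.size} bytes)`);
    } catch (err) {
      // Abandon on failure; the next sweep retries automatically.
      console.warn(`failed ${obj.id}, will retry later:`, err);
    }
  }
}

async function main(): Promise<void> {
  await fs.mkdir(ARCHIVE_DIR, { recursive: true });
  for (;;) {
    await sweep();
    await new Promise((r) => setTimeout(r, RETRY_INTERVAL_MS));
  }
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```

Note there is no database and no deletion handling: the filesystem is the entire state, and crashing at any point is safe because incomplete files never carry a final name.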
I would at least advocate for such a minimal-effort system to unlock the benefits we are looking for with minimal risk and time. It can be extended later if we run into trouble or costs of any kind.