Limit the maximum amount of disk space that Lassie temp files can use #331
@bajtos the only purpose of Lassie keeping temporary files is to handle the duplicate-blocks case in a traversal, which we can't avoid. Essentially we need a place to put blocks we've fetched in case we discover later in the traversal that we need to look at them again.

There's a secondary use case for a temporary store in parallelising bitswap requests: we fetch ahead (preload) blocks before we discover we need them, so as we proceed with a linear traversal we find that the blocks we want are already in our temporary store. It really speeds things up.

Currently we put these blocks in the same per-retrieval temporary pool (CAR), which gets cleaned up after a request. It's not a straightforward thing to deal with, but here are some thoughts on this topic that maybe we could riff off:
Having said all of that, I think there's a path to having some kind of
Since you're not using Lassie as a library, but as a daemon, what might this look like? We'd presumably need an API to talk to a blockstore across some RPC boundary.
(Posting here in case others Google tempfiles and/or /tmp)
This is not mentioned in the help section.
USAGE: COMMANDS: GLOBAL OPTIONS:
@distora-w3 how large are your downloads that you managed to fill up temp? Or do you just have a particularly small temp?
@rvagg Thank you for the detailed explanation! I am fine with Lassie creating temporary CAR-store files; there is nothing wrong with that. What I am looking for is a way to limit the maximum amount of storage Lassie uses at any one time.

For example, when a Station module makes 10 retrieval requests in parallel, each for a 10 GB UnixFS archive, I don't want Station/Lassie to consume 100 GB of available disk space. (Think about users running on low-end laptops with 256 GB storage. They may not even have 100 GB available.) I would like to tell Lassie "you can use at most 10 GB". When Lassie reaches this limit, I want it to abort requests in progress with an error, similarly to how you abort when

Nice to have: abort the requests one by one, removing their temp CAR files, until there is enough free space to finish the remaining requests. I understand this may be way out of the scope of Lassie; that's why I am thinking about a Station-specific CAR store implementation.
I am using Lassie as a library providing the HTTP daemon. Here is the source code for our Go func. Zinnia, the runtime powering Filecoin Station, calls
Yes, this seems to work great in my experience! In
Nice, I was not aware of this feature! Is my understanding correct that by sending the request header

Are there any downsides to be aware of? E.g. can we still verify the correctness of the retrieved data when the duplicate blocks are omitted from the CAR stream? ATM, SPARK (our retrieval checker) streams the CAR file from Lassie and does not interpret it in any way. I think it's safe to enable

/cc @juliangruber
As mentioned in my previous comment, we are okay with Lassie creating temporary files; there is no need to disable this part. We would like to limit the maximum amount of disk space these temporary files use, as a safety measure preventing the host machine from running out of storage space. Thank you again for your time and energy in this discussion! 🙏🏻
Thanks for the usage details and clarification. As for temp size, it was more about not knowing and walking into the issue.
No, but in theory it could! That work hasn't been done, but if prioritised we could take on a project to ensure that (a) Lassie has a clear understanding of whether it'll be safe not to persist blocks while still producing correct output and running its retrievers, and (b) actually do that, including saying
There are two reasons for
So that's for you to consider how it fits in your workflows. If you are storing blocks and you want your responses to be as efficient as possible, give it a

As for temporary file restrictions: I think we could accommodate you on this one, and it might be a generally useful feature, including for Saturn I think. We'd have to know the requirements though. We could track bytes written to our temporary stores and keep a running total, but deciding when to take action, and what action to take, would be the interesting problem that would need defining. If it's per-retrieval then it's easier; if it's across retrievals then you have to decide which retrievals to cancel when you go beyond the maximum.
We are running Lassie in Filecoin Station - an app that runs on desktop computers of (possibly non-technical) users. We want Stations to be unobtrusive to the user and leave plenty of resources (e.g. free disk space) for user workloads.
To achieve that, we would like to limit how much space Lassie can use for temporary files.
A new configuration option for Lassie (the library) or Lassie Daemon (the HTTP server) would be ideal.
How easy or difficult would the implementation be? Are there any other options that would allow us to limit the maximum amount of disk space used? For example, I can imagine plugging in our custom block store if the Lassie library/daemon supports that.
/cc @juliangruber