Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix races during parallel downloads #2903

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open

Commits on Nov 14, 2024

  1. Add failing redownload test

    We tested that modified last modified time on the server cause a
    redownload, but we did not test that after a redownload we update the
    cache, so the next attempt will used the cached download.
    
    Add a failing test verifying the issue, and improve test comments and
    configuration to make it more clear.
    
    Part-of: lima-vm#2902
    Signed-off-by: Nir Soffer <[email protected]>
    nirs committed Nov 14, 2024
    Configuration menu
    Copy the full SHA
    92537e5 View commit details
    Browse the repository at this point in the history
  2. Split Download() to smaller functions

    Extract getCache() to handle getting and file from the cache, and
    fetch() for fetching a file from the remote to the cache. This will
    allow locking around the code checking and updating the cache, and much
    easier to maintain and understand.
    
    Signed-off-by: Nir Soffer <[email protected]>
    nirs committed Nov 14, 2024
    Configuration menu
    Copy the full SHA
    8b90a0a View commit details
    Browse the repository at this point in the history
  3. Lock cache directory during download

    To solve the races during concurrent downloads and avoid unneeded work
    and bandwidth, we allow one concurrent download of the same image.
    
    When a limactl process try to access the cache, it takes a lock on the
    file cache directory. If multiple processes try to get the lock in the
    same time, only one will take the lock, and the other will block.
    
    The process that took the lock tries to get the file from the cache.
    This is the fast path and the common case. This can fail if the file is
    not in the cache, the digest does not match, or the cached last modified
    time does not match the last modified returned from the server.
    
    If the process cannot get the file from the cache, it downloads the file
    from the remote server, and update the cached data and metadata files.
    
    Finally the process release the lock on the cache directory. Other
    limactl processes waiting on the lock wake up and take the lock. In the
    common case they will find the image in the cache and will release the
    lock quickly.
    
    Since we have exactly one concurrent download, updating the metadata
    files is trivial and we don't need the writeFirst() helper.
    
    Fixes: lima-vm#2902
    Fixes: lima-vm#2732
    Signed-off-by: Nir Soffer <[email protected]>
    nirs committed Nov 14, 2024
    Configuration menu
    Copy the full SHA
    76d5c86 View commit details
    Browse the repository at this point in the history
  4. Make parallel downloads test more strict

    We can assert now that only one thread downloaded the file, and all
    other threads used the cache.
    
    Signed-off-by: Nir Soffer <[email protected]>
    nirs committed Nov 14, 2024
    Configuration menu
    Copy the full SHA
    c1b4d72 View commit details
    Browse the repository at this point in the history
  5. Fix race in dowloader.Cached()

    Checking if an image is cached races with parallel downloads. Take the
    lock when validating the digest or the data file to ensure that we
    validate the cached when it is in consistent state.
    
    If an image is being downloaded, the check will block until the download
    completes.
    
    Signed-off-by: Nir Soffer <[email protected]>
    nirs committed Nov 14, 2024
    Configuration menu
    Copy the full SHA
    5071535 View commit details
    Browse the repository at this point in the history