No "pick up where you left off" option for failed downloads #27

Open
baron-de-montblanc opened this issue Oct 22, 2024 · 4 comments
Labels: enhancement (New feature or request)

@baron-de-montblanc

Hello, I am trying to download some rather large observations from ASVO to our group's supercomputer through giant-squid. It is very common for the download to fail (see attached screenshot for example), probably due to the connection getting interrupted.

[Screenshot: giant-squid download failure, 2024-10-22]

My question is: is there an option/flag one can use with giant-squid to tell it to resume the download from where it crashed? (Or, alternatively, how could I successfully download these ~50 GB observations without it crashing?)

@gsleap added the enhancement (New feature or request) label on Oct 23, 2024
@d3v-null (Contributor) commented Oct 23, 2024

Hey Jade,
That must be frustrating.
We have a little bit of retry / error handling logic in giant-squid, but it's clearly not doing its job.

In the meantime, here's how you can use wget to handle the download instead.

giant-squid list --json $query

will give you a bunch of metadata about the jobs matching $query, including a download link.

{
   "801409":{
      "obsid":1413666792,
      "jobId":801409,
      "jobType":"DownloadVisibilities",
      "jobState":"Ready",
      "files":[
         {
            "jobType":"Acacia",
            "fileUrl":"https://projects.pawsey.org.au/mwa-asvo/1413666792_801409_vis.tar?AWSAccessKeyId=...",
            "filePath":null,
            "fileSize":152505477120,
            "fileHash":"d6dfb7391a495b0eb07cc885808e9e8058e90ec3"
         }
      ]
   }
}

You can chuck fileUrl straight into wget, which has a lot of options around retrying downloads. I use --wait=60 --random-wait.
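
For a single job, the command could look something like this (a sketch only: the URL is the truncated fileUrl from the JSON above, and the output filename is my own choice):

# Download one job's tarball with polite spacing between retries.
# Paste the full fileUrl from the JSON output in place of the truncated one.
wget --wait=60 --random-wait --progress=dot:giga \
    -O 1413666792_801409_vis.tar \
    "https://projects.pawsey.org.au/mwa-asvo/1413666792_801409_vis.tar?AWSAccessKeyId=..."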

If you want to automate this for many jobs, you can use jq, e.g.

giant-squid list -j --states=ready -- $obslist \
    | jq -r '.[]|[.jobId,.files[0].fileUrl//"",.files[0].fileSize//"",.files[0].fileHash//""]|@tsv' \
    | while read -r jobid url size hash; do
        [ -f "${jobid}.tar" ] && continue
        wget "$url" -O "${jobid}.tar" --progress=dot:giga --wait=60 --random-wait
done
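
If you want to sanity-check a finished download, the listing already gives you fileSize and fileHash. The example hash above is 40 hex characters, which looks like SHA-1, but that's an assumption worth checking against the ASVO docs. Here's a sketch of something you could drop into the loop body after the wget call:

# Verify size and checksum against the values read from giant-squid list.
# Assumes SHA-1 hashes and GNU coreutils (on macOS/BSD use stat -f%z).
actual_size=$(stat -c%s "${jobid}.tar")
actual_hash=$(sha1sum "${jobid}.tar" | awk '{print $1}')
if [ "$actual_size" = "$size" ] && [ "$actual_hash" = "$hash" ]; then
    echo "${jobid}.tar verified"
else
    echo "${jobid}.tar failed verification" >&2
fi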

@gsleap (Member) commented Oct 23, 2024

Hi Jade,

As Dev says, we currently don't have a continue-from-where-you-left-off feature as such, but it would be extremely valuable especially for large downloads. So it will definitely be on our roadmap for a future release.

In the meantime, I think Dev has used the above technique successfully, so please give that a go and let us know how it goes!

@gsleap (Member) commented Oct 23, 2024

Oh, and @baron-de-montblanc @d3v-null - FYI you can also pass to wget:
-c, --continue to "resume getting a partially-downloaded file"
I only just found it and it does appear to work quite nicely!
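
If anyone wants to fold that into the loop above, dropping the skip-if-exists check and adding -c lets a partially downloaded tarball resume rather than be skipped. An untested sketch (note that some older wget versions handled -c together with -O inconsistently, so try it on a small job first):

# Resume partial downloads with wget -c instead of skipping existing files.
giant-squid list -j --states=ready -- $obslist \
    | jq -r '.[]|[.jobId,.files[0].fileUrl//""]|@tsv' \
    | while read -r jobid url; do
        wget -c "$url" -O "${jobid}.tar" --progress=dot:giga --wait=60 --random-wait
done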

@d3v-null (Contributor) commented
A friendly reminder to anyone who comes across this issue: we take pull requests!

The main download loop is here.

It's wrapped in an exponential backoff here.

Compared to wget, this is download handling from the stone age.
