`url_bases` vs `url_base` mixup in `downloader.py`
haydenshively opened this issue · comments
Hayden Shively commented
The function `download_and_cache_file_return_first_success` accepts an array of `url_bases` and hands it off to `download_and_cache_file`, which expects a single `url_base`. This causes the URL concatenation to fail in `download_file`.
To fix this, I'd suggest something like the following in `download_and_cache_file_return_first_success`:
```python
if not os.path.isfile(filepath) or overwrite:
    data_zipped = None  # initialize so the check below is safe even if url_bases is empty
    for url_base in url_bases:  # <-- critical that we iterate over these somehow
        try:
            data_zipped = download_file(url_base, folder_path, filename_zipped)
            break
        except (DownloadFailed, pycurl.error, socket.timeout):
            data_zipped = None
    if data_zipped is None:
        unix_time = time.time()
        os.makedirs(folder_path_abs, exist_ok=True)
        with atomic_write(filepath_attempt, mode='w', overwrite=True) as wf:
            wf.write(str(unix_time))
        raise DownloadFailed(f"Could not download {folder_path + filename_zipped} from {url_bases}")
```
I can put up a PR if you want; just not sure what the contribution guidelines are. Thanks!
Harald Schäfer commented
The `retryable` decorator deals with this, but I agree it's very confusing to read.
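For readers unfamiliar with that pattern: a retry decorator wraps a function so that transient failures are swallowed and the call is re-attempted, which can make a per-URL fallback loop unnecessary at the call site. The sketch below is a generic illustration of such a decorator, not the project's actual `retryable` implementation; `flaky_download` and all parameter names here are hypothetical.

```python
import functools
import time


def retryable(exceptions, attempts=3, delay=0.0):
    """Generic sketch of a retry decorator (not the project's real one):
    call the wrapped function up to `attempts` times, swallowing the
    listed exceptions until the final attempt, which re-raises."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(attempts):
                try:
                    return fn(*args, **kwargs)
                except exceptions:
                    if attempt == attempts - 1:
                        raise  # out of attempts: surface the error
                    time.sleep(delay)
        return wrapper
    return decorator


@retryable((ConnectionError,), attempts=3)
def flaky_download(state={"calls": 0}):
    # Hypothetical stand-in for a network fetch: fails twice, then succeeds.
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("transient failure")
    return "payload"
```

The confusion the issue points out still stands, though: a retry decorator hides the iteration, so a reader of `download_and_cache_file_return_first_success` can't easily see where the plural `url_bases` collapses to a single `url_base`.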