Touffy / client-zip

A client-side streaming ZIP generator


[BUG] Firefox downloads end up being corrupt most of the time

peturh opened this issue

Describe the bug
We've been using client-zip on our platform. We initially decided not to go with the streaming alternative because we wanted to support Safari as well. That has been fixed in recent Safari versions, so we decided to switch to streaming.

To Reproduce
Unfortunately I can't give exact reproduction steps, as our website only supports this for Chrome, Safari and Edge right now. But you can try a working version by using Chrome, Safari or Edge on bimobject.com (you have to register), clicking download and making sure you select multiple files.

Expected behavior
Same as Chrome, Safari and Edge.

Screenshots

Archives downloaded through Firefox get corrupted (most of the time)

Desktop (please complete the following information):

  • Mac, Linux and Windows
  • Firefox
  • Version 104.0 (64-bit)

Additional context

The user can select files from a list that they want to download. It always works in Chrome, Safari and Edge, but in Firefox the ZIP only ends up complete in about 10% of cases.

I ran zipdump:

python3 zipdump.py ~/Downloads/firefoxdownload.zip

and it lists only some of the files:

00000000: PK.0304: 002d 0008 0000 54af3988 00000000 00000000 00000000 0038 0000 |  0000001e 00000056 00000056 00000056 - Artificial ZZ plant 1100mmArtificial ZZ plant 1100mm.rfa
002c8056: PK.0708: a106776f 002c8000 002c8000 |  002c8066
002c8066: PK.0304: 002d 0008 0000 54af3986 00000000 00000000 00000000 0038 0000 |  002c8084 002c80bc 002c80bc 002c80bc - Artificial ZZ plant 1100mmArtificial ZZ plant 1100mm.dwg
005b5c55: PK.0708: c28b6444 002edb99 002edb99 |  005b5c65
005b5c65: PK.0304: 002d 0008 0000 54af398b 00000000 00000000 00000000 0038 0000 |  005b5c83 005b5cbb 005b5cbb 005b5cbb - Artificial ZZ plant 1100mmArtificial ZZ plant 1100mm.ifc

The same set of files in a Chrome download:

00000000: PK.0304: 002d 0008 0000 54af3988 00000000 00000000 00000000 0039 0000 |  0000001e 00000057 00000057 00000057 - Artificial ZZ plant 1100mm/Artificial ZZ plant 1100mm.rfa
002c8057: PK.0708: a106776f 002c8000 002c8000 |  002c8067
002c8067: PK.0304: 002d 0008 0000 54af3986 00000000 00000000 00000000 0039 0000 |  002c8085 002c80be 002c80be 002c80be - Artificial ZZ plant 1100mm/Artificial ZZ plant 1100mm.dwg
005b5c57: PK.0708: c28b6444 002edb99 002edb99 |  005b5c67
005b5c67: PK.0304: 002d 0008 0000 54af398b 00000000 00000000 00000000 0039 0000 |  005b5c85 005b5cbe 005b5cbe 005b5cbe - Artificial ZZ plant 1100mm/Artificial ZZ plant 1100mm.ifc
01287ad4: PK.0708: e6ed0337 00cd1e16 00cd1e16 |  01287ae4
01287ae4: PK.0304: 002d 0008 0000 54af3987 00000000 00000000 00000000 0039 0000 |  01287b02 01287b3b 01287b3b 01287b3b - Artificial ZZ plant 1100mm/Artificial ZZ plant 1100mm.obj
02c4f6e1: PK.0708: bffce8ef 019c7ba6 019c7ba6 |  02c4f6f1
02c4f6f1: PK.0304: 002d 0008 0000 54af3989 00000000 00000000 00000000 0039 0000 |  02c4f70f 02c4f748 02c4f748 02c4f748 - Artificial ZZ plant 1100mm/Artificial ZZ plant 1100mm.skp
04f9f516: PK.0708: 7160b309 0234fdce 0234fdce |  04f9f526
04f9f526: PK.0102: 032d 002d 0008 0000 54af3988 a106776f 002c8000 002c8000 0039 0000 0000 0000 0000 81b40000 00000000 |  04f9f554 04f9f58d 04f9f58d 04f9f58d - Artificial ZZ plant 1100mm/Artificial ZZ plant 1100mm.rfa
04f9f58d: PK.0102: 032d 002d 0008 0000 54af3986 c28b6444 002edb99 002edb99 0039 0000 0000 0000 0000 81b40000 002c8067 |  04f9f5bb 04f9f5f4 04f9f5f4 04f9f5f4 - Artificial ZZ plant 1100mm/Artificial ZZ plant 1100mm.dwg
04f9f5f4: PK.0102: 032d 002d 0008 0000 54af398b e6ed0337 00cd1e16 00cd1e16 0039 0000 0000 0000 0000 81b40000 005b5c67 |  04f9f622 04f9f65b 04f9f65b 04f9f65b - Artificial ZZ plant 1100mm/Artificial ZZ plant 1100mm.ifc
04f9f65b: PK.0102: 032d 002d 0008 0000 54af3987 bffce8ef 019c7ba6 019c7ba6 0039 0000 0000 0000 0000 81b40000 01287ae4 |  04f9f689 04f9f6c2 04f9f6c2 04f9f6c2 - Artificial ZZ plant 1100mm/Artificial ZZ plant 1100mm.obj
04f9f6c2: PK.0102: 032d 002d 0008 0000 54af3989 7160b309 0234fdce 0234fdce 0039 0000 0000 0000 0000 81b40000 02c4f6f1 |  04f9f6f0 04f9f729 04f9f729 04f9f729 - Artificial ZZ plant 1100mm/Artificial ZZ plant 1100mm.skp
04f9f729: PK.0506: 0000 0000 0005 0005 00000203 04f9f526 0000 |  04f9f73f 04f9f73f 

When I try to open the file on Ubuntu, it simply says: "An error occurred loading the archive."

My service worker looks like this:

importScripts('./client-zip-2.2.2.min.js')

self.addEventListener('fetch', event => {
  const url = new URL(event.request.url)

  if (url.pathname.includes('downloadZip/')) {
    event.respondWith(
      event.request
        .formData()
        .then(formData => downloadZip(activate(formData)))
        .catch(err => new Response(err.message, { status: 500 })),
    )
  }
})

function replaceHeader(response, productName) {
  const header = response.headers.get('Content-Disposition')
  productName = productName.replace(
    /[^\x00-\x7f]|[#|%|&|{|}|\\|\/|<|>|\*|\?|\$|!|'|"|:|@|\+|`|=]/g,
    '',
  )
  const parts = header.split(';')
  let fileName =
    productName +
    '/' +
    parts[1]
      .split('=')[1]
      .replace(/[^\x00-\x7f]|[#|%|&|{|}|\\|\/|<|>|\*|\?|\$|!|'|"|:|@|\+|`|=]/g, '')
  try {
    decodeURIComponent(fileName)
  } catch (ex) {
    fileName = 'unknown_filename'
  }

  const newHeaders = new Headers(response.headers)
  newHeaders.set('Content-Disposition', 'attachment; filename=' + fileName)
  const newFile = new Response(response.body, {
    headers: newHeaders,
  })
  return newFile
}
async function* activate(formData) {
  for (const value of formData.values()) {
    try {
      const parsedValue = JSON.parse(value)
      const response = await fetch(parsedValue.url)
      if (!response.ok) {
        console.warn(`skipping ${response.status} response for ${parsedValue.url}`)
      } else if (
        response.status === 204 ||
        response.headers.get('Content-Length') === '0' ||
        !response.body
      ) {
        console.warn(`skipping empty response for ${parsedValue.url}`)
      } else {
        yield replaceHeader(response.clone(), parsedValue.productName)
      }
    } catch (err) {
      console.error(err)
    }
  }
}

// Has to come after, otherwise the download fetch is never executed. Needs a try/catch because it isn't available in local development.
try {
  importScripts('ngsw-worker.js')
} catch (e) {
  console.log('ngsw-worker.js not available (on local dev environment)', e)
}

and it's triggered with this Angular code

const form = document.createElement('form')
form.method = 'post'
form.action = `downloadZip/${downloadName}.zip`
const button = document.createElement('button')
button.type = 'submit'
for (const file of fileDownloads) {
  const input = document.createElement('input')
  input.value = JSON.stringify({
    url: file.url,
    productName: file.productName,
  })
  input.name = 'url'
  input.type = 'hidden'

  form.appendChild(input)
}

form.appendChild(button)
document.body.appendChild(form)
form.submit()
form.remove()
 

Hi there. It looks like the download stream was interrupted before all the file data could be written, and before any central directory information was written. You don't happen to see anything interesting in the ServiceWorker's console?

Quick question: was at least one tab open with your website during the whole download? I hear that ServiceWorkers may be shut down rather quickly by some browsers once all their client windows are closed.

I haven't tried to reproduce yet, but I can already offer a suggestion for the ServiceWorker. You're going to a lot of trouble (and copying lots of data) just to rename the files in the archive, cloning the fetch Response. You could instead keep the original Response and yield { name: "whatever you want", input: theOriginalResponse }. It's actually possible (though unlikely) that cloning is causing the issue in Firefox; I had trouble with Response cloning in ServiceWorkers a few years back.

Thanks for the reply.

I don't get any logs in the service worker in Firefox. I tried to log the filename and it shows up like this (one error that is expected):

[Screenshot from 2022-09-01 14-36-59]

I removed the replaceHeader function completely and it still happens, unfortunately. Our filenames are also kind of bad; they show up like this in the archive when I don't have the replaceHeader function:

[Screenshot from 2022-09-01 14-40-12]

I'm quite new to the whole yield thing, but I will try that out. Thanks for the tip.

I see. Yeah, your headers are a bit overcomplicated alright. Anyway, the point wasn't to remove your filename logic.

yield doesn't matter here (it's just the syntactic bit that allows a generator function to produce multiple results lazily, and you would use it either way). My suggestion is about what you yield. Essentially, just changing the replaceHeader function.

Your implementation clones the original Response body to make a new Response with a different Content-Disposition header, just so you can change the eventual filename in the archive. It's pretty inefficient. client-zip infers the filename from Content-Disposition when you give it a Response, but if you give it an object with a name property like I suggested, it will use that and ignore whatever is in the Response headers.

So the result would be something like this (I just changed the end of the function, but you may want to rename it since it doesn't actually replace headers anymore):

function replaceHeader(response, productName) {
  const header = response.headers.get('Content-Disposition')
  productName = productName.replace(
    /[^\x00-\x7f]|[#|%|&|{|}|\\|\/|<|>|\*|\?|\$|!|'|"|:|@|\+|`|=]/g,
    '',
  )
  const parts = header.split(';')
  let fileName =
    productName +
    '/' +
    parts[1]
      .split('=')[1]
      .replace(/[^\x00-\x7f]|[#|%|&|{|}|\\|\/|<|>|\*|\?|\$|!|'|"|:|@|\+|`|=]/g, '')
  try {
    decodeURIComponent(fileName)
  } catch (ex) {
    fileName = 'unknown_filename'
  }

  return { name: fileName, input: response }
}
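
With that change, activate() no longer needs response.clone() either; the original Response's body is read only once, by client-zip. A minimal one-line sketch against the activate() shown above:

// In activate(): replaceHeader() now returns { name, input } without
// touching the body, so yield the original Response, not a clone.
yield replaceHeader(response, parsedValue.productName)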

Thanks! Much cleaner.

However, doing this did not resolve the Firefox bug that we have. Doing it in the browser works fine, though (until we hit about 400 MB of downloads, then it sometimes breaks).

Would you mind adding a link to your website so I can try to reproduce?

It seems this issue is occurring on my machine as well. I have a >1.0 GB download that works on Chromium, but Firefox interrupts it at around 80 MiB, in less than 10 seconds. Firefox always says the download is complete without showing any errors. Dev tools show that the ServiceWorker stops at the same time. I never had issues before I tested with a ServiceWorker.

(I might be able to privately send a test link later if my employer agrees to it. I don't have a nice way to test this with public domain data at the moment)

> Would you mind adding a link to your website so I can try to reproduce?

Hello, sorry for my late reply. I will try to think of a way to do this without affecting our users.

Sorry, closing this was not my intention. 😴

I hope the worker dies for some reason that we can fix, but without any error, I can't imagine what goes wrong.
This 4-year-old bug suggests there may be a 5-minute timeout in Firefox, but that's not what you're seeing.

My experience with ServiceWorkers is limited to using client-zip, so my idea might not be perfect, but I wonder if making keep-alive requests to the ServiceWorker would be a viable workaround? That's what I'm going to try for some time.

There's nothing in the ServiceWorker specs about honouring keep-alive headers, and anyway, the worker should be sending response data continuously to the client window once the download has started. Also, the problem here is not simply that the worker breaks the response, but that the worker itself is killed. Sending it a request after it's been killed will result in the request going straight to the network.

By keep-alive I actually just meant this:


setInterval(function keepAlive() {
  fetch('downloadZip/keep-alive', {
    method: 'POST'
  });
}, 4000);

With this, I've been able to complete downloads on Firefox, and the ServiceWorker keeps running.
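
Note that since the fetch handler shown earlier matches any path containing 'downloadZip/', these pings are intercepted too, and formData() will reject on them. A minimal sketch of short-circuiting them first (the keep-alive path is just the one from my snippet above):

self.addEventListener('fetch', event => {
  const url = new URL(event.request.url)

  // Answer keep-alive pings immediately; handling the event at all is what
  // keeps the worker alive. 'downloadZip/keep-alive' would otherwise match
  // the includes('downloadZip/') check below.
  if (url.pathname.endsWith('downloadZip/keep-alive')) {
    event.respondWith(new Response(null, { status: 204 }))
    return
  }

  if (url.pathname.includes('downloadZip/')) {
    // ... existing downloadZip handling from earlier in the thread ...
  }
})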

Ah, keeping the worker alive. I'm disappointed that Firefox needs such a thing. Well, whatever works…

BTW, did you know you can pass function arguments to setInterval after the first two? Like this:

setInterval(fetch, 4000, 'downloadZip/keep-alive', { method: 'POST' })

> By keep-alive I actually just meant this:
>
> setInterval(function keepAlive() {
>   fetch('downloadZip/keep-alive', {
>     method: 'POST'
>   });
> }, 4000);
>
> With this, I've been able to complete downloads on Firefox, and the ServiceWorker keeps running.

Thanks, we'll try this out.

> Ah, keeping the worker alive. I'm disappointed that Firefox needs such a thing. Well, whatever works…
>
> BTW, did you know you can pass function arguments to setInterval after the first two? Like this:
>
> setInterval(fetch, 4000, 'downloadZip/keep-alive', { method: 'POST' })

As a user of Firefox (on both desktop and Android) I'm quite often disappointed too. Still no IndexedDB or ServiceWorkers in Firefox private browsing mode, and there are other issues that take a long time to get fixed.

I've seen that way of calling setInterval in docs, but I've never used it since TypeScript doesn't check the types for it (and thus can't provide good auto-completions either). Notice also that I named my callback function to make stack traces easier to read, which is why I didn't use the arrow syntax either.
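
For reference, that's because of how lib.dom.d.ts declares it; the rest parameters are typed as any[], so TypeScript can't match them against fetch's real signature:

declare function setInterval(handler: TimerHandler, timeout?: number, ...arguments: any[]): number;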

I'm closing this since it's a browser bug and we've got a workaround. I'm also going to add a link in the README; most developers who use the Service Worker to download need to know about this, since some of their users are bound to use Firefox.