Touffy / client-zip

A client-side streaming ZIP generator

How to download multiple pdfs in a zip?

uwla opened this issue · comments

commented

Hi! Thanks for creating this awesome library.
I'm struggling to replace JSZip with your library.
I need to download several files from an array like this:

exampleFiles = [
    {
        name: "1.pdf",
        url: "/storage/files/1.pdf"
    },
    {
        name: "hello.pdf",
        url: "/storage/files/hello.pdf"
    },
    {
        name: "abcd.pdf",
        url: "/storage/files/abcd.pdf"
    },
]

// this function is a layer of abstraction to download files:
// it uses JSZip to fetch the remote files and download them
// in a zip archive. More importantly, it works well!
downloadFilesInZip(exampleFiles)

Can you help me implement this using your library? I've tried following the code provided in the README, but it seems like the fetch API gets the binary content of the PDF and it is saved as plain text, so I can't open the files after downloading the zip.

Thanks in advance

Hello. You can't specify a "url" in the input for downloadZip, and it's not something I want to support. The reason is that fetching resources can be quite complicated (what if you need a JWT to access the PDFs, for example?) and this library doesn't want to be complicated.

But it's really quite easy to fetch the resources yourself and generate a correct input for downloadZip. The quick and slightly inefficient way is this:

const responses = await Promise.all([
  fetch('/storage/files/1.pdf'),
  fetch('/storage/files/hello.pdf'),
  fetch('/storage/files/abcd.pdf')
])

const zippedBlob = await downloadZip(responses).blob()

You don't even need to provide a "name" for your files because client-zip will extract the file name from the URL (or the Content-Disposition HTTP header) when you provide a Response object as input.

It's slightly inefficient because the browser will start all the HTTP requests for the PDFs immediately. If you have a lot of PDFs to download, it would be better to download them in sequence using an async generator:

const urls = [
  '/storage/files/1.pdf',
  '/storage/files/hello.pdf',
  '/storage/files/abcd.pdf'
]

async function* downloadGenerator(urls) {
  for(const url of urls) yield await fetch(url)
}

const zippedBlob = await downloadZip(downloadGenerator(urls)).blob()

The difference there is that each download is started only when the previous file is completely copied to the ZIP archive.

Let me know if that works and if you understand how. The demo is actually in dire need of an update, and I might use something along those lines.

commented

Wow! Awesome!! It worked nicely :) !!

It not only worked, but it also reduced the size of the bundled scripts by 80kb! I'm glad I found your library. Now the overhead is much lower for the clients :)

Thank you very much!
I will share your library with my friends!

async function* downloadGenerator(urls) {
  for(const url of urls) yield await fetch(url)
}

Pardon my ignorance, but will this synchronously download the fetch(url) and start streaming the zip to the user even before the fetch network request is completed? I assume so.

The code inside the loop will make a GET request to the url and wait until the response headers are available (when the fetch promise resolves, so it's not synchronous even though the code looks like it) to generate the Response for downloadZip. downloadZip will then immediately begin streaming the response body into the archive (so yes, it won't wait until the whole body is downloaded).

Once the body is consumed (you may think of that as the request being completed, though, as I said, fetch resolves earlier), downloadZip will request the next Response from the generator. The generator loop does not advance (so it does not make a new fetch) until downloadZip asks for it, because generators are lazy and consumer-driven.
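That laziness is easy to see without any network at all. Here is a minimal sketch in plain JavaScript (no client-zip involved) where a log of the interleaving shows each value being produced only just before it is consumed, never all up front — exactly why the generator approach above downloads the files one at a time:

```javascript
// The producer pushes a "produce" entry to the log every time the
// consumer pulls a value out of it with for await...of.
async function* producer(log) {
  for (const item of ['a', 'b', 'c']) {
    log.push(`produce ${item}`)
    yield item
  }
}

async function run() {
  const log = []
  // for await pulls one value at a time, the same way downloadZip
  // pulls Responses out of downloadGenerator
  for await (const item of producer(log)) {
    log.push(`consume ${item}`)
  }
  return log
}

run().then(log => console.log(log.join(' | ')))
// → produce a | consume a | produce b | consume b | produce c | consume c
```

If the generator were eager, all three "produce" entries would appear before any "consume" entry; instead they alternate, because the generator body is suspended at each yield until the consumer asks again.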

Hi. I did a test with 5 URLs, each about 4MB. Fetch awaits each download first (as can be seen in the network tab); only after all items have been downloaded to memory is the blob created (almost instantly).

Yes, I do see that downloadZip will start streaming the response to the archive, but the browser does not indicate this. What I would like is for the zip to start downloading and show the stream progress (as Chrome would natively for any other file).

I am using async generators as you mentioned.

I could put a progress bar to indicate to the user that something is indeed happening, but ultimately I would like the browser to handle that and stream the download.

Sorry, I'm a front-end programmer and not familiar with all this, but I am sure client-zip tackles this. Thank you.

edit: now reading this #9
Will service workers help here? Is this what streamsaver is using?

I suggest you look at the streaming demo using the ServiceWorker if you're interested in memory efficiency. Your network tab should tell quite a different story…

You see, when you call blob() on the Response returned by client-zip, you tell the browser to buffer the whole thing, and only then do you generate the Blob URL to initiate the download for the user. That's what explains your observation, not fetch waiting for anything. I put that code with blob() in the README as a simple example for using client-zip because it is easier to understand than going through the ServiceWorker (also, it works in Safari) but it's not ideal for large files.
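The buffering behaviour of blob() can be illustrated without any network at all. The sketch below (an illustration, not client-zip code; it uses the standard Response and ReadableStream available in modern browsers and Node 18+) wraps a stream that records every chunk pulled out of it, and shows that blob() only settles after the whole body has been consumed:

```javascript
// trackedBlob is a made-up helper: it builds a pull-based stream over
// the given chunks, wraps it in a Response, and reports how many chunks
// had been pulled by the time blob() resolved, plus the blob's size.
function trackedBlob(chunks) {
  const pulled = []
  const encoder = new TextEncoder()
  let i = 0
  const body = new ReadableStream({
    // called each time the consumer wants another chunk
    pull(controller) {
      if (i < chunks.length) {
        pulled.push(chunks[i])
        controller.enqueue(encoder.encode(chunks[i++]))
      } else {
        controller.close()
      }
    }
  })
  // blob() drains the entire stream into memory before resolving
  return new Response(body).blob()
    .then(blob => ({ pulls: pulled.length, size: blob.size }))
}

trackedBlob(['part1', 'part2', 'part3']).then(({ pulls, size }) => {
  // all 3 chunks were pulled and buffered before the Blob existed
  console.log(pulls, size)
})
```

This is the same thing happening with the zip: the Blob only exists once every byte of every file has been pulled through the archive, which is why the download appears to start "almost instantly" only at the very end.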

Finally, to address your progress bar issue, look at issue #19.

And yes, streamsaver uses a ServiceWorker, although not in the same way I am doing it in my demo (my demo generates the Zip inside the SW, whereas streamsaver just passes the stream generated in the main window to the SW).

Hello! Thanks for this library! 👍

I'm a beginner and ran into a little trouble.
Is there a way to add a name to each file if I use this function?

async function* downloadGenerator(urls) {
  for(const url of urls) yield await fetch(url)
}

const zippedBlob = await downloadZip(downloadGenerator(urls)).blob()

Hello @myeongwooni. Yes, you can yield objects with name, lastModified and input properties instead of just the Response. For example, if you want sequential filenames and you know all the files are plain text:

let number = 1
async function* downloadGenerator(urls) {
  for(const url of urls) yield { input: await fetch(url), name: `file${number++}.txt` }
}

Of course, you might need more logic there to compute appropriate filenames. You have access to the whole Response for that. It's how client-zip does it by default when you just pass it a Response: it looks at the response headers (specifically the Content-Disposition header if present; if not, it will use the last part of the URL's pathname).
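To make that fallback concrete, here is a hypothetical helper sketching the idea (an illustration, not client-zip's actual implementation; `guessFilename` is an invented name). It accepts anything with `headers` and `url` properties, like a fetch Response:

```javascript
// Hypothetical sketch of the fallback described above:
// prefer Content-Disposition, else the last URL path segment.
function guessFilename(response) {
  const disposition = response.headers.get('Content-Disposition')
  // look for an explicit filename="…" in the header, if present
  const match = disposition && disposition.match(/filename="?([^";]+)"?/)
  if (match) return match[1]
  // otherwise fall back to the last segment of the URL's pathname
  const { pathname } = new URL(response.url, 'http://localhost')
  return decodeURIComponent(pathname.split('/').pop())
}

guessFilename({ headers: new Headers(), url: 'https://example.com/storage/files/hello.pdf' })
// → 'hello.pdf'
```

Note that real Content-Disposition parsing has more edge cases (quoting, `filename*=` with encodings), so a production version would need more care than this regex.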

Thanks!!
It worked 👍👍
I really appreciate the library and your comment!