caolan / highland

High-level streams library for Node.js and the browser

Home Page: https://caolan.github.io/highland


HTTP requests best practice

moose56 opened this issue

I am making a number of HTTP requests using the Request module. The responses from these requests need to be unzipped and written to a file in the same order as the URLs.

So far I have the following:

const fs = require('fs');
const zlib = require('zlib');
const request = require('request');
const _ = require('highland');

function download(urls, path) {
  return new Promise((resolve, reject) => {
    const tmpPath = path + '-tmp';
    const unzip = zlib.createGunzip();
    const output = fs.createWriteStream(tmpPath);

    _(urls)
      .map(request)   // turn each URL into a request stream
      .map(_)         // wrap each request stream in a Highland stream
      .parallel(5)    // consume up to 5 downloads at once, preserving order
      .pipe(unzip)
      .on('error', (err) => { return reject(err); })
      .pipe(output)
      .on('finish', () => {
        fs.rename(tmpPath, path, (err) => {
          if (err) { return reject(err); }
          return resolve(path);
        });
      })
      .on('error', (err) => { return reject(err); });
  });
}

This works; however, I have a couple of questions about the implementation.

  1. Is parallel the best option here? I guess there is a trade-off between holding more data in memory and calling the URLs in sequence using flatMap. I am after speed, so downloading in parallel and using a bit more memory is OK for me.
  2. Is there any way to capture events on the request? If I was using request on its own I would have something like:
request(url)
  .on('response', (response) => {
    if (response.statusCode === 404) {
      return reject(...);
    }
  })
  .on('error', (err) => { return reject(err); });

Is there a way of listening for these events when using Highland?

Any advice gratefully received.

Is parallel the best option here? I guess there is a trade-off between holding more data in memory and calling the URLs in sequence using flatMap. I am after speed, so downloading in parallel and using a bit more memory is OK for me.

Yep. That's the trade-off. If you're after download speed, then parallel is the right thing to do here. Tune the parallelism factor as necessary.
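For illustration, a minimal sketch of the two approaches (assuming the same request/unzip/output setup as above):

// Sequential: one download at a time, least memory, slowest.
_(urls)
  .flatMap(url => _(request(url)))
  .pipe(unzip)
  ...

// Parallel: up to 5 downloads in flight, more memory, faster; results still
// come out in the original URL order.
_(urls)
  .map(url => _(request(url)))
  .parallel(5)
  .pipe(unzip)
  ...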

Is there a way of listening for these events when using Highland?

Yes, but it's a bit more complicated. By default, doing _(request(url)) will already listen for the error event and propagate it correctly. But if you want to listen to multiple custom events, you'll need to write your own stream generator.

Something like this. Note that I'm not calling reject until fairly late in the pipeline, and I'm using through instead of pipe. It's generally more convenient to keep everything as a Highland stream for as long as possible, since Highland does nice things like error-propagation for you.

In this case, it's also important to call push(error) instead of reject(error) in getUrl, since otherwise you will have rejected the promise but not stopped the stream.

function getUrl(url) {
  return _((push, next) => {
    request(url)
        .on('response', (response) => {
          if (response.statusCode === 200) {
            push(null, _(response));  // emit the response stream as a value
            push(null, _.nil);        // then end this generator
          } else {
            push(new Error('...'));   // surface HTTP errors as stream errors
            push(null, _.nil);
          }
        })
        .on('error', (err) => {
          push(err);
          push(null, _.nil);
        });
  });
}

_(urls)
    .map(getUrl)
    .parallel(5)
    .through(unzip)  // Using through instead of pipe to keep things a Highland stream.
    .stopOnError(reject)  // Stop the stream so that no new URLs are fetched.
    .pipe(output)
    .on('finish', ...)
    .on('error', reject);

Thank you, that is really useful.

If you use superagent rather than request, it automatically converts HTTP errors so they can be handled in an .errors block.
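For example, a rough sketch assuming superagent's promise support (a non-2xx response rejects, so it surfaces as a stream error; note this buffers each response rather than streaming it like the request-based version above):

const superagent = require('superagent');

_(urls)
  .flatMap(url => _(superagent.get(url).then(res => res.body)))
  .errors((err, push) => {
    // err.status is set by superagent for HTTP error responses
    push(err);
  })
  ...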

Thanks @svozza, I will have a look at superagent.

@vqvu should getUrl call next() with the stream returned by request, as this is what I need to be passed on? I don't need to pass the response event result down the stream, so I omit that bit. I just need to be able to access it to potentially raise an error:

function getUrl(url) {
  return _((push, next) => {
    let data = request(url)
      .on('response', (response) => {
        if (response.statusCode !== 200) {
          push(new Error('...'));
          push(null, _.nil);
        }
      })
      .on('error', (err) => {
        push(err);
        push(null, _.nil);
      });

    next(data);
  });
}

So with this example it all works as expected apart from this part:

if (response.statusCode !== 200) {
  push(new Error('...'));
  push(null, _.nil);
}

I have confirmed this is getting called, as one of my test URLs returns 404, but the error here is not raised. It is generated later on by the unzip part.

Am I using next incorrectly?

Once you've redirected to another stream with next, calling push no longer works. That's why you don't see the 404. The response is an IncomingMessage, so you should be redirecting to that instead.

There was an error in the code that I originally posted. You either want to call flatMap(getUrl) in the pipeline

_(urls)
    .flatMap(getUrl)
    .parallel(5)
    ...

or change getUrl to redirect to the response instead of pushing it

if (response.statusCode === 200) {
  next(_(response));
} else {
  push(new Error('...'));
  push(null, _.nil);
}
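Roughly, the difference is that getUrl returns a stream, so map(getUrl) gives you a stream whose values are themselves streams and you need an extra level of flattening, whereas flatMap is map plus one level of flattening. A tiny illustration:

// map: each value is replaced by a stream, giving a stream of streams.
_([1, 2]).map(x => _([x * 10, x * 10]));      // => <stream>, <stream>
// flatMap: map, then flatten one level.
_([1, 2]).flatMap(x => _([x * 10, x * 10]));  // => 10, 10, 20, 20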

Thanks, I will need to look into the difference between using map and flatMap in this instance and try to get my head around it.

One final query I have is that the finish event on the pipe(output) part is still called if there is an error earlier in the pipeline. Is there a way to 'break' out of the pipeline if an error has happened, or should I have an external variable to flag an error and then refer to that in the finish event?

let error;

_(urls)
  ...
  .stopOnError(err => { error = err; })
  ...
  .pipe(output)
  .on('finish', () => {
    if (error) return reject(error);
    ...     
  })

Thanks for your help @vqvu, here are all the bits combined in case it helps anyone else:

const fs = require('fs');
const zlib = require('zlib');
const request = require('request');
const _ = require('highland');

function download (urls, path) {
  const tmpPath = path + '-tmp';
  const unzip = zlib.createGunzip();
  const output = fs.createWriteStream(tmpPath);

  const getUrl = url => {
    return _((push, next) => {
      request(url)
        .on('response', resp => {
          if (resp.statusCode === 200) {
            push(null, _(resp));  // emit the response stream as a value
            push(null, _.nil);
          } else {
            push(new Error('Something went wrong'));
            push(null, _.nil);
          }
        })
        .on('error', err => {
          push(err);
          push(null, _.nil);
        });
    });
  };

  let error;

  return new Promise((resolve, reject) => {
    _(urls)
      .flatMap(getUrl)
      .parallel(5)
      .through(unzip)
      .stopOnError(err => { error = err; })
      .pipe(output)
      .on('finish', () => {
        if (error) return reject(error);

        fs.rename(tmpPath, path, (err) => {
          if (err) { return reject(err); }
          return resolve(path);
        });
      })
      .on('error', reject);
  });
}