caolan / highland

High-level streams library for Node.js and the browser

Home Page: https://caolan.github.io/highland

How to get the last good value on the stream when an error occurs?

nckswt opened this issue

I'm trying to do some error reporting, and I'd like to have both the error and the input value that caused the error simultaneously so that I can publish both.

Something like:

_([1, 2, 3, 4])
  .map((x) => {
    if (x > 2) throw new Error('Too big!');
    return x + 10;
  })
  .consume((err, x, push, next) => {
    if (err) {
      console.log(err.message); // Should be 'Too big!'
      console.log(x); // Should be 3, not 2, not 12 and not 13
      publishErrorEvent({ err, x });
    } else {
      next();
    }
  });

The problem is, it looks like .map will always push x === undefined into the stream whenever an error is thrown.

Is there a way to get the last valid value in the stream? I'm really hoping I don't have to cache it to local state or something like that. I've tried using latest and last, but those don't seem to do what I'm looking for.

Note: I'm trying to do this for some arbitrary pipelines, so I can't just publish the error in the .map

I do my best to read these issues carefully so I understand what you're trying to do, but from time to time I may misunderstand, so it may take a couple of tries.

So the problem with the approach above, as you noticed, is that the x value that breaks is never pushed downstream. I would reach for a general higher-order function to wrap your mapping function, like:

const stream = require("highland");

/**
 * catchError
 * Wraps a function in try...catch so it can report the value that caused the
 * processing error.
 *
 * Takes a function that will receive a value to process and returns the
 * processed value.
 *
 * Returns a function that takes a value to process and either returns the
 * processed value or throws an object with the error and the offending value.
 */
function catchError (fn) {
  return (x) => {
    try {
      return fn(x);
    }
    catch (err) {
      throw { err, x };
    }
  };
}

// Example
stream([1, 2, 3, 4])
  .map(catchError(x => {
    if (x > 2) {
      throw new Error("Too big!");
    }

    return x + 10;
  }))
  // Handle the thrown { err, x } object so the error doesn't crash the stream
  .errors(({ err, x }) => {
    console.log(`${err.message} -> ${x}`);
  })
  .each(console.log);

The added bonus is that it can be used with Highland's map, filter, each, etc., as well as JS's array map, filter, and forEach functions.
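For example, reusing the same wrapper with a plain array (catchError is repeated here so the snippet stands alone; addTen is just an illustrative name):

```javascript
// catchError repeated from above so this sketch is self-contained.
function catchError(fn) {
  return (x) => {
    try {
      return fn(x);
    } catch (err) {
      throw { err, x };
    }
  };
}

const addTen = catchError((x) => {
  if (x > 2) throw new Error("Too big!");
  return x + 10;
});

// Works with Array.prototype.map just as well as Highland's map.
try {
  console.log([1, 2, 3].map(addTen));
} catch (e) {
  console.log(`${e.err.message} -> ${e.x}`); // prints "Too big! -> 3"
}
```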

Thanks for the great example! I've nearly solved my problem -- I just need to apply your catchError function to through streams somehow. Take this example:

// Map example works
function catchError(fn) {
  return (x) => {
    console.log('Safety net')
    try {
      return fn(x);
    } catch (err) {
      err.x = x;
      throw err;
    }
  };
}

console.log('Using map');
_([1, 2, 3, 4])
  .map(catchError((x) => {
    if (x > 2) {
      throw new Error('Too big!');
    }

    return x + 10;
  }))
  .errors((err, push) => {
    console.log(`${err.name}: ${err.message} -> ${err.x}`);
  })
  .each(console.log);

// But through stream example does not
console.log('\nUsing a through stream');
const pipeline = (s) => s
  .map((x) => {
    if (x > 2) {
      throw new Error('Too big!');
    }

    return x + 10;
  });

_([1, 2, 3, 4])
  .through(catchError(pipeline))
  .errors((err, push) => {
    console.log(`${err.name}: ${err.message} -> ${err.x}`);
  })
  .each(console.log);

The output is:

Using map
Safety net
11
Safety net
12
Safety net
Error: Too big! -> 3
Safety net
Error: Too big! -> 4

Using a through stream
Safety net
11
12
Error: Too big! -> undefined
Error: Too big! -> undefined

If the pipeline is some arbitrary pipeline I'm given, how could I append x to errors thrown in that pipeline? I get that I'm only applying catchError to the target transform instead of to each function called on each value in the stream, but I'm not quite sure how to integrate the two.

I see. That is a bit trickier, but I do have a few possible solutions. I'll explain each solution first, then show how it fits into the test code.

I'm not sure this is really the best approach to the problem. This seems to be the exact use case for flatMap, where for every input (your x) you return a stream of 0, 1, or many values; map().sequence(), parallel, merge, or mergeWithLimit would work too. This would let you very easily tie inputs to outputs, and also allow you to control how many items are processed at once.

Another even simpler option is to not make the stream responsible for relating the inputs to outputs and instead rely on logging to make that connection. If it's just for reporting purposes then you could try something like:

console.log('\nUsing a through stream');
const pipeline = (s) => s
  .map((x) => {
    if (x > 2) {
      throw new Error('Too big!');
    }

    return x + 10;
  });

stream([1, 2, 3, 4])
  .tap(console.log) // record input
  .through(pipeline)
  .errors((err, push) => {
    console.log(`${err.name}: ${err.message}`); // Record errors
  })
  .each(console.log);

This way the stream doesn't need to track the required state to do what you're looking for. However, this can certainly be done if the recommended paths above don't apply.

/**
 * catchPipelineError
 * Relates incoming inputs to errors in the output by storing minimal state.
 * Likely the most performant of the options shown here.
 * Takes a function to operate on a stream, just like the through method.
 * Returns a stream of values, emitting errors with a .x prop for the last input.
 */
function catchPipelineError (fn) {
  let lastX = null;

  return source => source
    .tap(x => {
      lastX = x;
    })
    .through(fn)
    // update the error and emit it again
    .errors((err, push) => {
      err.x = lastX;
      push(err);
    });
}

There is a bit of state, but if you look at the source of latest or last, this is more or less what they do.

This should cover most cases, but there is potential for issues if, say, the pipeline uses more complex async steps where outputs don't arrive in the same order as their inputs.

/**
 * catchPipelineError
 * Relates incoming stream inputs to the output of a pipeline by joining streams.
 * Takes a function to transform the stream, just like the through method.
 * Returns a stream of values, emitting errors with a .x prop for the matching input.
 */
function catchPipelineError (fn) {
  return source => {
    const inputs = source.observe().latest();

    // Normalize the outputs.
    const outputs = fn(source)
      .map(x => ({ err: null, x }))
      .errors((err, push) => {
        push(null, { err, x: null });
      });

    return outputs
      // This order ensures we read from inputs whenever there is a new output
      .zip(inputs)
      .map(([ { err, x }, input ]) => {
        if (err) {
          err.x = input;
          throw err;
        }

        return x;
      });
  };
}

This one is more stream-focused, using zip to relate inputs to outputs of the target pipeline. It should hold up to asynchronous pipelines better, but as I said, there are still some unknowns.

/**
 * catchPipelineError
 * Relates incoming stream inputs to the output of a pipeline by joining streams,
 * wrapping outputs in an Either-style value instead of throwing inside map.
 * Takes a function to transform the stream, just like the through method.
 * Returns a stream of values, emitting errors with a .x prop for the matching input.
 */
function catchPipelineError (fn) {
  const ok = x => ({ ok: true, x });
  const error = x => ({ ok: false, x });
  const isErr = either => either && either.ok === false;

  return source => {
    const inputs = source.observe().latest();

    // Normalize the outputs.
    return source
      .through(fn)
      .map(ok)
      .errors((err, push) => {
        push(null, error(err));
      })
      .zip(inputs)
      .flatMap(([ either, input ]) => {
        if (isErr(either)) {
          const err = either.x;
          err.x = input;

          // Much better than throwing errors in a map, for instance
          return stream.fromError(err);
        }

        // Return a stream with a single value
        return stream.of(either.x);
      });
  };
}

The big difference here is that we're not throwing errors in a map function but using flatMap to properly map incoming values to either an error stream or a single-value stream. This, to me, is the most robust solution, but given the nature of the problem there may still be some edge cases with async pipelines.

As for using it: pretty much like catchError, but it works with the through method.

console.log('\nUsing a through stream');
const pipeline = (s) => s
  .map((x) => {
    if (x > 2) {
      throw new Error('Too big!');
    }

    return x + 10;
  });

stream([1, 2, 3, 4])
  .through(catchPipelineError(pipeline))
  .errors((err, push) => {
    console.log(`${err.name}: ${err.message} -> ${err.x}`);
  })
  .each(console.log);

Let me know if there are more nuances that this still doesn't cover.

That is the most wonderfully detailed answer I've ever received to any question I've ever posted on the internet. Thank you.

I'll try things out and get back to you!

Just an update -- have been using the

    .tap(x => {
      lastX = x;
    })

approach and it's been doing the job. Thanks again!

Great! I'm glad it's working.