autometrics-dev / autometrics-ts

Easily add metrics to your system – and actually understand them using automatically customized Prometheus queries

Home Page: https://autometrics.dev

Support custom `errorIf` and `okIf` callbacks

keturiosakys opened this issue

Currently, the only way a function will register an error in Autometrics is if it throws. This limits the usefulness of the library in contexts such as route handlers, where any downstream calls would be wrapped in a try/catch and likely handled. So, as an example, a case like this...

export async function getUsers(req: express.Request, res: express.Response) {
  try {
    const users = await db.getUsers();
    return res.status(200).json(users);
  } catch (e) {
    return res.status(500).send(e.message);
  }
}

app.get("/users", autometrics(getUsers));

...would always register result="ok" in metrics because it never throws, even though the actual result is an error.

Proposal

Similar to autometrics-dev/autometrics-rs#32, we should add ways for users to tweak what counts as an error and what doesn't. As we don't have trait interfaces in TS/JS, one way we can solve this is by exposing errorIf and okIf options in the AutometricsOptions block, to which users can supply their own conditions.

Here's how the above example would change so that it correctly registers an error:

const errorIf = async (res: Promise<express.Response>) => {
  return (await res).statusCode >= 400;
};

app.get("/users", autometrics({ errorIf }, handleGetUsers));

Could it be an option to have an autometrics.reportFailure() function that can simply be called when needed?

That way, it could become:

export async function getUsers(req: express.Request, res: express.Response) {
  try {
    const users = await db.getUsers();
    return res.status(200).json(users);
  } catch (e) {
    autometrics.reportFailure(e);
    return res.status(500).send(e.message);
  }
}

@arendjr At first glance this does seem more ergonomic.

I'm trying to wrap my head around this problem though: how would the autometrics wrapper know that autometrics.reportFailure fired inside the wrapped function?

@arendjr interesting idea. I might be thinking too much about implementation details, but how do we handle cases where the function isn't decorated with autometrics? And how would we ensure that reportFailure is attributed to the right wrapped function?

You would need a hidden global somewhere in your package, but with it, you could do something like this:

let globalOutOfBoundFailure: Error | null = null;

function reportFailure(error: Error) {
  globalOutOfBoundFailure = error;
}

// Sketch of the wrapper internals: fn is the wrapped function, params its
// arguments, and isPromise/onSuccess/onError are the wrapper's own helpers.
function instrumentedFunction<F extends (...args: unknown[]) => unknown>(
  fn: F,
  ...params: Parameters<F>
) {
  try {
    globalOutOfBoundFailure = null;

    const result = fn(...params);

    if (globalOutOfBoundFailure) {
      throw globalOutOfBoundFailure;
    }

    if (isPromise<ReturnType<F>>(result)) {
      return result
        .then((res: Awaited<ReturnType<F>>) => {
          if (globalOutOfBoundFailure) {
            throw globalOutOfBoundFailure;
          }
        
          onSuccess();
          return res;
        })
        .catch((err: unknown) => {
          globalOutOfBoundFailure = null;
          onError();
          throw err;
        });
    }

    onSuccess();
    return result;
  } catch (error) {
    globalOutOfBoundFailure = null;
    onError();
    throw error;
  }
}

But it is true that you cannot prevent people from calling it outside Autometrics-annotated functions, in which case errors reported this way "bubble up" to the nearest annotated function, so it does assume some hygiene from your users :)

Another gotcha I should mention is that if people call other Autometrics-annotated functions between calling reportFailure() and returning from the annotated function, that failure will be erased, unless you implement a hidden stack mechanism, which I could understand if you'd rather not :)
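
For what it's worth, a minimal sketch of that hidden-stack idea for the synchronous case (all names here are illustrative; interleaved async calls would still break it):

const failureStack: Array<Error | null> = [];

function reportFailure(error: Error) {
  // Record against the innermost annotated call currently on the stack.
  if (failureStack.length > 0) {
    failureStack[failureStack.length - 1] = error;
  }
}

function instrumented<T>(fn: () => T): T {
  failureStack.push(null); // each annotated call gets its own slot
  try {
    const result = fn();
    const failure = failureStack[failureStack.length - 1];
    if (failure) {
      throw failure; // surface the out-of-band failure to the wrapper
    }
    return result;
  } finally {
    failureStack.pop();
  }
}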

Yeah, that's interesting. One way we could probably mitigate some of these issues is to use the AsyncLocalStorage API for a "thread-local" variable of sorts instead of a global (something we already use to pass along the caller label information). However, calling autometrics-wrapped functions inside the scope might still reset the value.
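
A sketch of what that could look like, assuming Node's AsyncLocalStorage from node:async_hooks (the failureStore and instrumented names are made up for illustration):

import { AsyncLocalStorage } from "node:async_hooks";

type FailureSlot = { error: Error | null };

const failureStore = new AsyncLocalStorage<FailureSlot>();

function reportFailure(error: Error) {
  const slot = failureStore.getStore();
  if (slot) {
    slot.error = error; // lands in the nearest enclosing invocation's slot
  }
}

async function instrumented<T>(fn: () => Promise<T>): Promise<T> {
  const slot: FailureSlot = { error: null };
  return failureStore.run(slot, async () => {
    const result = await fn();
    if (slot.error) {
      throw slot.error; // surface the out-of-band failure to the wrapper
    }
    return result;
  });
}

Since every invocation runs with its own slot, concurrent requests don't clobber each other, and a nested wrapped call reports into its own slot instead of erasing the outer one.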

I'm a bit worried about the edge cases with the reportFailure API. Autometrics (in my opinion) should not report things incorrectly if people use the API in a bit of a funky way.

That said, reportFailure covers reporting errors when there's no exception. However, there are also cases where errors/exceptions shouldn't be reported as errors. For instance: in the past I've worked on an API that used boom, so functions that were more or less low-level could throw errors annotated with the status code they should result in. Say there's an endpoint called updateProjectSettings which calls the getProject function. If a project didn't exist, the "low-level" getProject would reject with a boom error specifying a 404. In the case of autometrics, I can imagine people wanting to "catch" these errors on the getProject function and prevent them from being reported as errors.

An example of the endpoint's code could look something like:

const updateProjectSettings = (req, res) => 
  Promise.all([validateSettings(req), getProject(req.params["projectId"])])
      .then(([settings]) => updateProject(settings))
      .then(updatedProject => serializeProject(updatedProject))
      .catch(error => sendErrorResponse(res, error));
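
To make that concrete, a hedged sketch of what such a low-level function might look like (the getProject body, the Project type, and the db handle are invented for illustration):

import * as Boom from "@hapi/boom";

type Project = { id: string; settings: unknown };

// Hypothetical data access layer.
declare const db: { findProject(id: string): Promise<Project | null> };

async function getProject(projectId: string): Promise<Project> {
  const project = await db.findProject(projectId);
  if (!project) {
    // Rejects with a boom error carrying a 404, which the endpoint's
    // catch handler turns into the actual HTTP response.
    throw Boom.notFound(`No project with id ${projectId}`);
  }
  return project;
}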

the reportFailure approach seems more flexible for where/how to define errors, but relying on end user code hygiene seems a little dangerous. the errorIf api is a bit less flexible, but it only concerns itself with the output of the function. it keeps autometrics' side-effects outside of the body of the instrumented function, which i like.

to jacco's point, this only covers a custom definition of an error when the function would've otherwise recorded result="ok". it doesn't handle the case where you want to record result="ok" for a (presumably expected) error. i'm presuming, though, that we could just implement something like okIf, which would only be called in the event that the function throws an error?

i'd be okay with errorIf but might prefer a slightly more descriptive name. (but i can't come up with anything that isn't overly descriptive...)

Yeah, I think I agree with @brettimus and @flenter - part of the appeal of Autometrics is that you don't need to change the body of the function itself to register metrics side-effects.

In terms of naming, we can change errorIf and okIf to something like reportErrorIf and reportOkIf (so it's clear we mean reporting, as opposed to actually throwing an error). I'd be inclined to upstream that to the Rust version as well, to keep it consistent.

I'd need to think about how we can implement reportOkIf, but it should be quite simple. Currently we rethrow any errors the wrapper catches and register an error in the metrics. With reportOkIf we'd add a callback that inspects the error value; if it matches the user-defined condition, we'd still rethrow the error but report "ok" in the metrics.
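
A rough sketch of what that could look like inside the wrapper (the option names follow the proposal above; wrapWithReporting, onSuccess, and onError are made-up stand-ins for the real internals):

function onSuccess() { /* record result="ok" */ }
function onError() { /* record result="error" */ }

interface ReportOptions<T> {
  reportErrorIf?: (result: T) => boolean;
  reportOkIf?: (error: unknown) => boolean;
}

async function wrapWithReporting<T>(
  fn: () => Promise<T>,
  options: ReportOptions<T>,
): Promise<T> {
  try {
    const result = await fn();
    if (options.reportErrorIf?.(result)) {
      onError(); // resolved, but the user-defined condition calls it an error
    } else {
      onSuccess();
    }
    return result;
  } catch (error) {
    if (options.reportOkIf?.(error)) {
      onSuccess(); // an expected error: report ok...
    } else {
      onError();
    }
    throw error; // ...but rethrow either way
  }
}

With the boom example above, reportOkIf could then be something like (error) => Boom.isBoom(error) && error.output.statusCode === 404.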