A simple method to estimate uncertainty in Machine Learning
When trying to predict an output it is sometimes useful to also get a confidence score or, similarly, a range of values around the expected value in which the true value might be found. Practical examples include estimating upper and lower bounds when predicting an estimated time of arrival (ETA) or a stock price, since you care not only about the expected value but also about the best-case and worst-case scenarios when trying to minimize risk.
While most Machine Learning techniques don't provide a natural way of doing this, in this article we will explore Quantile Regression as a means of doing so. This technique will allow us to learn some very important statistical properties of our data: the quantiles.
To begin our journey into quantile regression we will first get hold of some data:
import numpy as np
import matplotlib.pyplot as plt
plt.rcParams["figure.dpi"] = 300
plt.rcParams["figure.facecolor"] = "white"
np.random.seed(69)
def create_data(multimodal: bool):
    x = np.random.uniform(0.3, 10, 1000)
    y = np.log(x) + np.random.exponential(0.1 + x / 20.0)
    if multimodal:
        x = np.concatenate([x, np.random.uniform(5, 10, 500)])
        y = np.concatenate([y, np.random.normal(6.0, 0.3, 500)])
    return x[..., None], y[..., None]
multimodal: bool = False
x, y = create_data(multimodal)
plt.scatter(x[..., 0], y[..., 0], s=20, facecolors="none", edgecolors="k")
plt.show()
Here we have a simple 2D dataset; however, notice that y has some very peculiar statistical properties:
- It is not normally distributed; in fact, its noise is exponentially distributed.
- The previous point also means its noise is not symmetric.
- Its variance is not constant; it increases as x increases.
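These properties can be checked numerically. The following is a minimal sketch that regenerates data with the same process as create_data (without the multimodal part), using a larger sample and an arbitrary seed; the skewness helper is a hypothetical name introduced only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for this sketch

# Same generating process as create_data(multimodal=False), with more samples
x = rng.uniform(0.3, 10, 20_000)
noise = rng.exponential(0.1 + x / 20.0)  # the scale grows with x
y = np.log(x) + noise


def skewness(values):
    """Sample skewness: the third standardized moment."""
    centered = values - values.mean()
    return (centered**3).mean() / centered.std() ** 3


low, high = noise[x < 3], noise[x > 7]
print(f"noise std for x < 3: {low.std():.3f}")
print(f"noise std for x > 7: {high.std():.3f}")
print(f"noise skewness:      {skewness(noise):.2f}")
```

A larger standard deviation for large x reflects the non-constant variance, and a positive skewness reflects the asymmetric, one-sided noise.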
When making predictions for this kind of data we might be very interested to know the range of values our data revolves around, so that we can judge whether a specific outcome is expected or not, what the best and worst case scenarios are, etc.
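The only thing really special about quantile regression is its loss function. Instead of the usual MAE or MSE losses, for quantile regression we use the following function:
$$
\mathcal{L}_q(y, \hat{y}) = \begin{cases} q \, (y - \hat{y}) & \text{if } y - \hat{y} \geq 0 \\ (q - 1) \, (y - \hat{y}) & \text{otherwise} \end{cases}
$$
Here $q \in (0, 1)$ is the quantile we want to estimate, $y$ is the true value (y_true) and $\hat{y}$ is the prediction (y_pred). First let's notice that this formula can be rewritten as follows:
$$
\mathcal{L}_q(y, \hat{y}) = \max\big(q \, (y - \hat{y}),\; (q - 1) \, (y - \hat{y})\big)
$$
Using max instead of a conditional makes the loss straightforward to implement with array libraries such as JAX: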
import jax
import jax.numpy as jnp
def quantile_loss(q, y_true, y_pred):
    e = y_true - y_pred
    return jnp.maximum(q * e, (q - 1.0) * e)
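To see that the max form agrees with a conditional (piecewise) definition of the loss, the two can be compared numerically. This is a small sketch in plain NumPy rather than JAX; the function names quantile_loss_piecewise and quantile_loss_max are introduced here only for illustration.

```python
import numpy as np


def quantile_loss_piecewise(q, y_true, y_pred):
    # Conditional form: weight q for under-predictions, (q - 1) otherwise
    e = y_true - y_pred
    return np.where(e >= 0, q * e, (q - 1.0) * e)


def quantile_loss_max(q, y_true, y_pred):
    # Max form, mirroring quantile_loss above
    e = y_true - y_pred
    return np.maximum(q * e, (q - 1.0) * e)


rng = np.random.default_rng(0)
y_true = rng.normal(size=1000)
y_pred = rng.normal(size=1000)

for q in (0.05, 0.5, 0.95):
    same = np.allclose(
        quantile_loss_piecewise(q, y_true, y_pred),
        quantile_loss_max(q, y_true, y_pred),
    )
    print(f"q={q}: forms agree -> {same}")
```

The two agree because q is always larger than q - 1: for a non-negative error the first term of the max is the larger one, and for a negative error the second term is.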
Now that we have this function let's explore the error landscape for a particular set of predictions. Here we will generate values for y_true in the range [10, 20] and, for a given quantile q, compute the mean loss for each value that y_pred could take. Ideally we want to find the value of y_pred where the error is the smallest.
def calculate_error(q):
    y_true = np.linspace(10, 20, 100)
    y_pred = np.linspace(10, 20, 200)
    loss = jax.vmap(quantile_loss, in_axes=(None, None, 0))(q, y_true, y_pred)
    loss = loss.mean(axis=1)
    return y_true, y_pred, loss
q = 0.8
y_true, y_pred, loss = calculate_error(q)
q_true = np.quantile(y_true, q)
plt.plot(y_pred, loss)
plt.vlines(q_true, 0, loss.max(), linestyles="dashed", colors="k")
plt.gca().set_xlabel("y_pred")
plt.gca().set_ylabel("loss")
plt.title(f"Q({q:.2f}) = {q_true:.1f}")
plt.show()
If we plot the error, we see that the minimum of the quantile loss is at the value of the $q$th quantile. It achieves this because the quantile loss is not symmetrical: for quantiles above 0.5 it penalizes positive errors (under-predictions, since the error is y_true - y_pred) more strongly than negative errors, and the opposite is true for quantiles below 0.5. In particular, quantile 0.5 is the median, and its loss is equivalent to the MAE up to a constant factor of 1/2.
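Both claims can be checked on a skewed sample. The sketch below uses plain NumPy, an arbitrary seed, and a hypothetical helper best_constant that searches a grid of constant predictions for the one with the lowest mean quantile loss.

```python
import numpy as np


def quantile_loss(q, y_true, y_pred):
    e = y_true - y_pred
    return np.maximum(q * e, (q - 1.0) * e)


rng = np.random.default_rng(0)
sample = rng.exponential(1.0, size=2000)  # skewed on purpose
candidates = np.linspace(0.0, 6.0, 601)   # grid of constant predictions


def best_constant(q):
    # Mean loss of every candidate prediction against the whole sample
    losses = quantile_loss(q, sample[None, :], candidates[:, None]).mean(axis=1)
    return candidates[losses.argmin()]


for q in (0.1, 0.5, 0.9):
    print(f"q={q}: argmin={best_constant(q):.2f}, np.quantile={np.quantile(sample, q):.2f}")

# For q = 0.5 the quantile loss is half the absolute error
half_mae = 0.5 * np.abs(sample - 1.0).mean()
median_loss = quantile_loss(0.5, sample, 1.0).mean()
print(np.isclose(median_loss, half_mae))
```

The grid minimizer and np.quantile should agree up to the grid resolution and sampling noise, which is the same behavior the plot above shows for q = 0.8.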
Generally you would have to create a model per quantile; however, if we use a neural network we can have it output the predictions for all the quantiles at the same time. Here we will use elegy to create a neural network with 2 hidden layers with relu activations and a linear layer with n_quantiles output units.
import elegy
class QuantileRegression(elegy.Module):
    def __init__(self, n_quantiles: int):
        super().__init__()
        self.n_quantiles = n_quantiles

    def call(self, x):
        x = elegy.nn.Linear(128)(x)
        x = jax.nn.relu(x)
        x = elegy.nn.Linear(64)(x)
        x = jax.nn.relu(x)
        x = elegy.nn.Linear(self.n_quantiles)(x)
        return x
Now we are going to properly define a QuantileLoss class that is parameterized by a set of user-defined quantiles.
class QuantileLoss(elegy.Loss):
    def __init__(self, quantiles):
        super().__init__()
        self.quantiles = np.array(quantiles)

    def call(self, y_true, y_pred):
        loss = jax.vmap(quantile_loss, in_axes=(0, None, -1), out_axes=1)(
            self.quantiles, y_true[:, 0], y_pred
        )
        return jnp.sum(loss, axis=-1)
Notice that we use the same quantile_loss that we created previously, along with some jax.vmap magic to properly vectorize the function. Finally, we are going to create a simple function that creates and trains our model for a set of quantiles using elegy.
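To make the vectorization less magical, here is a sketch of what the vmap call computes, written with plain NumPy broadcasting instead of jax.vmap. The function name multi_quantile_loss and the toy inputs are made up for illustration; it assumes y_true has shape (batch, 1) and y_pred has shape (batch, n_quantiles), as in the model above.

```python
import numpy as np


def multi_quantile_loss(quantiles, y_true, y_pred):
    """y_true: (batch, 1), y_pred: (batch, n_quantiles) -> loss of shape (batch,)."""
    q = np.asarray(quantiles)[None, :]  # (1, n_quantiles)
    e = y_true - y_pred                 # broadcasts to (batch, n_quantiles)
    per_quantile = np.maximum(q * e, (q - 1.0) * e)
    return per_quantile.sum(axis=-1)    # sum the losses of all quantiles


quantiles = (0.1, 0.5, 0.9)
y_true = np.array([[1.0], [2.0]])
y_pred = np.array([[0.0, 1.0, 2.0], [2.0, 2.0, 2.0]])

per_sample = multi_quantile_loss(quantiles, y_true, y_pred)
print(per_sample.shape)  # (2,)
print(per_sample)
```

Each output column is paired with its own quantile, and the per-quantile losses are summed so that every sample contributes a single scalar to the training objective.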
import optax
def train_model(quantiles, epochs: int, lr: float, eager: bool):
    model = elegy.Model(
        QuantileRegression(n_quantiles=len(quantiles)),
        loss=QuantileLoss(quantiles),
        optimizer=optax.adamw(lr),
        run_eagerly=eager,
    )
    model.init(x, y)
    model.summary(x)
    model.fit(x, y, epochs=epochs, batch_size=64, verbose=0)
    return model

if not multimodal:
    quantiles = (0.05, 0.1, 0.3, 0.5, 0.7, 0.9, 0.95)
else:
    quantiles = np.linspace(0.05, 0.95, 9)
model = train_model(quantiles=quantiles, epochs=3001, lr=1e-4, eager=False)
┏━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer                ┃ Outputs Shape       ┃ Trainable        ┃ Non-trainable ┃
┃                      ┃                     ┃ Parameters       ┃ Parameters    ┃
┡━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ Inputs               │ (1000, 1) float64   │                  │               │
├──────────────────────┼─────────────────────┼──────────────────┼───────────────┤
│ linear    Linear     │ (1000, 128) float32 │ 256       1.0 KB │               │
├──────────────────────┼─────────────────────┼──────────────────┼───────────────┤
│ linear_1  Linear     │ (1000, 64) float32  │ 8,256    33.0 KB │               │
├──────────────────────┼─────────────────────┼──────────────────┼───────────────┤
│ linear_2  Linear     │ (1000, 7) float32   │ 455       1.8 KB │               │
├──────────────────────┼─────────────────────┼──────────────────┼───────────────┤
│ * QuantileRegression │ (1000, 7) float32   │                  │               │
├──────────────────────┼─────────────────────┼──────────────────┼───────────────┤
│                      │               Total │ 8,967    35.9 KB │               │
└──────────────────────┴─────────────────────┴──────────────────┴───────────────┘
Total Parameters: 8,967    35.9 KB
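To build intuition for what this training minimizes, here is a toy sketch that is not the elegy/optax setup above: the network is replaced by one trainable constant per quantile, and AdamW is replaced by plain subgradient descent on the quantile loss. The sample, step size, and iteration count are arbitrary choices for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.exponential(1.0, size=5000)
quantiles = np.array([0.1, 0.5, 0.9])

# One trainable constant per quantile (a stand-in for the network outputs)
preds = np.zeros_like(quantiles)
lr = 0.5  # step size picked by hand for this toy problem

for _ in range(2000):
    e = sample[:, None] - preds[None, :]  # (n_samples, n_quantiles)
    # Subgradient of max(q * e, (q - 1) * e) with respect to the prediction:
    # -q where the prediction is too low (e > 0), (1 - q) otherwise
    grad = np.where(e > 0, -quantiles, 1.0 - quantiles).mean(axis=0)
    preds -= lr * grad

print("learned:  ", np.round(preds, 3))
print("empirical:", np.round(np.quantile(sample, quantiles), 3))
```

Each constant drifts until the fraction of samples below it matches its quantile, which is the same balance the network has to strike for every value of x.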
Now that we have a model, let's generate some test data that spans the entire domain and compute the predicted quantiles.
x_test = np.linspace(x.min(), x.max(), 100)
y_pred = model.predict(x_test[..., None])
plt.scatter(x, y, s=20, facecolors="none", edgecolors="k")
for i, q_values in enumerate(np.split(y_pred, len(quantiles), axis=-1)):
    plt.plot(x_test, q_values[:, 0], linewidth=2, label=f"Q({quantiles[i]:.2f})")
plt.legend()
plt.show()
Amazing! Notice how the first few quantiles are tightly packed together while the last ones spread out, capturing the behavior of the exponential distribution. We can also visualize the region between the highest and lowest quantiles; this gives us some bounds on our predictions.
median_idx = np.where(np.isclose(quantiles, 0.5))[0]
plt.fill_between(x_test, y_pred[:, -1], y_pred[:, 0], alpha=0.5, color="b")
plt.scatter(x, y, s=20, facecolors="none", edgecolors="k")
plt.plot(
    x_test,
    y_pred[:, median_idx],
    color="r",
    linestyle="dashed",
    label="median",
)
plt.legend()
plt.show()
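A band between the 0.05 and 0.95 quantiles should contain roughly 90% of the observations, and this can be measured directly. The sketch below is an illustration under stated assumptions: instead of model predictions it uses the analytic quantiles of the data-generating process (an exponential with a given scale has quantile function -scale * log(1 - q)), and coverage and true_quantile are hypothetical helper names.

```python
import numpy as np


def coverage(y, lower, upper):
    """Fraction of observations that fall inside [lower, upper]."""
    return np.mean((y >= lower) & (y <= upper))


rng = np.random.default_rng(0)  # arbitrary seed
x = rng.uniform(0.3, 10, 20_000)
scale = 0.1 + x / 20.0
y = np.log(x) + rng.exponential(scale)


def true_quantile(q):
    # Analytic conditional quantile of y given x for this synthetic process
    return np.log(x) - scale * np.log(1.0 - q)


band_coverage = coverage(y, true_quantile(0.05), true_quantile(0.95))
print(f"coverage of the 0.05-0.95 band: {band_coverage:.3f}")
```

With a trained model, the same coverage function could be applied to the lowest and highest predicted quantiles evaluated at the data points; a value far from the nominal 90% would suggest the quantiles are poorly calibrated.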
In addition, having multiple quantile values allows you to estimate the density of the data. Since the difference between two adjacent quantile levels represents the probability that a point lies between the corresponding quantile values, we can construct a piecewise-constant function that approximates the density of the data.
def get_pdf(quantiles, q_values):
    densities = []
    for i in range(len(quantiles) - 1):
        area = quantiles[i + 1] - quantiles[i]
        b = q_values[i + 1] - q_values[i]
        a = area / b
        densities.append(a)
    return densities

def piecewise(xs):
    return [xs[i + j] for i in range(len(xs) - 1) for j in range(2)]

def doubled(xs):
    return [np.clip(xs[i], 0, 3) for i in range(len(xs)) for _ in range(2)]
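As a sanity check on this construction, the piecewise density should integrate to the probability mass spanned by the quantiles. The following is a vectorized sketch of the same idea as get_pdf; piecewise_density is a hypothetical name, the exact quantile values of a unit exponential stand in for model predictions, and the check for non-increasing values is an added assumption, since predicted quantiles that cross would produce invalid densities.

```python
import numpy as np


def piecewise_density(quantiles, q_values):
    """Density per bin: probability mass between adjacent quantiles / bin width."""
    quantiles = np.asarray(quantiles, dtype=float)
    q_values = np.asarray(q_values, dtype=float)
    mass = np.diff(quantiles)   # probability between adjacent quantile levels
    width = np.diff(q_values)   # distance between adjacent quantile values
    if np.any(width <= 0):
        raise ValueError("quantile values must be strictly increasing")
    return mass / width


quantiles = np.linspace(0.05, 0.95, 7)
# Exact quantile values of an Exponential(1), standing in for model predictions
q_values = -np.log(1.0 - quantiles)

density = piecewise_density(quantiles, q_values)
total_mass = np.sum(density * np.diff(q_values))
print(f"mass covered by the bins: {total_mass:.2f}")
```

The covered mass equals the difference between the highest and lowest quantile levels, so the tails outside that range are not represented by this approximation.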
Now for a given x we can compute the quantile values and then use these to compute the conditional piecewise density function of y given x.
xi = 7.0
q_values = model.predict(np.array([[xi]]))[0].tolist()
densities = get_pdf(quantiles, q_values)
plt.title(f"x = {xi}")
plt.fill_between(piecewise(q_values), 0, doubled(densities))
# plt.fill_between(q_values, 0, densities + [0])
# plt.plot(q_values, densities + [0], color="k")
plt.xlim(0, y.max())
plt.gca().set_xlabel("y")
plt.gca().set_ylabel("p(y)")
plt.show()
One of the nice properties of Quantile Regression is that we did not need to know a priori the output distribution and training is easy in comparison to other methods.
- Quantile Regression is a simple and effective method for learning some statistics about the output distribution.
- It is especially useful for establishing bounds on the predictions of a model when risk management is desired.
- The Quantile Loss function is simple and easy to implement.
- Quantile Regression can be efficiently implemented using Neural Networks, since a single model can be used to predict all the quantiles.
- The quantiles can be used to estimate the conditional density of the data.