mEstimator Error with Trust Region Optimizers

Question

mEstimator Error with Trust Region Optimizers

DanMcGann opened this issue 2 years ago · comments

Description

When using robust noise models trust region optimizers use an incorrect value for system error when evaluating convergence and step rejection criteria. Specifically these optimizer evaluate convergence on the sum of weighted residuals $\sum(r_i\sqrt{w_i})^2$ rather than the robust cost value $\sum\rho(r_i)$ . Where I have used the same notation style as the robust estimator reference provided in the documentation (LINK) but the exact form computed by GTSAM. Plugging in the definition for the weight we can quickly see the inequality of these terms for many mEstimator forms.

For example Cauchy (c=1 scale value for brevity)
${w_i}{r_i^2}=\frac{\psi(r_i)}{r_i}r_i^2=\psi(r_i)r_i\neq\rho(r_i)$ plugging in form of Cauchy loss we get $\frac{r_i^2}{1+r_i^2}\neq\frac{1}{2}\log(1+r_i^2)$

The discrepancy can greatly affect determination of convergence and step acceptance by our trust region optimizers, especially for the mEstimators that are asymptoticly constant like Geman McClure (see below). For these estimators the system error (in its current implementation) can increase as the estimate gets closer to the mean meaning steps will be rejected by LM or DogLeg and the system will preemptively converge.

Steps to reproduce

Implement factor with a robust noise model
Plot its error wrt its loss.

import gtsam
import numpy as np
import matplotlib.pyplot as plt


def makeValues(x):
    values = gtsam.Values()
    values.insert(0, x)
    return values


def cauchy_loss(x):  # c=1
    return 0.5 * np.log(1 + x ** 2)


def gm_loss(x):
    return (0.5 * x ** 2) / (1 + x ** 2)


# Constants
x = np.arange(-10, 10, 0.1)
noise_model = gtsam.noiseModel.Isotropic.Sigma(1, 1)


# Cauchy
mEst = gtsam.noiseModel.mEstimator.Cauchy(1.0)
robust_model = gtsam.noiseModel.Robust(mEst, noise_model)
factor = gtsam.PriorFactorDouble(0, 0, robust_model)

error = [factor.error(makeValues(xi)) for xi in x]
loss = cauchy_loss(x)
plt.title("Cauchy")
plt.plot(x, error, label="FactorError")
plt.plot(x, loss, label="mEstimator Loss")
plt.legend()
plt.show()


mEst = gtsam.noiseModel.mEstimator.GemanMcClure(1.0)
robust_model = gtsam.noiseModel.Robust(mEst, noise_model)
factor = gtsam.PriorFactorDouble(0, 0, robust_model)

error = [factor.error(makeValues(xi)) for xi in x]
loss = gm_loss(x)
plt.title("Geman McClure")
plt.plot(x, error, label="FactorError")
plt.plot(x, loss, label="mEstimator Loss")
plt.legend()
plt.show()

Expected behavior

I would expect that when evaluating system error optimizers would use the actual robust cost, rather than the value of the weighted squared residual. Parallel, I would expect to see better convergence of trust region optimizers than empirically found. For emperical evaluation of convergence see script below.

Convergence Evaluation Script

import gtsam
from gtsam import noiseModel
import matplotlib.pyplot as plt
import numpy as np

OPTIMIZERS = [
    gtsam.GaussNewtonOptimizer,
    gtsam.LevenbergMarquardtOptimizer,
    gtsam.DoglegOptimizer,
]
OPTIM_PARAMS = [
    gtsam.GaussNewtonParams,
    gtsam.LevenbergMarquardtParams,
    gtsam.DoglegParams,
]
OPTIM_NAMES = ["GaussNewton", "LM", "DOGLEG"]

POSES = [
    gtsam.Point2(0, 0),
    gtsam.Point2(1, 0),
    gtsam.Point2(1, 1),
    gtsam.Point2(0, 1),
]

INITIAL_ESTS = [
    gtsam.Point2(-5, -5),
    gtsam.Point2(6, -5),
    gtsam.Point2(6, 6),
    gtsam.Point2(-5, 6),
]

M_ESTS = {
    "NULL": lambda x: noiseModel.mEstimator.Null(),
    "Huber": noiseModel.mEstimator.Huber,
    "Cauchy": noiseModel.mEstimator.Cauchy,
    "Welsch": noiseModel.mEstimator.Welsch,
    "GemanMcClure": noiseModel.mEstimator.GemanMcClure,
}

RHO_FUNCS = [
    lambda r: 0.5 * r ** 2,
    lambda r: r ** 2 if np.abs(r) < 1 else np.abs(r),
    lambda r: np.log(1 + r ** 2),
    lambda r: (1 - np.exp(-(r ** 2))),
    lambda r: (r ** 2) / (1 + r ** 2),
]

for (mest_name, mest), rho in zip(M_ESTS.items(), RHO_FUNCS):
    graph = gtsam.NonlinearFactorGraph()
    graph.add(
        gtsam.PriorFactorPoint2(
            0,
            POSES[0],
            noiseModel.Robust(mest(1), noiseModel.Isotropic.Sigma(2, 1)),
        )
    )

    for i in range(1, len(POSES)):
        graph.add(
            gtsam.BetweenFactorPoint2(
                i - 1,
                i,
                -POSES[i - 1] + POSES[i],
                noiseModel.Robust(
                    mest(1),
                    noiseModel.Isotropic.Sigma(2, 1),
                ),
            )
        )
    values = gtsam.Values()
    for i, pose in enumerate(INITIAL_ESTS):
        values.insert(i, pose)

    fig = plt.figure(constrained_layout=True)
    ax_dict = fig.subplot_mosaic(
        """
        ABC
        DDD
        """
    )
    print(ax_dict)
    fig.suptitle(mest_name)
    init_pts = np.stack(INITIAL_ESTS)

    gt_pts = np.stack(POSES)
    print(mest_name)
    print(
        "------------------------------------------------------------------------------------------"
    )
    for opt_class, param_class, name, ax_name in zip(
        OPTIMIZERS, OPTIM_PARAMS, OPTIM_NAMES, ["A", "B", "C"]
    ):
        print(name)
        ax = ax_dict[ax_name]
        params = param_class()
        params.setVerbosity("ERROR")
        optimizer = opt_class(graph, values, params)
        result = optimizer.optimize()
        print("")
        pts = []
        for i in range(len(POSES)):
            pts.append(result.atPoint2(i))
        pts = np.stack(pts)
        ax.set_title(name)
        ax.plot(init_pts.T[0], init_pts.T[1], marker="o", label="Initialization")
        ax.plot(gt_pts.T[0], gt_pts.T[1], marker="o", label="GroundTruth")
        ax.plot(pts.T[0], pts.T[1], marker="o", label=name, alpha=0.6)
        ax.set_aspect("equal")

    prior = gtsam.PriorFactorDouble(
        0, 0, gtsam.noiseModel.Robust(mest(1), gtsam.noiseModel.Isotropic.Sigma(1, 1))
    )
    gtsam_vals = []
    expected_vals = []
    x = np.arange(-10, 10, 0.1)
    for xi in x:
        vals = gtsam.Values()
        vals.insert(0, xi)
        gtsam_vals.append(prior.error(vals))
        expected_vals.append(rho(xi))
    ax_dict["D"].set_ylim(0, 3)
    ax_dict["D"].plot(x, gtsam_vals, label="GTSAM Error")
    ax_dict["D"].plot(x, expected_vals, label="Expected Error")
    ax_dict["D"].legend()
    plt.savefig("mest_{}.png".format(mest_name))
plt.show()

Environment

OS: Linux
Language: Python and C++
Version: 4.1.1 (built from source)

Additional information

I was tempted to just submit a PR with a fix, as this is a fairly small (though important bug). However, due to implementation details it is not a trivial fix. It could be fixed, by having two functions in the factor interface one used for system linearization, the other used for evaluating system error. However, this would be redundant for most cases. Alternatively, would be to change the entire implementation scheme for robust estimators, having them wrap factors rather than noiseModels, such that weighting can be applied during linearization, but error is still evaluated using the robust cost. This however, would be a massive interface change. So I opted to just submit a bug report.

Fan Jiang · Answer 1 · Fri Apr 08 2022 01:35:15 GMT+0800 (China Standard Time)

@yetongumich may have some insights here? I am not really sure as last time I looked at robust is about 2 years ago...

yetongumich · Answer 2 · Fri Apr 08 2022 01:49:34 GMT+0800 (China Standard Time)

@yetongumich may have some insights here? I am not really sure as last time I looked at robust is about 2 years ago...

Let's do a quick chat about this tonight?

Fan Jiang · Answer 3 · Fri Apr 08 2022 04:24:08 GMT+0800 (China Standard Time)

Sure

@yetongumich may have some insights here? I am not really sure as last time I looked at robust is about 2 years ago...

Let's do a quick chat about this tonight?

Fan Jiang · Answer 4 · Wed May 04 2022 03:03:31 GMT+0800 (China Standard Time)

I actually stumbled upon this yesterday doing some robust SfM stuff. Yes this is indeed annoying....

Jose Luis Blanco-Claraco · Answer 5 · Wed May 04 2022 05:02:49 GMT+0800 (China Standard Time)

@DanMcGann Thanks for the detailed report and analysis!

It could be fixed, by having two functions in the factor interface one used for system linearization, the other used for evaluating system error. However, this would be redundant for most cases.

You have all the details fresher... would you mind to quickly write down (just as a comment here) what two functions are you thinking of? Then we probably might find a way to make it less redundant via a virtual method or alike...

Daniel McGann · Answer 6 · Wed May 04 2022 05:56:27 GMT+0800 (China Standard Time)

what two functions are you thinking of?

My initial thoughts were to have

A function that returns $\sqrt{w_i}r_i$ that would be used to populate the b vector during linearization.
A function that returns $\rho(r_i)$

However, taking a look at the code again, I cannot figure out how I previously came to the conclusion that this would be necessary.

NoiseModelFactor::linearize computes the unwhitened b vector and later whitens (never makes use of factor.error)
NoiseModelFactor::error already call the noise model loss (for robust this will be \rho)

This fresh look leads me towards the same fix that got proposed in #1161.