borglab / gtsam

GTSAM is a library of C++ classes that implement smoothing and mapping (SAM) in robotics and vision, using factor graphs and Bayes networks as the underlying computing paradigm rather than sparse matrices.

Home Page:http://gtsam.org

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

mEstimator Error with Trust Region Optimizers

DanMcGann opened this issue · comments

Description

When using robust noise models trust region optimizers use an incorrect value for system error when evaluating convergence and step rejection criteria. Specifically these optimizer evaluate convergence on the sum of weighted residuals formula rather than the robust cost value formula. Where I have used the same notation style as the robust estimator reference provided in the documentation (LINK) but the exact form computed by GTSAM. Plugging in the definition for the weight we can quickly see the inequality of these terms for many mEstimator forms.

For example Cauchy (c=1 scale value for brevity)
formula plugging in form of Cauchy loss we get formula

The discrepancy can greatly affect determination of convergence and step acceptance by our trust region optimizers, especially for the mEstimators that are asymptoticly constant like Geman McClure (see below). For these estimators the system error (in its current implementation) can increase as the estimate gets closer to the mean meaning steps will be rejected by LM or DogLeg and the system will preemptively converge.

Steps to reproduce

  1. Implement factor with a robust noise model
  2. Plot its error wrt its loss.
import gtsam
import numpy as np
import matplotlib.pyplot as plt


def makeValues(x):
    values = gtsam.Values()
    values.insert(0, x)
    return values


def cauchy_loss(x):  # c=1
    return 0.5 * np.log(1 + x ** 2)


def gm_loss(x):
    return (0.5 * x ** 2) / (1 + x ** 2)


# Constants
x = np.arange(-10, 10, 0.1)
noise_model = gtsam.noiseModel.Isotropic.Sigma(1, 1)


# Cauchy
mEst = gtsam.noiseModel.mEstimator.Cauchy(1.0)
robust_model = gtsam.noiseModel.Robust(mEst, noise_model)
factor = gtsam.PriorFactorDouble(0, 0, robust_model)

error = [factor.error(makeValues(xi)) for xi in x]
loss = cauchy_loss(x)
plt.title("Cauchy")
plt.plot(x, error, label="FactorError")
plt.plot(x, loss, label="mEstimator Loss")
plt.legend()
plt.show()


mEst = gtsam.noiseModel.mEstimator.GemanMcClure(1.0)
robust_model = gtsam.noiseModel.Robust(mEst, noise_model)
factor = gtsam.PriorFactorDouble(0, 0, robust_model)

error = [factor.error(makeValues(xi)) for xi in x]
loss = gm_loss(x)
plt.title("Geman McClure")
plt.plot(x, error, label="FactorError")
plt.plot(x, loss, label="mEstimator Loss")
plt.legend()
plt.show()

bug_report_cauchy
bug_report_gm

Expected behavior

I would expect that when evaluating system error optimizers would use the actual robust cost, rather than the value of the weighted squared residual. Parallel, I would expect to see better convergence of trust region optimizers than empirically found. For emperical evaluation of convergence see script below.

Convergence Evaluation Script

import gtsam
from gtsam import noiseModel
import matplotlib.pyplot as plt
import numpy as np

OPTIMIZERS = [
    gtsam.GaussNewtonOptimizer,
    gtsam.LevenbergMarquardtOptimizer,
    gtsam.DoglegOptimizer,
]
OPTIM_PARAMS = [
    gtsam.GaussNewtonParams,
    gtsam.LevenbergMarquardtParams,
    gtsam.DoglegParams,
]
OPTIM_NAMES = ["GaussNewton", "LM", "DOGLEG"]

POSES = [
    gtsam.Point2(0, 0),
    gtsam.Point2(1, 0),
    gtsam.Point2(1, 1),
    gtsam.Point2(0, 1),
]

INITIAL_ESTS = [
    gtsam.Point2(-5, -5),
    gtsam.Point2(6, -5),
    gtsam.Point2(6, 6),
    gtsam.Point2(-5, 6),
]

M_ESTS = {
    "NULL": lambda x: noiseModel.mEstimator.Null(),
    "Huber": noiseModel.mEstimator.Huber,
    "Cauchy": noiseModel.mEstimator.Cauchy,
    "Welsch": noiseModel.mEstimator.Welsch,
    "GemanMcClure": noiseModel.mEstimator.GemanMcClure,
}

RHO_FUNCS = [
    lambda r: 0.5 * r ** 2,
    lambda r: r ** 2 if np.abs(r) < 1 else np.abs(r),
    lambda r: np.log(1 + r ** 2),
    lambda r: (1 - np.exp(-(r ** 2))),
    lambda r: (r ** 2) / (1 + r ** 2),
]

for (mest_name, mest), rho in zip(M_ESTS.items(), RHO_FUNCS):
    graph = gtsam.NonlinearFactorGraph()
    graph.add(
        gtsam.PriorFactorPoint2(
            0,
            POSES[0],
            noiseModel.Robust(mest(1), noiseModel.Isotropic.Sigma(2, 1)),
        )
    )

    for i in range(1, len(POSES)):
        graph.add(
            gtsam.BetweenFactorPoint2(
                i - 1,
                i,
                -POSES[i - 1] + POSES[i],
                noiseModel.Robust(
                    mest(1),
                    noiseModel.Isotropic.Sigma(2, 1),
                ),
            )
        )
    values = gtsam.Values()
    for i, pose in enumerate(INITIAL_ESTS):
        values.insert(i, pose)

    fig = plt.figure(constrained_layout=True)
    ax_dict = fig.subplot_mosaic(
        """
        ABC
        DDD
        """
    )
    print(ax_dict)
    fig.suptitle(mest_name)
    init_pts = np.stack(INITIAL_ESTS)

    gt_pts = np.stack(POSES)
    print(mest_name)
    print(
        "------------------------------------------------------------------------------------------"
    )
    for opt_class, param_class, name, ax_name in zip(
        OPTIMIZERS, OPTIM_PARAMS, OPTIM_NAMES, ["A", "B", "C"]
    ):
        print(name)
        ax = ax_dict[ax_name]
        params = param_class()
        params.setVerbosity("ERROR")
        optimizer = opt_class(graph, values, params)
        result = optimizer.optimize()
        print("")
        pts = []
        for i in range(len(POSES)):
            pts.append(result.atPoint2(i))
        pts = np.stack(pts)
        ax.set_title(name)
        ax.plot(init_pts.T[0], init_pts.T[1], marker="o", label="Initialization")
        ax.plot(gt_pts.T[0], gt_pts.T[1], marker="o", label="GroundTruth")
        ax.plot(pts.T[0], pts.T[1], marker="o", label=name, alpha=0.6)
        ax.set_aspect("equal")

    prior = gtsam.PriorFactorDouble(
        0, 0, gtsam.noiseModel.Robust(mest(1), gtsam.noiseModel.Isotropic.Sigma(1, 1))
    )
    gtsam_vals = []
    expected_vals = []
    x = np.arange(-10, 10, 0.1)
    for xi in x:
        vals = gtsam.Values()
        vals.insert(0, xi)
        gtsam_vals.append(prior.error(vals))
        expected_vals.append(rho(xi))
    ax_dict["D"].set_ylim(0, 3)
    ax_dict["D"].plot(x, gtsam_vals, label="GTSAM Error")
    ax_dict["D"].plot(x, expected_vals, label="Expected Error")
    ax_dict["D"].legend()
    plt.savefig("mest_{}.png".format(mest_name))
plt.show()

Environment

  • OS: Linux
  • Language: Python and C++
  • Version: 4.1.1 (built from source)

Additional information

I was tempted to just submit a PR with a fix, as this is a fairly small (though important bug). However, due to implementation details it is not a trivial fix. It could be fixed, by having two functions in the factor interface one used for system linearization, the other used for evaluating system error. However, this would be redundant for most cases. Alternatively, would be to change the entire implementation scheme for robust estimators, having them wrap factors rather than noiseModels, such that weighting can be applied during linearization, but error is still evaluated using the robust cost. This however, would be a massive interface change. So I opted to just submit a bug report.

@yetongumich may have some insights here? I am not really sure as last time I looked at robust is about 2 years ago...

@yetongumich may have some insights here? I am not really sure as last time I looked at robust is about 2 years ago...

Let's do a quick chat about this tonight?

Sure

@yetongumich may have some insights here? I am not really sure as last time I looked at robust is about 2 years ago...

Let's do a quick chat about this tonight?

I actually stumbled upon this yesterday doing some robust SfM stuff. Yes this is indeed annoying....

@DanMcGann Thanks for the detailed report and analysis!

It could be fixed, by having two functions in the factor interface one used for system linearization, the other used for evaluating system error. However, this would be redundant for most cases.

You have all the details fresher... would you mind to quickly write down (just as a comment here) what two functions are you thinking of? Then we probably might find a way to make it less redundant via a virtual method or alike...

what two functions are you thinking of?

My initial thoughts were to have

  1. A function that returns formula that would be used to populate the b vector during linearization.
  2. A function that returns formula

However, taking a look at the code again, I cannot figure out how I previously came to the conclusion that this would be necessary.

  • NoiseModelFactor::linearize computes the unwhitened b vector and later whitens (never makes use of factor.error)
  • NoiseModelFactor::error already call the noise model loss (for robust this will be \rho)

This fresh look leads me towards the same fix that got proposed in #1161.