kaskr / adcomp

AD computation with Template Model Builder (TMB)

TMBad documentation?

bbolker opened this issue

Looking a gift horse in the mouth ...
Is there somewhere/could there be somewhere that documents/describes the differences between the TMBad and CppAD engines/how TMBad fits in?

Also, I think there may be something wrong with the book generation code: lots of weird HTML tags showing up ...

commented

You're right, thanks - the book looks awful... It's probably something with the versions of doxygen, knitr, bookdown...

Regarding TMBad/CppAD comparison there's nothing publicly available yet except the TMBad doxygen documentation.

A copy-paste from my private working document gives an idea of the similarities/differences:

TMBad overview

  • Recursive higher order graphs without using nested AD types. This allows us to go to any order without knowing the order at compile time.
  • Pointer increment/decrement is inlined into all operators. This reduces the number of function pointer calls compared to CppAD.
  • In contrast to CppAD there is no 'operator enum' and therefore no need for a giant operator switch. We implement operators as classes with virtual methods. There's a memory penalty associated with this decision: CppAD can represent an operator using just 8 bits. We need 64 bits.
  • There is no limitation on the number of operators except for memory and compilation time (unlike CppAD).
  • There is no need for a special 'atomic function' with reduced performance. The user is allowed to implement native operators that access value and derivatives without any copy overhead.
  • TMBad can optionally reduce memory while taping by compressing operators and inputs: when a new operator is added to the stack, it is merged (if possible) with the most recently added operator; if the merge is successful, the operator stack size is reduced. Similar compression techniques are applied to the operator inputs where possible. The compression currently has to be enabled using the FUSE preprocessor flag; otherwise it is disabled. Operator compression not only reduces memory, it also makes the forward and reverse sweeps run faster, because the fused operators can be better optimized by the compiler. The downside of on-the-fly compression is that it adds a small overhead while taping.
  • Subgraph sweeps are implemented for both forward and reverse mode.
  • Storage of values and derivatives follows the 'struct of arrays' (SOA) scheme, used by CppAD, rather than the 'array of structs' (AOS) scheme used e.g. by Stan. AOS is expected to be faster when taping, because allocating a new operator along with all its inputs and outputs is essentially a single pointer increment, so AOS is probably the superior choice when the operation stack has to be rebuilt frequently. However, for typical TMB applications the operation stack is constant, so stack allocation time is not really an issue. SOA storage, on the other hand, provides better memory access for vectorized instructions and BLAS operations, which is more important for TMB.
  • Operators can take references as inputs. E.g. we support matrix functions with pointer arguments. This is possible when we can guarantee that a matrix is stored consecutively on the tape (not possible with CppAD).
  • In contrast to CppAD, we do not by default allow parameter dependent comparisons, e.g. $a<0$ where $a$ is a parameter. Attempts to do so give a compile time error. An extension class ad_adapt of the ad type enables comparison operators. Tapes generated using this type will automatically retape on each new function evaluation. One can use both the normal ad type and the adaptive ad type within the same tape. The resulting tape will be hybrid in the sense that some parts of it automatically retape while other parts are constant.
  • Nested AD contexts: New AD contexts can be started while other AD contexts are still running. Variables in a 'child context' can refer to variables in the 'parent context'. Done
  • Dynamic operators: On the fly checkpointing with 'clean' memory management (operators are owned and managed by the tape). Done
  • Automatic splitting of the computational graph into sparse plus low-rank contributions. Done
  • Expression tree hash codes to remove identical sub expressions (like CppAD). Done
  • An update to the sparse Hessian calculation using 'on the fly atomics': if a Hessian column is cheap, use the normal sub-sweep; otherwise call an atomic Jacobian row function. The memory saving can be huge. Done
  • Computational graph reordering.
  • Introduce the 'tail sweep' (a kind of lazy evaluation): re-order the computational graph so that sub-expressions depending only on the fixed effects come first, followed by all the rest, i.e. x -> f(x) -> u -> f(x,u). When forward evaluating for a given input (x,u), we loop through the joint vector from the left and find the first changed component. We keep a table that tells us which node to start the sweep from. Done
  • Complex sweeps (needed for profile likelihood using penalties). Partially Done
  • Computational graph transformations that integrate independent variables out of the graph: Done
    • Gauss Kronrod quadrature
    • Sequential reduction
    • Laplace approximation: Newton operator implementation allows (adaptive) Laplace approximation to be autogenerated and put on the tape. Performance is currently very close to the old TMB implementation.
  • Source code generator Done
    • Includes new algorithms to compress the graph before compilation.
    • Perspective: Running Laplace approximation on GPUs
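The "operators as classes with virtual methods" point above can be illustrated with a minimal sketch. All class and function names here are hypothetical, not TMBad's actual API: the idea is simply that a sweep dispatches through a vtable pointer per operator rather than a giant switch over an operator enum.

```cpp
#include <cassert>
#include <cmath>

// Each operator is an object with virtual forward/reverse methods.
struct Operator {
  virtual void forward(const double* in, double* out) const = 0;
  virtual void reverse(const double* in, const double* dy, double* dx) const = 0;
  virtual ~Operator() {}
};

struct MulOp : Operator {
  void forward(const double* in, double* out) const override {
    out[0] = in[0] * in[1];
  }
  void reverse(const double* in, const double* dy, double* dx) const override {
    dx[0] += dy[0] * in[1];  // d(x*y)/dx = y
    dx[1] += dy[0] * in[0];  // d(x*y)/dy = x
  }
};

struct ExpOp : Operator {
  void forward(const double* in, double* out) const override {
    out[0] = std::exp(in[0]);
  }
  void reverse(const double* in, const double* dy, double* dx) const override {
    dx[0] += dy[0] * std::exp(in[0]);
  }
};

// Tiny hand-rolled tape for f(x, y) = exp(x * y): one forward sweep,
// then a reverse sweep seeded with df/df = 1.
double grad_f(double x, double y, double* gx, double* gy) {
  static const MulOp mul;
  static const ExpOp exp_;
  double v[4] = {x, y, 0, 0};  // values: v2 = x*y, v3 = exp(v2)
  double d[4] = {0, 0, 0, 0};  // adjoints
  mul.forward(&v[0], &v[2]);
  exp_.forward(&v[2], &v[3]);
  d[3] = 1.0;
  exp_.reverse(&v[2], &d[3], &d[2]);
  mul.reverse(&v[0], &d[2], &d[0]);
  *gx = d[0];
  *gy = d[1];
  return v[3];
}
```

The memory trade-off mentioned above follows directly: an enum-based tape stores a small opcode per node, whereas each node here effectively costs a 64-bit vtable pointer.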
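The SOA-versus-AOS storage point can be made concrete with a sketch (layouts only, illustrative names): with SOA, a run of consecutive values is a plain `double*` slice that can be handed to a vectorized kernel or BLAS routine without copying, whereas AOS interleaves values and adjoints.

```cpp
#include <cassert>
#include <vector>

// AOS (array of structs): each node carries value and adjoint side by side.
struct NodeAOS {
  double value;
  double adjoint;
};

// SOA (struct of arrays): values in one contiguous array, adjoints in another.
struct TapeSOA {
  std::vector<double> values;
  std::vector<double> adjoints;
};

// A BLAS-style dot product over consecutive slices of the value array;
// with AOS storage the values would be strided and need a gather first.
double dot_slice(const TapeSOA& t, int a, int b, int n) {
  double s = 0;
  for (int i = 0; i < n; ++i) s += t.values[a + i] * t.values[b + i];
  return s;
}
```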
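The identical-sub-expression removal can be sketched as follows. TMBad (like CppAD) hashes expression trees; the toy below uses an exact map over (opcode, inputs) signatures purely for illustration, but the effect is the same: adding a node that already exists returns the existing node's index instead of growing the tape.

```cpp
#include <cassert>
#include <map>
#include <tuple>
#include <vector>

// Toy common-subexpression elimination during taping (illustrative only).
struct Tape {
  struct Node { int op, in0, in1; };
  std::vector<Node> nodes;
  std::map<std::tuple<int, int, int>, int> seen;  // signature -> node index

  int add(int op, int in0, int in1) {
    auto key = std::make_tuple(op, in0, in1);
    auto it = seen.find(key);
    if (it != seen.end()) return it->second;  // identical sub-expression: reuse
    nodes.push_back({op, in0, in1});
    int idx = (int)nodes.size() - 1;
    seen[key] = idx;
    return idx;
  }
};
```

For example, taping (x*y) + (x*y) then records the multiplication once and the addition refers to the same node twice.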
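The 'tail sweep' idea reduces, in a degenerate 1D form, to the sketch below: the graph is ordered so that the x-only part comes before the part that also depends on u, and if only u changed since the last evaluation the sweep restarts after the x-only part. Names and the caching scheme are illustrative, not TMBad's implementation.

```cpp
#include <cassert>
#include <cmath>

struct TailSweepTape {
  double cached_x = std::nan("");  // NaN compares unequal: first call is a full sweep
  double cached_g = 0;
  int g_evals = 0;                 // counts how often the x-only part ran

  // Stand-in for the "expensive" sub-graph depending only on x.
  static double g(double x) { return std::exp(x) + x * x; }

  double eval(double x, double u) {
    if (x != cached_x) {           // first changed component is x: full sweep
      cached_g = g(x);
      cached_x = x;
      ++g_evals;
    }
    return cached_g + u;           // tail sweep: only the u-dependent rest
  }
};
```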
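To give a feel for what "Laplace approximation on the tape" means, here is a 1D numeric sketch: log ∫ exp(-f(u)) du ≈ -f(û) + ½·log(2π / f''(û)), with û found by an inner Newton optimization. Finite differences stand in for the taped derivatives; everything here is illustrative, not the Newton operator's actual interface.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// 1D Laplace approximation of log ∫ exp(-f(u)) du.
double laplace_log_integral(const std::function<double(double)>& f, double u0) {
  const double pi = std::acos(-1.0), h = 1e-5;
  double u = u0;
  for (int i = 0; i < 30; ++i) {  // Newton iterations for the mode uhat
    double f1 = (f(u + h) - f(u - h)) / (2 * h);          // f'(u)
    double f2 = (f(u + h) - 2 * f(u) + f(u - h)) / (h * h); // f''(u)
    u -= f1 / f2;
  }
  double f2 = (f(u + h) - 2 * f(u) + f(u - h)) / (h * h);
  return -f(u) + 0.5 * std::log(2 * pi / f2);
}
```

For a quadratic f the approximation is exact, which makes a convenient sanity check: f(u) = ½(u-3)² gives log √(2π).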

This is great, thanks! As long as I'm asking, it would be interesting to have a short (one- or two-sentence) description of the overall vision/scope for TMBad; is this essentially a complete replacement for/alternative to CppAD? (It's very hard to keep up with the autodiff landscape if one is not an expert; https://www.autodiff.org is a good place to start, but I have no idea if it's kept up to date (the bibliography has lots up to 2021 but only one entry for 2022, and it only includes one paper by you, Bell and Kristensen 2018).)

commented

Yes, TMBad is a complete replacement for the subset of CppAD used by TMB. Its purpose is to make TMB faster, simpler and more memory efficient. A standalone version of TMBad exists in a (currently) private repository, which will probably become public in the future.

I'm closing this (thanks!), but will post a new issue about the book formatting so it stays on the list.