ekmett / ad

Automatic Differentiation

Home Page: http://hackage.haskell.org/package/ad

Discrepancy between `Reverse` and `ReverseDouble` when using flag `+ffi`

julmb opened this issue

I am running into some very strange behavior with Reverse and ReverseDouble. I have found the following example:

import qualified Numeric.AD.Mode.Reverse as R
import qualified Numeric.AD.Mode.Reverse.Double as RD

f :: Fractional a => a -> a
f x = 0.0001 * (sum xs + 1) ^ 2 where
    xs = take 4000 $ map (x *) $ cycle [1, 2]

main :: IO ()
main = do
    print $ R.diff f (0 :: Double)
    print $ RD.diff f (0 :: Double)

When run normally, this gives the following output:

1.1999999999999222
1.1999999999999222

That is, the results of Reverse and ReverseDouble match.

However, when the ad package is built using the +ffi flag, I get the following output:

1.1999999999999222
0.5855999999999899

Mathematically, sum xs is 6000x (2000 terms of x plus 2000 terms of 2x), so the function is equivalent to x -> 3600x² + 1.2x + 0.0001 with derivative x -> 7200x + 1.2, which is 1.2 at x = 0. Thus, the result of Reverse is close enough, but ReverseDouble with +ffi is way off. In my actual project, the effect was more subtle than in this example, but large enough that an optimization algorithm no longer found the optimum.
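As an independent cross-check of the expected value, here is a minimal sketch in C that mirrors the Haskell f and estimates the derivative at 0 with a central difference. It does not use the ad package at all; the names f and central_diff and the step size are my own choices for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Same computation as the Haskell f: 0.0001 * (sum xs + 1)^2, where xs
   holds 4000 terms that alternate between x * 1 and x * 2. */
double f(double x)
{
    double sum = 0.0;
    for (size_t k = 0; k < 4000; k++) {
        sum += x * (k % 2 == 0 ? 1.0 : 2.0);
    }
    double s = sum + 1.0;
    return 0.0001 * s * s;
}

/* Central difference (g(x + h) - g(x - h)) / (2h). For a quadratic g this
   equals the true derivative up to floating-point rounding. */
double central_diff(double (*g)(double), double x, double h)
{
    assert(h > 0.0);
    return (g(x + h) - g(x - h)) / (2.0 * h);
}
```

Calling central_diff(f, 0.0, 1e-6) should land very close to 1.2, which agrees with the Reverse output and not with the 0.5855999999999899 reported by ReverseDouble under +ffi.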

The result of ReverseDouble using +ffi seems fairly sensitive to the number supplied to take in the example.

I am using ad-4.5 and ghc-8.10.7.

I did some more research into this today. This issue seems to only appear when the +ffi tape consists of multiple blocks. Since the first block is initialized generously to have 4096 entries, the problem only becomes apparent if the function being differentiated consists of many operations (hence the contrived take/map/cycle example). By changing the size of the initial block to 1, I was able to reproduce the issue with functions as simple as x -> (x + 1) ^ 2.

p <- c_tape_alloc (fromIntegral vs) (4 * 1024)

After some more poking around, it seems to be an issue with this block of code:

ad/cbits/tape.c

Lines 82 to 109 in 5f8a908

int idx = 1 + start;
while (pTape)
{
    idx -= pTape->offset;
    while (--idx >= 0)
    {
        double v = buffer[idx + pTape->offset];
        if (v == 0.0) continue;
        int i = pTape->lnk[idx*2];
        double x = pTape->val[idx*2];
        if (x != 0.0)
        {
            buffer[i] += v*x;
        }
        int j = pTape->lnk[idx*2 + 1];
        double y = pTape->val[idx*2 + 1];
        if (y != 0.0)
        {
            buffer[j] += v*y;
        }
    }
    idx += pTape->offset;
    pTape = pTape->prev;
}

The variable idx is decremented at the start of every inner-loop iteration (while (--idx >= 0), line 88). The initialization int idx = 1 + start in line 82 compensates for this on the first pass of the outer loop, but nothing compensates on subsequent passes, so the last entry of each older block is skipped. Replacing line 107 (idx += pTape->offset;) with idx += pTape->offset + 1; has fixed all the issues that I have observed.

However, I am neither conceptually familiar with the reverse AD tape, nor with this implementation in particular, so I am not confident that this is in fact correct. It would be nice if someone familiar with the code could have a look at this suggestion.
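The off-by-one described above can be reproduced in isolation with a stripped-down model of the traversal. This is only a sketch: the block struct, the replay function and the carry parameter are hypothetical names of mine, not part of cbits/tape.c, and the model assumes that each block's offset is the buffer index of its first entry and that blocks are contiguous. Only the index arithmetic of the loop is kept; the lnk/val updates are replaced by marking which buffer index was visited.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one tape block: just the fields the index
   arithmetic needs. */
typedef struct block {
    int offset;          /* buffer index of this block's first entry (assumed) */
    struct block *prev;  /* older block, or NULL for the first block */
} block;

/* Replays the index arithmetic of the quoted loop. Every visited buffer
   index is marked in seen[], and the number of visits is returned.
   carry is the extra amount added back after each block:
   0 reproduces the original line 107, 1 the proposed replacement. */
int replay(const block *tape, int start, int carry, int *seen)
{
    assert(seen != NULL);
    int visited = 0;
    int idx = 1 + start;
    while (tape) {
        idx -= tape->offset;
        while (--idx >= 0) {
            seen[idx + tape->offset] = 1;
            visited++;
        }
        idx += tape->offset + carry;
        tape = tape->prev;
    }
    return visited;
}
```

For example, take an older block at offset 0 and a newer block at offset 4, with start = 6, so the buffer indices 0 to 6 should all be visited. With carry = 0 the replay visits only 6 of the 7 entries and never touches index 3, the last entry of the older block. With carry = 1 all 7 entries are visited.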

As an aside, while troubleshooting this issue, I have also noticed that the +ffi tape uses different variable indices than the Cell-based implementation in Haskell. However, this seems to not cause any issues, so maybe they are different but both internally consistent indexing schemes. Not sure if this is worth looking into.

Good catch. @sofusmortensen, perhaps you have some insight into this issue?

Fixed in #99.