Discrepancy between `Reverse` and `ReverseDouble` when using flag `+ffi`
julmb opened this issue
I am running into some very strange behavior with `Reverse` and `ReverseDouble`. I have found the following example:
import qualified Numeric.AD.Mode.Reverse as R
import qualified Numeric.AD.Mode.Reverse.Double as RD

f :: Fractional a => a -> a
f x = 0.0001 * (sum xs + 1) ^ 2 where
  xs = take 4000 $ map (x *) $ cycle [1, 2]

main :: IO ()
main = do
  print $ R.diff f (0 :: Double)
  print $ RD.diff f (0 :: Double)
When run normally, this gives the following output:
1.1999999999999222
1.1999999999999222
That is, the results of `Reverse` and `ReverseDouble` match.
However, when the `ad` package is built using the `+ffi` flag, I get the following output:
1.1999999999999222
0.5855999999999899
Mathematically, the function is equivalent to `x -> 3600x² + 1.2x + 0.0001` with derivative `x -> 7200x + 1.2`. Thus, the result of `Reverse` is close enough, but `ReverseDouble` with `+ffi` is way off. In my actual project, the effect was more subtle than in this example, but large enough that an optimization algorithm no longer found the optimum.
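For anyone who wants to double-check the expected value without involving `ad` at all, here is a minimal sketch that compares the closed-form derivative with a central finite difference. The names `f'` and `centralDiff` and the step size `1e-6` are my own choices for illustration, not anything from the package.

```haskell
-- Same function as in the report, restated so the snippet is self-contained.
f :: Fractional a => a -> a
f x = 0.0001 * (sum xs + 1) ^ 2 where
  xs = take 4000 $ map (x *) $ cycle [1, 2]

-- Closed-form derivative derived by hand: sum xs = 6000x, so
-- f x = 0.0001 * (6000x + 1)^2 and f' x = 7200x + 1.2.
f' :: Double -> Double
f' x = 7200 * x + 1.2

-- Central finite difference (hypothetical helper, step size h is arbitrary).
centralDiff :: (Double -> Double) -> Double -> Double -> Double
centralDiff g h x = (g (x + h) - g (x - h)) / (2 * h)

main :: IO ()
main = do
  print $ f' 0                   -- closed form at x = 0
  print $ centralDiff f 1e-6 0   -- numerical estimate at x = 0
```

Both printed values should be close to 1.2, which agrees with the `Reverse` result and not with the `+ffi` `ReverseDouble` result.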
The result of `ReverseDouble` using `+ffi` seems fairly sensitive to the number supplied to `take` in the example.
I am using `ad-4.5` and `ghc-8.10.7`.
I did some more research into this today. This issue seems to only appear when the `+ffi` tape consists of multiple blocks. Since the first block is initialized generously to have 4096 entries, the problem only becomes apparent if the function being differentiated consists of many operations (hence the contrived `take`/`map`/`cycle` example). By changing the size of the initial block to `1`, I was able to reproduce the issue with functions as simple as `x -> (x + 1) ^ 2`.
`ad/src/Numeric/AD/Internal/Reverse/Double.hs`, line 102 in 5f8a908
After some more poking around, it seems to be an issue with this block of code:
Lines 82 to 109 in 5f8a908
The variable `idx` is decremented at the beginning of the loop in line 88. This is compensated for in line 82, before the start of the outer loop, but not for subsequent passes of the outer loop. Replacing line 107 with `idx += pTape->offset + 1;` has fixed all the issues that I have observed.
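To illustrate the kind of off-by-one I mean, here is a toy model in Haskell. It is not the library's code and does not mirror the actual tape layout; `Block`, `sweepBlock`, `sweep`, `buggy` and `fixed` are all hypothetical names. It only assumes a tape split into blocks that is walked backwards with an index that is decremented before each use.

```haskell
-- Toy model only: a tape split into blocks, newest block first.
-- Each block lists its entries in recording order.
type Block = [Int]

-- Visit entries from index (i - 1) down to 0, mimicking a loop that
-- decrements the index before using it.
sweepBlock :: Int -> Block -> [Int]
sweepBlock i blk = [ blk !! j | j <- [i - 1, i - 2 .. 0] ]

-- Walk all blocks backwards. The first block starts one past its last
-- entry; 'resume' decides where to start in every older block, given
-- that block's length.
sweep :: (Int -> Int) -> [Block] -> [Int]
sweep _ [] = []
sweep resume (b : bs) = sweepBlock (length b) b ++ concatMap older bs
  where
    older c = sweepBlock (resume (length c)) c

-- Resuming at the last valid index: the pre-decrement then skips it.
buggy :: [Block] -> [Int]
buggy = sweep (\n -> n - 1)

-- Resuming one past the last valid index (the "+ 1" compensation).
fixed :: [Block] -> [Int]
fixed = sweep id

main :: IO ()
main = do
  let tape = [[5, 6], [3, 4], [1, 2]]  -- three blocks, newest first
  print $ buggy tape  -- prints [6,5,3,1]
  print $ fixed tape  -- prints [6,5,4,3,2,1]
```

In this model the first block is always processed correctly, and every older block loses its last entry unless the resume index is bumped by one. That would be consistent with the problem only showing up once the tape spans more than one block, but whether the real tape behaves exactly like this is an assumption on my part.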
However, I am neither conceptually familiar with the reverse AD tape, nor with this implementation in particular, so I am not confident that this is in fact correct. It would be nice if someone familiar with the code could have a look at this suggestion.
As an aside, while troubleshooting this issue, I have also noticed that the `+ffi` tape uses different variable indices than the `Cell`-based implementation in Haskell. However, this does not seem to cause any issues, so maybe they are different but both internally consistent indexing schemes. Not sure if this is worth looking into.
Good catch. @sofusmortensen, perhaps you have some insight into this issue?
Fixed in #99.