idris-lang / Idris-dev

A Dependently Typed Functional Programming Language

Home Page: http://idris-lang.org


The cost of computing Nat equality proofs at type check time

nicolabotta opened this issue

The program

> %default total
> %auto_implicits off
> n : Nat
> n = 80000
> %freeze n
> ns : List Nat
> ns = [1..n]
> postulate lemma : (m : Nat) -> 2 * (sum [1..m]) = m * (S m)
> q : 2 * (sum ns) = n * (S n)
> q = lemma n
> main : IO ()
> main = do putStrLn ("n           = " ++ show n)
>           putStrLn ("sum ns      = " ++ show (sum ns))

type checks in constant time and executes in linear time in n as one would expect:

    n   | type check |  run
  ======|============|======= 
  10000 |       5.7s | 0.006s
  20000 |       5.8s | 0.012s
  40000 |       6.0s | 0.028s
  80000 |       5.8s | 0.060s

However, commenting out

> %freeze n

makes the type checker go nuts. Now it takes significantly more than quadratic time in n for the program to check, even for small values of n:

    n   | type check |  run
  ======|============|======= 
   10   |       6.0s | 0.002s
   20   |       5.7s | 0.002s
   40   |       7.6s | 0.002s
   80   |      29.7s | 0.002s
  160   |   11m55.0s | 0.002s
  320   |  243m22.0s | 0.002s
  640   |   killed!  | 
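A rough reading of the table quantifies "more than quadratic". Each row doubles n, so the ratio of consecutive timings gives an empirical growth exponent; quadratic growth would multiply the time by about 4 per doubling. Using 11m55.0s = 715 s and 243m22.0s = 14602 s, and without subtracting the roughly 6 s start-up cost visible for small n:

```math
\log_2\frac{715\,\mathrm{s}}{29.7\,\mathrm{s}} \approx \log_2 24.1 \approx 4.6,
\qquad
\log_2\frac{14602\,\mathrm{s}}{715\,\mathrm{s}} \approx \log_2 20.4 \approx 4.4
```

On these data points the type checking time grows roughly like n^4.4 to n^4.6, well beyond n^2. This is only a back-of-the-envelope estimate from three measurements.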

Even worse, type-checking aborts for n >= 640! From a user's perspective, this behavior is logically puzzling and, in practice, unacceptable:

The original program shows that, under the assumption expressed by lemma, q holds for every n : Nat. But computing q for a specific value of n does fail! This seems inconsistent.

In the above example, freezing n allows one to type check the program in constant time. But in all but trivial applications, finding out which variables have to be frozen to recover the expected type checking behavior can be a nightmare.

The above program tries to reproduce the observations made in #3509, #3405, #3358 and #3246 in a simple, self-contained example.

@nicolabotta Thank you for the issue! It's great to have a minimal example.

Would you say that this issue supersedes the previous ones (i.e. we can close them)? I think it would be better to have a single issue that describes the problem, as that is easier to track.

@ahmadsalim: Of course I hope that solving this issue will make it possible to close the other issues as well, but at this point this is just a hope. Perhaps we are dealing with more than one problem. I suggest keeping the other issues open for the time being. As soon as we have a solution that works well on the example above, I will check it on the other issues and hopefully close them.

@nicolabotta OK, as you wish. Let us keep the issue open for the time being.

This is exactly the same issue as any other that involves evaluation during type checking: it needs to evaluate n to make any progress, and at the moment evaluation is all-or-nothing.

The logical explanation is that Nats are unary. There is little point in hard coding things for Nat, though: firstly, because that would be a huge change that would impact every part of the system, and secondly, because solving it for Nat would do nothing for the other programs that do lots of computation at compile time and would exhibit the same kind of problem.
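For scale, here is a back-of-the-envelope count, assuming both sides of `q` are normalised to a unary numeral. Since sum [1..n] = n(n+1)/2, both sides of `q` denote n(n+1), and a unary normal form of that value contains exactly that many `S` constructors:

```math
n\,(n+1)\big|_{n=80000} = 80000 \cdot 80001 = 6\,400\,080\,000,
\qquad
n\,(n+1)\big|_{n=640} = 640 \cdot 641 = 410\,240
```

So, with `n` unfrozen, n = 80000 would require building numerals with billions of constructors, while the smallest failing case in the timing table (n = 640) involves only a few hundred thousand. The size of the final numerals grows only quadratically in n, so it alone cannot account for the super-quadratic type checking times reported above.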

I do get that this is a problem for the kind of programs you're writing. I've had a few goes at trying to solve it in recent weeks, without mentioning much on here because I haven't necessarily been making much progress. I'm sorry that you find it unacceptable and I really would like to deal with it properly, but because of implementation decisions made early on, and the fact that I'm juggling quite a lot of stuff at the moment, I can't make any guesses as to how long it'll take.

You know, I think I'm going to take back that comment about "little point in hard coding things for Nat" because realistically that's the biggest problem we're going to encounter at compile time, and given that we say that Nat is for unbounded unsigned things, we probably ought to be a bit cleverer about it.

Let's see how this goes...

I recall from reading the Lean documentation, although I can't find where exactly, that they've taken the approach of hard-coding efficient integer behaviour for all types that satisfy a certain interface (or typeclass, or whatever; something like Haskell's Num). Could Idris do something similar, allowing any user-defined types that implement a particular interface to benefit from optimisation to use machine integers and integer arithmetic?

@logicchains I do not know Lean, but we have tentatively implemented non-negative rational numbers in Idris on top of Nat, please see https://gitlab.pik-potsdam.de/botta/IdrisLibs/tree/master/NonNegRational, and run-time computations seem reasonably fast. We have registered non-negative rationals as instances of Num. I understand that, at run time, Nat computations and, therefore, computations with non-negative rational numbers, are performed via integer arithmetic. The tables in my original post show that the problem here (but also in #3509) is not run-time efficiency but efficiency at type-check time.

@nicolabotta I'm sorry, I should have been clearer. I meant optimising the typechecker to use machine integers (and hard coding the kind of techniques used by e.g. SMT solvers for solving numerical equalities), as opposed to the naive approach of, for instance, actually treating 5 as S (S (S (S (S Z)))). If optimisations were hardcoded into the typechecker for dealing with Nats, it would be nice if user-defined types could also benefit from these optimisations if they satisfied the same interface as Nat (or some interface that captures all the properties that a type must satisfy in order for the optimisations to be valid).

@logicchains optimizing the type checker to efficiently compute with Nats and with numerical data types based on Nat is certainly a crucial step towards improving the usability of Idris.

Many "validated" applications require, in one way or another, certain pre-conditions to be satisfied. In many cases, these requirements can be expressed in terms of equalities of numerical values. Whenever such equalities have to be established through brute-force computations, efficiency is crucial.

Still, I think that my original post points at problems whose solution requires more than improving the efficiency of numerical computations at type-check time:

First, we have the problem that type checking times appear to grow faster than quadratically in n. So far, we do not have a plausible explanation for this behavior.

Second, we have the problem that, no matter how fast Nat computations at type-check time might become, certain computations simply should not take place. Consider, for instance, the problem of computing

> q : x + y = 1

where x = 1 / (S n) and y = n / (S n) are rational numbers and n : Nat is large. In this case a brute-force approach

> q = Refl

would likely fail even if Nat computations were very fast, for instance because of integer overflow. In these cases, we need to be able to apply (possibly postulated) results like

> lemma : m + n = S d -> m / (S d) + n / (S d) = 1 

to compute q without actually reducing x + y to 1.
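A worked instantiation of that idea, assuming the (hypothetical, postulated) `lemma` above and assuming `/` builds a rational from two Nats: instantiating lemma's m, n and d with 1, n and n respectively turns its conclusion into exactly the goal x + y = 1. The only remaining obligation is the premise

```math
1 + n \;=\; \mathrm{plus}\,(S\,Z)\,n \;\rightsquigarrow\; S\,(\mathrm{plus}\,Z\,n) \;\rightsquigarrow\; S\,n
```

Because the Prelude's `plus` recurses on its first argument, this premise should hold by `Refl` after two reduction steps, however large n is (assuming the literal 1 elaborates to S Z). Checking q this way would cost the same for every n, whereas the brute-force `Refl` has to normalise both sides of the equation.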

In other words, I think that we are facing two problems here: 1) that too many unnecessary computations are done and 2) that these computations are done too slowly.

The fact that many unnecessary computations are done is demonstrated by the (rather devastating) effect of unfreezing n in my original example!