Elaboration of explicit term proof significantly slower than equivalent tactic proof
somombo opened this issue · comments
Prerequisites
- Put an X between the brackets on this line if you have done all of the following:
- Check that your issue is not already filed.
- Reduce the issue to a minimal, self-contained, reproducible test case. Avoid dependencies to mathlib4 or std4.
Description
I have an example of a term based proof (in a recursive context) being about 200 times slower to elaborate than by simply adding "by exact
..." (or "by apply ...") to the front of the same term (see mwe below).
Context
See Swap-Perm project which formally defines permutations of arrays and proves related theorems.
I am trying to prove a basic theorem about an extension to the definition of Array.swap
that allows for making several swaps all at once. My extension is called Array.swaps
and accepts a list of pairs of indices instead of just a single pair of indices.
My first attempt at the theorem in question (Array.swaps_cancel
below) was a entirely a term based recursive proof. The offending line of code was inductive hypothesis (See
https://github.com/somombo/swaps-perm/blob/b4a05c1775ad1d32eadc07b635c10a25323a1815/SwapsPerm.lean#L100-L101)
Steps to Reproduce
Consider the following mwe:
def Array.swaps (a : Array α) : List (Fin a.size × Fin a.size) → Array α
| [] => a
| (i, j) :: ijs =>
have : (a.swap i j).size = a.size := a.size_swap _ _
swaps (a.swap i j) (ijs.map (fun p => ⟨⟨p.1.1, this.symm ▸ p.1.2⟩, ⟨p.2.1, this.symm ▸ p.2.2⟩⟩))
termination_by l => l.length
set_option trace.profiler true in
theorem Array.swaps_cancel (a : Array α) (l : List (Fin a.size × Fin a.size)) : a.swaps (l ++ l.reverse) = a :=
match l with
| [] => sorry
| c :: cs =>
have h : a.size = (a.swaps [c]).size := sorry
have ih1 : ((a.swaps [c]).swaps ((h ▸ cs) ++ (h ▸ cs).reverse)) = (a.swaps [c]) := swaps_cancel (a.swaps [c]) (h ▸ cs) -- FIXME: takes approx 5s
-- have ih2 : swaps (swaps a [c]) (h ▸ cs ++ List.reverse (h ▸ cs)) = swaps a [c] := swaps_cancel (a.swaps [c]) (h ▸ cs) -- FIXME: takes approx 1.7s
-- have ih3 : ((a.swaps [c]).swaps ((h ▸ cs) ++ (h ▸ cs).reverse)) = (a.swaps [c]) := by exact swaps_cancel (a.swaps [c]) (h ▸ cs) -- LGTM: takes approx 0.025s
-- have ih4 := swaps_cancel (a.swaps [c]) (h ▸ cs) -- LGTM: takes approx 0.018s
sorry
termination_by l.length
decreasing_by sorry
Take note of inductive hypotheses ih1
, ih2
, and ih4
which are all proven using an exact term and ih3
which is proven by tactic.
Uncomment whichever inductive hypothesis you would like to test the performance of, and comment out the rest. Upon the completion of elaboration, trace.profiler
will show how many seconds it took to elaborate the version of the theorem with whichever inductive hypothesis is under consideration.
Expected behavior: I'd expect inductive hypothesis ih3
, to either elaborate (almost) as quickly as ih2
and ih1
or be slower than them. I'd additionally expect all proofs to take no more than say 0.5-1sec to elaborate (seeing as they are relatively straight forward). Lastly, I am surprised to see that leaving out the type (theorem statement) entirely while using an exact term proof as in ih4
resulted in the fastest elaboration of all. The type it infers in that case, is precisely the one I explicitly provided in ih2
.
Actual behavior:
On my machine (v4.7.0) the various versions take the following amounts of time to elaborate
ih1
takes approx 5sih2
takes approx 1.7sih3
takes approx 0.025sih4
takes approx 0.018s
Versions
mainly 4.7.0 .. (on 4.8.0-rc1, all results above were consistently about 1 or 2seconds slower)
Windows 10 Pro Version 22H2 Build 19045.4291
Also, WSL Ubuntu 20.04.6 LTS (results were similar to above)
Additional Information
n/a
Impact
Add 👍 to issues you consider important. If others are impacted by this issue, please ask them to add 👍 to it.
Thanks for the excellent bug report, including even checking that it’s still the case on 4.8.0-rc1.
Initial investigation:
Unsurprisingly
attribute [irreducible] Array.swaps
will make all of them very fast, so #3995 or maybe #4002 might fix this.
Indeed
attribute [irreducible] WellFounded.fix
makes it much faster, but still slower than with Array.swaps
irreducible. This is an indication that #3995 may be preferable to #4002.
The new diagnostics confirm this; the first line produces
[reduction] unfolded declarations (max: 44101, num: 30): ▼
List.rec ↦ 44101
Acc.rec ↦ 29400
Nat.rec ↦ 29400
PSigma.rec ↦ 19742
List.length ↦ 19601
set ↦ 18900
WellFounded.apply ↦ 14952
WellFounded.rec ↦ 14952
Eq.rec ↦ 14700
List.map ↦ 14700
sizeOf ↦ 9942
InvImage.accessible ↦ 9940
InvImage.accAux ↦ 9940
rfl ↦ 9802
InvImage.wf ↦ 4958
invImage.proof_1 ↦ 4958
swaps._unary.proof_1 ↦ 4956
swaps ↦ 4914
WellFounded.fix ↦ 4914
WellFounded.fixF ↦ 4914
swaps._unary ↦ 4914
Add.add ↦ 4900
swap ↦ 4900
HAdd.hAdd ↦ 4900
Nat.add ↦ 4900
Nat.eq_or_lt_of_le ↦ 4900
Or.rec ↦ 4900
Prod.rec ↦ 4900
Append.append ↦ 350
HAppend.hAppend ↦ 350
So it’s clear that something is starting to unfold Array.swaps
here, and that this isn’t good. But I wonder if we can find out which call to defeq is doing that, and whether we can avoid them (as they might be expensive independent of wfrec).
The problem is certainly within the second occurrence of the subst notation, because
have ih1 : ((a.swaps [c]).swaps ((h ▸ cs) ++ (h.rec (motive := fun n _ => List (Fin n × Fin n)) cs).reverse)) = (a.swaps [c]) := swaps_cancel (a.swaps [c]) (h ▸ cs) -- LGTM: takes approx 0.018s
is also fast and we see in the example above that using by exact
(which hides the expected type, or at least changes the elaboration order) has an impact.
I stared a bit at traces and added a few trace
lines to elabSubst
, but without anything conclusive.
With
set_option trace.Meta.isDefEq.onFailure true in
you get this (large) output:
https://gist.github.com/nomeata/b637a6acd24449be9d6e92f0c7455bfd
and there is certainly a lot of repetition. Maybe somehow caching doesn’t work as it should?