Konstantin Weitz: (Dis)advantages of using Abstract Data Types in Proof Assistants

(Dis)advantages of using Abstract Data Types in Proof Assistants

This blog post explains the advantages and disadvantages of using Abstract Data Types (ADTs, see TaPL Chapter 24) in a Proof Assistant. On the plus side, ADTs promote data representation independence, code reuse, and clean code extraction; but they also do not support fix/match syntax, have to expose derived operations, and prohibit computational reasoning.

Download this blog post’s code here.

Represenation Independence

Consider Coq’s standard library implementation of natural number addition add:

Print Nat.add.
(*
Fixpoint add (n m : nat) : nat :=
  match n with
  | 0 => m
  | S n' => S (add n' m)
  end
*)

This implementation is unsatisfactory, as it exposes the data representation details of the nat datatype. For example, this implementation of add will not work with a more space efficient binary representation of natural numbers.

The standard solution to this problem is to hide the data representation using an ADT. ADTs can be implemented using either Modules (see), Sigma/Existential Types (see TaPL Chapter 24), Records (see), or Type Classes (see). In this blog post, we choose to implement ADTs in the Coq Proof Assistants using Type Classes.

Our ADT for natural numbers NatADT consists of: a type Nat whose representation is hidden, operations on the type (zero, succ, natRect), and equations that describe the operation’s behavior (natRectZero, natRectSucc).

The operation natRect is the eliminator for natural numbers. The equations describe the computational behavior of natRect.

Class NatADT := {
  Nat : Type;
  zero : Nat;
  succ : Nat -> Nat;
  natRect : forall P, P zero -> (forall n, P n -> P (succ n)) -> forall n, P n;
  natRectZero : forall P z s, natRect P z s zero = z;
  natRectSucc : forall P z s n, natRect P z s (succ n) = s n (natRect P z s n)
}.

Given this ADT, we can now implement an add function that is independent of any particular implementation of the natural numbers (hurray).

Unfortunately, this also means that we have to implement the code in terms of the eliminator, and cannot use fix/match syntax (sigh). However, getting around this syntactic restriction would be trivial for eliminator based languages like Lean.

Definition add `{NatADT} (n m:Nat) : Nat :=
  natRect _ m (fun _ rec => succ rec) n.

Code Reuse

We can instantiate the ADT using Coq’s standard library natural numbers, …

Instance natNat : NatADT := {|
  Nat := nat;
  zero := 0;
  succ := S;
  natRect := nat_rect
|}.
Proof.
  - reflexivity.
  - reflexivity.
Defined.

… and call the add function as we please.

Compute (@add natNat 5 7).

We can even prove that our implementation of add is equivalent to the standard library operation Nat.add (+).

Lemma eqAddAdd (n m : nat) : n + m = add n m.
  induction n; cbn in *; congruence.
Qed.

We can also instantiate the ADT using Coq’s binary natural numbers N, …

Require Import NArith.

Open Scope N.

Instance NNat : NatADT := {|
  Nat := N;
  zero := N.zero;
  succ := N.succ;
  natRect := N.peano_rect
|}.
Proof.
  - apply N.peano_rect_base.
  - apply N.peano_rect_succ.
Defined.

… and call the add function as we please.

Compute (@add NNat 5 7).

ADTs thus enable code reuse, because a function has to only be implemented once for an ADT, instead of every single instatiation of the ADT.

While this instantiation of add is exponentially more space efficient than the instantiation with natNat, the computation still requires time exponential in the number of n’s bits.

The problem of ADTs is that they hide implementation details, and thus deny opportunities for optimization. To overcome this problem, ADT implementers have to guess and expose all the operations that some future instatiation of the ADT can implement more efficiently (even if they can be implemented less efficiently using other operations of the ADT). The result is an ADT that is hard to understand (because the essential operations are swamped by derived operations) and inefficient (because some optimization opportunity will inevitably be lost for lack of precognition).

Extracting ADTs

ADTs are also useful for extraction. We can instantiate an ADT using opaque terms that will be implemented on extraction.

Parameter HsNat : Type.
Parameter hsZero : HsNat.
Parameter hsSucc : HsNat -> HsNat.
Parameter hsNatRect : forall P, P hsZero -> (forall n, P n -> P (hsSucc n)) -> forall n, P n.
Axiom hsNatRectZero : forall P z s, hsNatRect P z s hsZero = z.
Axiom hsNatRectSucc : forall P z s n, hsNatRect P z s (hsSucc n) = s n (hsNatRect P z s n).

Extraction Language Haskell.

Extract Constant HsNat => "Prelude.Integer".
Extract Constant hsZero => "0".
Extract Constant hsSucc => "Prelude.succ".
Extract Constant hsNatRect => "(\z s -> let f n = if n Prelude.== 0 
                                                 then z 
                                                 else s (Prelude.pred n) 
                                                        (f (Prelude.pred n)) in f)".

Instance HsNatNat : NatADT := {|
                               Nat := HsNat;
                               zero := hsZero;
                               succ := hsSucc;
                               natRect := hsNatRect
                             |}.
Proof.
  - apply hsNatRectZero.
  - apply hsNatRectSucc.
Defined.

Definition hsAdd : HsNat := @add HsNatNat (hsSucc (hsSucc hsZero)) (hsSucc hsZero).
Extraction "src/Add.hs" hsAdd.

This approach clearly documents all the terms that have to be implemented by extraction (Parameter) and the assumptions about their behavior (Axiom), and is thus cleaner than the common alternative of syntactically replacing arbitrary Coq terms and data types:

Extract Inductive nat => "Prelude.Integer" ["0" "Prelude.succ"] 
                                        "(\z s n -> if n Prelude.== 0 
                                                    then z () 
                                                    else s (Prelude.pred n))".

Definition hsAddNat : nat := 2 + 1.
Extraction "src/AddNat.hs" hsAddNat.

Reasoning About ADTs

There is another major downside to using ADTs — they severely impact theorem proving. Consider the following proof that 0 + n = n + 0 using Coq’s standard library natural numbers. After the induction, the proof is almost automatic due to Coq’s ability to computationally simplify (cbn) the goal.

Open Scope nat.

Lemma natAddZero (n : nat) : 0 + n = n + 0.
  induction n as [|n' IHn'].
  - reflexivity.
  - cbn in IHn'.         (* A *)
    rewrite IHn' at 1.   (* B *)
    cbn.                 (* C *)
    reflexivity.
Qed.

Compare this to the proof which uses the ADT, where we have to perform equational reasoning instead of computational reasoning. This is extremely burdensome, especially for more complicated proofs.

Lemma addZero `{Nat} n : add zero n = add n zero.
  refine (natRect (fun n => add zero n = add n zero) _ (fun n' IHn' => _) n).
  - reflexivity.
  - unfold add in IHn'.             (* A *)
    rewrite natRectZero in IHn'.    (* A *)
    rewrite IHn' at 1.              (* B *)
    unfold add.                     (* C *)
    rewrite natRectSucc.            (* C *)
    rewrite natRectZero.            (* C *)
    reflexivity.
Qed.

Summary

In summary, there are several advantages to using ADTs:

ADTs are data representation independent.
ADTs enable code reuse.
ADTs enable clean extraction.

But there are also drawbacks:

ADTs do not support fix/match syntax.
ADTs have to expose derived operations.
ADTs prohibit computational reasoning.