This blog post explains the advantages and disadvantages of using Abstract Data Types (ADTs, see TaPL Chapter 24) in a Proof Assistant. On the plus side, ADTs promote data representation independence, code reuse, and clean code extraction; but they also do not support fix/match syntax, have to expose derived operations, and prohibit computational reasoning.

Download this blog post’s code here.

Consider Coq’s standard library implementation of natural number addition `add`

:

```
Print Nat.add.
(*
Fixpoint add (n m : nat) : nat :=
match n with
| 0 => m
| S n' => S (add n' m)
end
*)
```

This implementation is unsatisfactory, as it exposes the data representation details of the
`nat`

datatype. For example, this implementation of `add`

will not work with a more
space efficient binary representation of natural numbers.

The standard solution to this problem is to **hide the data representation using an ADT**. ADTs can be
implemented using either Modules (see), Sigma/Existential Types (see TaPL Chapter 24),
Records (see), or Type Classes (see).
In this blog post, we choose to implement ADTs in the Coq Proof
Assistants using Type Classes.

Our ADT for natural numbers `NatADT`

consists of:
a type `Nat`

whose representation is hidden,
operations on the type (`zero`

, `succ`

, `natRect`

), and
equations that describe the operation’s behavior (`natRectZero`

, `natRectSucc`

).

The operation `natRect`

is the eliminator for natural numbers.
The equations describe the computational behavior of `natRect`

.

```
Class NatADT := {
Nat : Type;
zero : Nat;
succ : Nat -> Nat;
natRect : forall P, P zero -> (forall n, P n -> P (succ n)) -> forall n, P n;
natRectZero : forall P z s, natRect P z s zero = z;
natRectSucc : forall P z s n, natRect P z s (succ n) = s n (natRect P z s n)
}.
```

Given this ADT, we can now implement an `add`

function that is **independent
of any particular implementation** of the natural numbers (hurray).

Unfortunately, this also means that we have to implement the code in
terms of the eliminator, and **cannot use fix/match syntax**
(sigh). However, getting around this syntactic restriction would be trivial
for eliminator based languages like Lean.

```
Definition add `{NatADT} (n m:Nat) : Nat :=
natRect _ m (fun _ rec => succ rec) n.
```

We can instantiate the ADT using Coq’s standard library natural numbers, …

```
Instance natNat : NatADT := {|
Nat := nat;
zero := 0;
succ := S;
natRect := nat_rect
|}.
Proof.
- reflexivity.
- reflexivity.
Defined.
```

… and call the `add`

function as we please.

```
Compute (@add natNat 5 7).
```

We can even prove that our implementation of `add`

is
equivalent to the standard library operation `Nat.add`

(`+`

).

```
Lemma eqAddAdd (n m : nat) : n + m = add n m.
induction n; cbn in *; congruence.
Qed.
```

We can also instantiate the ADT using Coq’s binary natural numbers `N`

, …

```
Require Import NArith.
Open Scope N.
Instance NNat : NatADT := {|
Nat := N;
zero := N.zero;
succ := N.succ;
natRect := N.peano_rect
|}.
Proof.
- apply N.peano_rect_base.
- apply N.peano_rect_succ.
Defined.
```

… and call the `add`

function as we please.

```
Compute (@add NNat 5 7).
```

ADTs thus **enable code reuse**, because a function has to only be implemented
once for an ADT, instead of every single instatiation of the ADT.

While this instantiation of `add`

is exponentially more space efficient
than the instantiation with `natNat`

, the computation still requires
time exponential in the number of `n`

’s bits.

The problem of ADTs is that they hide implementation details, and thus
deny opportunities for optimization.
To overcome this problem, ADT implementers have to guess and **expose all the operations
that some future instatiation of the ADT can implement more efficiently**
(even if they can be implemented
less efficiently using other operations of the ADT). The result is an ADT that
is hard to understand (because the essential operations are swamped by derived operations)
and inefficient (because some optimization opportunity will inevitably be lost for lack of precognition).

**ADTs are also useful for extraction**. We can instantiate an ADT using opaque terms
that will be implemented on extraction.

```
Parameter HsNat : Type.
Parameter hsZero : HsNat.
Parameter hsSucc : HsNat -> HsNat.
Parameter hsNatRect : forall P, P hsZero -> (forall n, P n -> P (hsSucc n)) -> forall n, P n.
Axiom hsNatRectZero : forall P z s, hsNatRect P z s hsZero = z.
Axiom hsNatRectSucc : forall P z s n, hsNatRect P z s (hsSucc n) = s n (hsNatRect P z s n).
Extraction Language Haskell.
Extract Constant HsNat => "Prelude.Integer".
Extract Constant hsZero => "0".
Extract Constant hsSucc => "Prelude.succ".
Extract Constant hsNatRect => "(\z s -> let f n = if n Prelude.== 0
then z
else s (Prelude.pred n)
(f (Prelude.pred n)) in f)".
Instance HsNatNat : NatADT := {|
Nat := HsNat;
zero := hsZero;
succ := hsSucc;
natRect := hsNatRect
|}.
Proof.
- apply hsNatRectZero.
- apply hsNatRectSucc.
Defined.
Definition hsAdd : HsNat := @add HsNatNat (hsSucc (hsSucc hsZero)) (hsSucc hsZero).
Extraction "src/Add.hs" hsAdd.
```

This approach clearly documents all the terms that have to be implemented
by extraction (`Parameter`

) and the assumptions about their behavior
(`Axiom`

), and is thus cleaner than the common alternative of
syntactically replacing arbitrary Coq terms and data types:

```
Extract Inductive nat => "Prelude.Integer" ["0" "Prelude.succ"]
"(\z s n -> if n Prelude.== 0
then z ()
else s (Prelude.pred n))".
Definition hsAddNat : nat := 2 + 1.
Extraction "src/AddNat.hs" hsAddNat.
```

There is another major downside to using ADTs — they severely impact
theorem proving. Consider the following proof that `0 + n = n + 0`

using
Coq’s standard library natural numbers. After the induction, the proof
is almost automatic due to Coq’s ability to computationally simplify (`cbn`

)
the goal.

```
Open Scope nat.
Lemma natAddZero (n : nat) : 0 + n = n + 0.
induction n as [|n' IHn'].
- reflexivity.
- cbn in IHn'. (* A *)
rewrite IHn' at 1. (* B *)
cbn. (* C *)
reflexivity.
Qed.
```

Compare this to the proof which uses the ADT, where we have to
**perform equational reasoning instead of computational reasoning**.
This is extremely burdensome, especially for more complicated proofs.

```
Lemma addZero `{Nat} n : add zero n = add n zero.
refine (natRect (fun n => add zero n = add n zero) _ (fun n' IHn' => _) n).
- reflexivity.
- unfold add in IHn'. (* A *)
rewrite natRectZero in IHn'. (* A *)
rewrite IHn' at 1. (* B *)
unfold add. (* C *)
rewrite natRectSucc. (* C *)
rewrite natRectZero. (* C *)
reflexivity.
Qed.
```

In summary, there are several advantages to using ADTs:

- ADTs are data representation independent.
- ADTs enable code reuse.
- ADTs enable clean extraction.

But there are also drawbacks:

- ADTs do not support fix/match syntax.
- ADTs have to expose derived operations.
- ADTs prohibit computational reasoning.