Mogensen–Scott encoding

In computer science, Scott encoding is a way to represent algebraic data types in the lambda calculus, following their syntactic definition without regard to whether they are recursive or not. This is unlike Church encoding, which treats recursive data types specially, representing them with right folds. The data and operators form a mathematical structure which is embedded in the lambda calculus.

Mogensen–Scott encoding extends and slightly modifies Scott encoding by applying it to metaprogramming.[citation needed] This encoding allows lambda calculus terms to be represented as data and operated on by a meta program.

Exposition

Numbers

Scott encoding of numbers follows the Peano definition of natural numbers as a sum of two cases, the zero case and the successor case,

$\qquad Nat:=\operatorname {Zero} \ |\,\operatorname {Succ} Nat$

Correspondingly, Scott numerals are functions which expect two arguments, two handlers, each receiving the corresponding case data from the number:

$\quad {\begin{aligned}0&=\lambda zs.z\\\operatorname {Succ} &=\lambda n.\lambda zs.s\ n\end{aligned}}$

The zero case has no data. The successor case's data is the Scott numeral of the predecessor, which is passed as the argument to the corresponding handler.

When a Scott numeral is supplied with two handlers, it calls the appropriate one with its corresponding data. Scott-encoded values thus perform a choice between the cases of the sum data type. For example:

$\quad {\begin{aligned}\operatorname {IsZero} &=\lambda n.n\ \operatorname {True} \ (\lambda m.\operatorname {False} )\\\operatorname {Pred} &=\lambda n.n\ 0\ (\lambda m.m)\end{aligned}}$
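The definitions above can be tried out directly in a language with first-class functions. The following is a minimal sketch using curried Python lambdas; the names `to_int` and `to_bool` are helpers introduced here only to inspect results and are not part of the encoding.

```python
# Scott numerals as curried functions taking a zero handler and a successor handler.
zero = lambda z: lambda s: z                  # 0    = λz s. z
succ = lambda n: lambda z: lambda s: s(n)     # Succ = λn. λz s. s n

# Booleans as two-way selectors.
true = lambda t: lambda f: t
false = lambda t: lambda f: f

is_zero = lambda n: n(true)(lambda m: false)  # IsZero = λn. n True (λm. False)
pred = lambda n: n(zero)(lambda m: m)         # Pred   = λn. n 0 (λm. m)


def to_int(n):
    """Decode a Scott numeral to a Python int (inspection helper)."""
    count = 0
    while True:
        # Zero case yields None; successor case yields the predecessor numeral.
        m = n(None)(lambda p: p)
        if m is None:
            return count
        count, n = count + 1, m


def to_bool(b):
    """Decode an encoded boolean to a Python bool (inspection helper)."""
    return b(True)(False)


three = succ(succ(succ(zero)))
```

Here `pred(three)` performs a single handler call and returns the stored predecessor, while `to_int` has to peel off one successor at a time.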

Recursive operations on Scott numerals require explicit use of recursion, e.g. using the $\operatorname {Y}$ combinator:

$\qquad \operatorname {Add} =\operatorname {Y} \lambda rpq.p\ q\ (\lambda m.\operatorname {Succ} \,(r\ m\ q))$

Church numerals, on the other hand, already embody the primitive recursion and perform the folding / looping on their own:

$\qquad \operatorname {Add_{_{\,Church}}} =\lambda pqsz.p\ s\ (q\ s\ z)$

The key difference is that the argument passed to the Scott handler is the number's own predecessor, an unprocessed Scott numeral, whereas the argument passed to the Church folding / looping function is the result of folding / looping over the predecessor.
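The two addition functions can be compared side by side in a short Python sketch. Because Python evaluates arguments eagerly, the sketch substitutes the strict fixed-point combinator Z for Y; this substitution, and the helper names `scott`, `scott_to_int`, `church` and `church_to_int`, are assumptions of the example rather than part of either encoding.

```python
# Strict fixed-point combinator (Z). With eager evaluation the plain Y
# combinator would not terminate, so the self-application is delayed.
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# Scott numerals.
zero = lambda z: lambda s: z
succ = lambda n: lambda z: lambda s: s(n)

# Add = Y (λr p q. p q (λm. Succ (r m q))), with Z standing in for Y.
add = Z(lambda r: lambda p: lambda q: p(q)(lambda m: succ(r(m)(q))))


def scott(k):
    """Build the Scott numeral for a non-negative Python int (helper)."""
    n = zero
    for _ in range(k):
        n = succ(n)
    return n


def scott_to_int(n):
    """Decode a Scott numeral by repeatedly asking for its predecessor (helper)."""
    count = 0
    while True:
        m = n(None)(lambda p: p)
        if m is None:
            return count
        count, n = count + 1, m


# Church numerals: the numeral itself is the loop, so no recursion is needed.
church_add = lambda p: lambda q: lambda s: lambda z: p(s)(q(s)(z))


def church(k):
    """Build the Church numeral for a non-negative Python int (helper)."""
    def numeral(s):
        def from_zero(z):
            for _ in range(k):
                z = s(z)
            return z
        return from_zero
    return numeral


church_to_int = lambda n: n(lambda x: x + 1)(0)
```

For instance, `scott_to_int(add(scott(2))(scott(3)))` and `church_to_int(church_add(church(2))(church(3)))` both decode to 5, but only the Scott version needs the fixed-point combinator.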

Lists

Scott encoding of lists follows their definition as a sum of two cases, the empty list case and the cons case,

$\qquad List:=\operatorname {NIL} \ |\,\operatorname {Cons} \,\langle val\rangle \,List$

Correspondingly, Scott lists are functions which expect two arguments, two handlers, each receiving the corresponding data from the list:

$\quad {\begin{aligned}\operatorname {NIL} &=\lambda nc.n\\\operatorname {Cons} &=\lambda ad.\lambda nc.c\ a\ d\end{aligned}}$

The empty list case has no data. The cons case's data are the head element and the list's tail, which are passed as the arguments to the corresponding handler.

When a Scott list is supplied with two handlers, it calls the appropriate one with its corresponding data. Scott-encoded values thus perform a choice between the cases of the sum data type. For example:

$\quad {\begin{aligned}\operatorname {IsEmpty} &=\lambda l.l\,\operatorname {True} \,(\lambda ad.\operatorname {False} )\\\operatorname {Head} &=\lambda l.l\,\operatorname {NIL} \,(\lambda ad.a)\\\operatorname {Tail} &=\lambda l.l\,\operatorname {NIL} \,(\lambda ad.d)\end{aligned}}$

Recursive operations on Scott lists require explicit use of recursion, e.g. using the $\operatorname {Y}$ combinator:

$\qquad \operatorname {Append} =\operatorname {Y} \lambda rpq.p\ q\ (\lambda ad.\operatorname {Cons} \,a\ (r\ d\ q))$

Church lists, on the other hand, already embody the primitive recursion and perform the folding on their own:

$\qquad \operatorname {Append_{_{\,Church}}} =\lambda pqcn.p\ c\ (q\ c\ n)$

The key difference is that Scott's handler's second argument is the list's own tail, unprocessed, which is also a Scott list, whereas Church's folding function's second argument is the result of folding over its tail.

This is the argument that corresponds to the recursive datum in the data type definition. There are no differences between the two encodings in the other, non-recursive data fields, except possibly in the corresponding arguments' order.
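The list operations can be sketched in Python in the same way. As assumptions of this example, Python's own `True` and `False` stand in for the encoded booleans, the strict combinator Z replaces Y because Python is eagerly evaluated, and the conversion helpers (`from_pylist`, `to_pylist`, `church_list`, `church_to_pylist`) exist only to build and inspect values.

```python
# Strict fixed-point combinator (Z), standing in for Y under eager evaluation.
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# Scott lists: a nil handler and a cons handler.
nil = lambda n: lambda c: n                              # NIL  = λn c. n
cons = lambda a: lambda d: lambda n: lambda c: c(a)(d)   # Cons = λa d. λn c. c a d

is_empty = lambda l: l(True)(lambda a: lambda d: False)
head = lambda l: l(nil)(lambda a: lambda d: a)
tail = lambda l: l(nil)(lambda a: lambda d: d)

# Append = Y (λr p q. p q (λa d. Cons a (r d q)))
append = Z(lambda r: lambda p: lambda q:
           p(q)(lambda a: lambda d: cons(a)(r(d)(q))))


def from_pylist(xs):
    """Build a Scott list from a Python list (helper)."""
    l = nil
    for x in reversed(xs):
        l = cons(x)(l)
    return l


def to_pylist(l):
    """Decode a Scott list; the cons handler receives the raw, unprocessed tail."""
    out = []
    while True:
        cell = l(None)(lambda a: lambda d: (a, d))
        if cell is None:
            return out
        out.append(cell[0])
        l = cell[1]


# Church lists: the list is its own right fold.
church_append = lambda p: lambda q: lambda c: lambda n: p(c)(q(c)(n))


def church_list(xs):
    """Build a Church list from a Python list (helper)."""
    def fold(c):
        def with_nil(n):
            acc = n
            for x in reversed(xs):
                acc = c(x)(acc)
            return acc
        return with_nil
    return fold


# Here the second handler argument is the already folded tail, not the tail itself.
church_to_pylist = lambda l: l(lambda a: lambda rest: [a] + rest)([])
```

In `to_pylist` the cons handler is handed the tail as a Scott list and must loop explicitly, whereas in `church_to_pylist` the handler receives the result of folding the tail.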

History

Scott encoding appears first in a set of unpublished lecture notes by Dana Scott^[1] whose first citation occurs in the book Combinatory Logic, Volume II.^[2] Michel Parigot gave a logical interpretation of and strongly normalizing recursor for Scott-encoded numerals,^[3] referring to them as the "Stack type" representation of numbers. Torben Mogensen later extended Scott encoding for the encoding of lambda terms as data.^[4]

Discussion

Lambda calculus allows data to be stored as parameters to a function that does not yet have all the parameters required for application. For example,

$\qquad ((\lambda x_{1}\ldots x_{n}.\lambda c.c\ x_{1}\ldots x_{n})\ v_{1}\ldots v_{n})\ f$

may be thought of as a record or struct where the fields $x_{1}\ldots x_{n}$ have been initialized with the values $v_{1}\ldots v_{n}$. These values may then be accessed by applying the term to a function f. This reduces to

$\qquad f\ v_{1}\ldots v_{n}$

c may represent a constructor for an algebraic data type in functional languages such as Haskell. Now suppose there are N constructors, each with $A_{i}$ arguments:

${\begin{array}{c|c|c}{\text{Constructor}}&{\text{Given arguments}}&{\text{Result}}\\\hline ((\lambda x_{1}\ldots x_{A_{1}}.\lambda c_{1}\ldots c_{N}.c_{1}\ x_{1}\ldots x_{A_{1}})\ v_{1}\ldots v_{A_{1}})&f_{1}\ldots f_{N}&f_{1}\ v_{1}\ldots v_{A_{1}}\\((\lambda x_{1}\ldots x_{A_{2}}.\lambda c_{1}\ldots c_{N}.c_{2}\ x_{1}\ldots x_{A_{2}})\ v_{1}\ldots v_{A_{2}})&f_{1}\ldots f_{N}&f_{2}\ v_{1}\ldots v_{A_{2}}\\\vdots &\vdots &\vdots \\((\lambda x_{1}\ldots x_{A_{N}}.\lambda c_{1}\ldots c_{N}.c_{N}\ x_{1}\ldots x_{A_{N}})\ v_{1}\ldots v_{A_{N}})&f_{1}\ldots f_{N}&f_{N}\ v_{1}\ldots v_{A_{N}}\end{array}}$

Each constructor selects a different function from the function parameters $f_{1}\ldots f_{N}$. This provides branching in the process flow, based on the constructor. Each constructor may have a different arity (number of parameters). If the constructors have no parameters, then the set of constructors acts like an enum, a type with a fixed number of values. If the constructors have parameters, recursive data structures may be constructed.
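As an illustration of both ideas, the sketch below first builds a two-field record and then a data type with three constructors of different arities. The `Shape` type and the function `describe` are hypothetical, chosen only for this example.

```python
# Record: (λx1 x2. λc. c x1 x2) v1 v2, later applied to an accessor f.
record = lambda x1: lambda x2: lambda c: c(x1)(x2)
point = record(3)(4)                        # fields initialised with 3 and 4
x_field = point(lambda x: lambda y: x)      # reduces to f 3 4 with f = λx y. x
total = point(lambda x: lambda y: x + y)    # reduces to f 3 4 with f = λx y. x + y

# Hypothetical type with N = 3 constructors of arities 1, 2 and 0:
#   Shape := Circle r | Rect w h | Empty
circle = lambda r: lambda c1: lambda c2: lambda c3: c1(r)
rect = lambda w: lambda h: lambda c1: lambda c2: lambda c3: c2(w)(h)
empty = lambda c1: lambda c2: lambda c3: c3


def describe(shape):
    """Supply one handler per constructor; the value itself picks the branch."""
    return shape(
        lambda r: f"circle of radius {r}")(
        lambda w: lambda h: f"rectangle {w} by {h}")(
        "empty shape")
```

Applying `describe` to `circle(2)`, `rect(3)(4)` and `empty` selects the first, second and third handler respectively, which is the branching behaviour summarized in the table.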

Definition

Let D be a datatype with N constructors, $\{c_{i}\}_{i=1}^{N}$ , such that constructor $c_{i}$ has arity $A_{i}$ .

Scott encoding

The Scott encoding of constructor $c_{i}$ of the data type D is

$\qquad \lambda x_{1}\ldots x_{A_{i}}.\lambda c_{1}\ldots c_{N}.c_{i}\ x_{1}\ldots x_{A_{i}}$
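The general formula can be turned into a small constructor factory. The sketch below is one possible rendering in Python; the function name `scott_constructor` and its parameter names are assumptions of this example, and N is assumed to be at least 1.

```python
def scott_constructor(i, arity, n):
    """Return the Scott encoding of constructor c_i (1-based) with the given
    arity, for a data type with n constructors.

    The result is curried: it takes `arity` field values, then n handlers,
    and finally applies the i-th handler to the stored fields.
    """
    def take_fields(fields):
        if len(fields) < arity:
            return lambda x: take_fields(fields + (x,))
        return take_handlers(fields, ())

    def take_handlers(fields, handlers):
        if len(handlers) < n:
            return lambda c: take_handlers(fields, handlers + (c,))
        result = handlers[i - 1]
        for x in fields:          # c_i x_1 ... x_{A_i}
            result = result(x)
        return result

    return take_fields(())


# The numerals and lists from the exposition, recovered as special cases.
zero = scott_constructor(1, 0, 2)
succ = scott_constructor(2, 1, 2)
nil = scott_constructor(1, 0, 2)
cons = scott_constructor(2, 2, 2)
```

With these definitions, `succ(zero)("z")(lambda m: "s")` selects the successor handler, and `cons(1)(nil)(None)(lambda a: lambda d: a)` returns the stored head.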

Mogensen–Scott encoding

Mogensen extends Scott encoding to encode any untyped lambda term as data. This allows a lambda term to be represented as data within a lambda calculus meta program. The meta function mse converts a lambda term into the corresponding data representation of the lambda term:

$\quad {\begin{aligned}\operatorname {mse} [x]&=\lambda a,b,c.a\ x\\\operatorname {mse} [M\ N]&=\lambda a,b,c.b\ \operatorname {mse} [M]\ \operatorname {mse} [N]\\\operatorname {mse} [\lambda x.M]&=\lambda a,b,c.c\ (\lambda x.\operatorname {mse} [M])\\\end{aligned}}$

The "lambda term" is represented as a tagged union with three cases:

Constructor a - a variable (arity 1, not recursive),
Constructor b - function application (arity 2, recursive in both arguments),
Constructor c - lambda abstraction (arity 1, recursive).

For example,

${\begin{array}{l}\operatorname {mse} [\lambda x.f\ (x\ x)]\\=\lambda a,b,c.c\ (\lambda x.\operatorname {mse} [f\ (x\ x)])\\=\lambda a,b,c.c\ (\lambda x.\lambda a,b,c.b\ \operatorname {mse} [f]\ \operatorname {mse} [x\ x])\\=\lambda a,b,c.c\ (\lambda x.\lambda a,b,c.b\ (\lambda a,b,c.a\ f)\ \operatorname {mse} [x\ x])\\=\lambda a,b,c.c\ (\lambda x.\lambda a,b,c.b\ (\lambda a,b,c.a\ f)\ (\lambda a,b,c.b\ \operatorname {mse} [x]\ \operatorname {mse} [x]))\\=\lambda a,b,c.c\ (\lambda x.\lambda a,b,c.b\ (\lambda a,b,c.a\ f)\ (\lambda a,b,c.b\ (\lambda a,b,c.a\ x)\ (\lambda a,b,c.a\ x)))\end{array}}$
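The meta function mse can be sketched in Python as a translation from a small syntax tree to nested closures. The tuple representation of terms (`("var", name)`, `("app", M, N)`, `("lam", name, body)`) and the two meta programs `show` and `evaluate` are assumptions of this example, not part of the encoding; free variables are represented by their names.

```python
def mse(term, env=None):
    """Encode a lambda term, given as nested tuples, as a three-handler value."""
    env = env or {}
    tag = term[0]
    if tag == "var":
        # Bound variables carry whatever the enclosing abstraction was given;
        # free variables stand for their own name.
        x = env.get(term[1], term[1])
        return lambda a: lambda b: lambda c: a(x)
    if tag == "app":
        m, n = mse(term[1], env), mse(term[2], env)
        return lambda a: lambda b: lambda c: b(m)(n)
    if tag == "lam":
        name, body = term[1], term[2]
        # The binder becomes a host-language function, as in λx. mse[M].
        return lambda a: lambda b: lambda c: c(
            lambda x: mse(body, {**env, name: x}))
    raise ValueError(f"unknown term tag: {tag!r}")


def show(e, depth=0):
    """Meta program: print an encoded term, naming binders v0, v1, ..."""
    return e(
        lambda x: str(x))(
        lambda m: lambda n: f"({show(m, depth)} {show(n, depth)})")(
        lambda f: f"(λv{depth}. {show(f(f'v{depth}'), depth + 1)})")


def evaluate(e):
    """Meta program: map an encoded term to the host-language value it denotes."""
    return e(
        lambda x: x)(
        lambda m: lambda n: evaluate(m)(evaluate(n)))(
        lambda f: lambda v: evaluate(f(v)))


# The example term λx. f (x x).
example = ("lam", "x", ("app", ("var", "f"), ("app", ("var", "x"), ("var", "x"))))
identity = ("lam", "x", ("var", "x"))
```

Both meta programs work purely by supplying three handlers, one per constructor; `show(mse(example))` reconstructs the term up to renaming of the bound variable.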

Comparison to the Church encoding

The Scott encoding coincides with the Church encoding for booleans. Church encoding of pairs may be generalized to arbitrary data types by encoding $c_{i}$ of D above as[citation needed]

$\qquad \lambda x_{1}\ldots x_{A_{i}}.\lambda c_{1}\ldots c_{N}.c_{i}(x_{1}c_{1}\ldots c_{N})\ldots (x_{A_{i}}c_{1}\ldots c_{N})$

Compare this to the Scott encoding:

$\qquad \lambda x_{1}\ldots x_{A_{i}}.\lambda c_{1}\ldots c_{N}.c_{i}x_{1}\ldots x_{A_{i}}$

With this generalization, the Scott and Church encodings coincide on all enumerated datatypes (such as the boolean datatype) because each constructor is a constant (no parameters).

Concerning the practicality of using either the Church or Scott encoding for programming, there is a symmetric trade-off:^[5] Church-encoded numerals support a constant-time addition operation and have no better than a linear-time predecessor operation; Scott-encoded numerals support a constant-time predecessor operation and have no better than a linear-time addition operation.
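The predecessor half of this trade-off can be observed by counting handler calls. The sketch below is an illustration, not a proof of the lower bound: it instruments the Scott predecessor and one standard Church predecessor, $\lambda n s z.n\ (\lambda g h.h\ (g\ s))\ (\lambda u.z)\ (\lambda u.u)$, which is assumed here as a representative definition. The helper names are introduced only for this example.

```python
# Scott numerals.
zero = lambda z: lambda s: z
succ = lambda n: lambda z: lambda s: s(n)


def scott_pred_steps(n):
    """Return (predecessor numeral, handler calls) for Pred = λn. n 0 (λm. m)."""
    calls = 0

    def on_succ(m):
        nonlocal calls
        calls += 1
        return m

    return n(zero)(on_succ), calls


def church(k):
    """Build the Church numeral for a non-negative Python int (helper)."""
    def numeral(s):
        def from_zero(z):
            for _ in range(k):
                z = s(z)
            return z
        return from_zero
    return numeral


def church_pred_steps(n):
    """Return (predecessor as int, step calls) for the Church predecessor,
    decoded on the fly with s = (+1) and z = 0."""
    calls = 0

    def step(g):
        nonlocal calls
        calls += 1
        return lambda h: h(g(lambda x: x + 1))

    return n(step)(lambda u: 0)(lambda u: u), calls
```

For a numeral representing 5, `scott_pred_steps` reports one handler call, while `church_pred_steps` reports five step calls, one per successor in the numeral.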

Type definitions

Church-encoded data and operations on them are typable in System F, as are Scott-encoded data and operations. However, the types for the Scott encoding are significantly more complicated.^[6]

The type of the Scott encoding of the natural numbers is the positive recursive type:

$\qquad \mu X.\forall R.R\to (X\to R)\to R$

Full recursive types are not part of System F, but positive recursive types are expressible in System F via the encoding:

$\qquad \mu X.G[X]=\forall X.((G[X]\to X)\to X)$

Combining these two facts yields the System F type of the Scott encoding:

$\qquad \forall X.(((\forall R.R\to (X\to R)\to R)\to X)\to X)$

This can be contrasted with the type of the Church encoding:

$\qquad \forall X.X\to (X\to X)\to X$

The Church encoding is a second-order type, whereas the Scott encoding is fourth-order.

Notes

[1] Scott, Dana (1968) [1962]. A system of functional abstraction. Lectures delivered at University of California, Berkeley.
[2] Curry, Haskell (1972). Combinatory Logic, Volume II. North-Holland Publishing Company. ISBN 0-7204-2208-6.
[3] Parigot, Michel (1988). "Programming with proofs: A second order type theory". In H. Ganzinger (ed.). European Symposium on Programming: ESOP '88. 2nd European Symposium on Programming. Nancy, France, March 21–24, 1988. Lecture Notes in Computer Science. Vol. 300. Springer. pp. 145–159. doi:10.1007/3-540-19027-9_10. ISBN 978-3-540-19027-1.
[4] Mogensen, Torben (1992). "Efficient Self-Interpretation in Lambda Calculus". Journal of Functional Programming. 2 (3): 345–364. doi:10.1017/S0956796800000423. S2CID 8736707.
[5] Parigot, Michel (1990). "On the representation of data in lambda-calculus". In Egon Börger; Hans Kleine Büning; Michael M. Richter (eds.). International Workshop on Computer Science Logic: CSL '89. 3rd Workshop on Computer Science Logic. Kaiserslautern, FRG, October 2–6, 1989. Lecture Notes in Computer Science. Vol. 440. Springer. pp. 209–321. doi:10.1007/3-540-52753-2_47. ISBN 978-3-540-52753-4.
[6] See the note "Types for the Scott numerals" by Martín Abadi, Luca Cardelli and Gordon Plotkin (February 18, 1993).

References

Stump, A. (2009). Directly reflective meta-programming. Higher-Order and Symbolic Computation, 22, 115-144.
Mogensen, T.Æ. (1992). Efficient Self-Interpretation in Lambda Calculus. J. Funct. Program., 2, 345-363.
