Simple type inference demo
  • OCaml 99.1%
  • Dune 0.9%
Find a file
2024-01-22 08:29:17 +01:00
lib Initial commit. 2024-01-22 08:29:17 +01:00
.gitignore Initial commit. 2024-01-22 08:29:17 +01:00
.ocamlformat Initial commit. 2024-01-22 08:29:17 +01:00
dune Initial commit. 2024-01-22 08:29:17 +01:00
dune-project Initial commit. 2024-01-22 08:29:17 +01:00
main.ml Initial commit. 2024-01-22 08:29:17 +01:00
README.org Initial commit. 2024-01-22 08:29:17 +01:00

Implementation of the paper "Algorithm W Step by Step" by Martin Grabmüller, with some modifications to support N-ary functions and tuples.

Approaches

The standard "simple" ML approach is to unify as you go. This is prone to error if the language has more advanced features like typeclasses or forward references; in that case, the ordering of the program can change how it types.

An alternative is to separate the typechecking into a separate AST traversal and constraint generation stage, and a constraint solving stage. This way, all constraints are known at solve time regardless of ordernig.

An additional benefit, is that you could formulate the constraints in a language like Z3 and get the solving for free.

Furthermore, this approach scales better, as new language features just map onto the constraint solver; the solver and constraint language remains the same. (SPJ)

The Type language

Type-checking and -inference algorithms are expressed using a type language, somewhat analogous to the expression language of functional programming languages. Terms in the type language, also known as types or monotypes, are recursively defined:

  type mono =
    | TyVar of string
    | TyFun of mono * mono
    | Int
    | Bool

This language can express things that are familiar to ML programmers. Examples:

  • int
  • int -> bool
  • a -> a

and so on.

Substitutions

A substitution, simply put, is an assignment of type variables to types. It can be implemented using a Map from string to t, or an association list, etc.

Transformations on types is expressed in terms of substitutions, and thus the definition of substitutions and the definition of the type language forms the core

    let rec apply s = function
    | TyVar a as t' ->
      ( match Map.get a s with
      | Some t -> t
      | None -> t' )
    | TyFun (t1, t2) -> TyFun (apply s t1, apply s t2)
    | t -> t

Type schemes

Also known as polytypes. The idea of type schemes is usually poorly motivated in the literature, and since the forall can be omitted when writing type signatures in ML, the need might not be obvious. However, being able to "quantify" a type variable is the ticket to polymorphism.

Existential and universal quantification

The difference between the monotype a -> a, and the polytype forall a. a -> a, is that a in a -> a represents an unknown, yet to be determined, concrete type, which will be resolved later by unification.

The former is somtimes referred to as existential quantification; the type a "exists but is unknown".

The latter is referred to as universal quantification; the type a can be any type.

Definition of polytypes

The normal way to define a polytype, is something along the lines

  type poly =
    | Forall of string * poly
    | Type of mono

allowing you to express something like forall a. forall b. a -> b, where a -> b is a monotype.

Polytypes are considered equal up to ordering and renaming, so another possible representation is something like type poly = set * mono with a set of quantified type variables.

Renaming a quantified type is called alpha-conversion.

Unquantified type variables are considered constant under substitution.

Using different notation, where quantified variables are represented by a different set of symbols, specializing a type by substitution is a simple remapping.

The Type Context

Often named Γ, the type context is simply a mapping from symbol names in the expression language, to (poly)types. It can be implemented as a Map from string to poly.

The context contains elements of the sort x : σ, called type assignments, asserting that the variable named x has polytype σ.

Motivation for contexts: Lambdas. To type fun(x) -> e, x can occur in the expression e, but x does not have a known type. We need the context to keep track of the types of symbols that can appear in the body of the expression we are typing.

Free type variables

Type variables occurring in a type without a corresponding quantification are said to be free.

Generalization and instantiation