OCaml for Aliens: Structure

Structural elements of an OCaml program (or how to parse it as a human).

Part of OCaml for Aliens.

Semicolons!

What do the single and double semicolons mean and when should they be used? This was my first bafflement with the language and I will try to explain it the best I can.

Double Semicolons

Double semicolons (;;) are used in the toplevel to tell it to evaluate what has been entered so far. They are not needed in OCaml source files that are to be compiled, with one practical, temporary exception: when a function contains a syntax error, a double semicolon inserted after it keeps the error from being reported far away from its real location.

Toplevel double semicolon example:

$ utop
utop # let twice x = x * 2 ;;
val twice : int -> int = <fun>

utop # twice 42 ;;
- : int = 84

Single Semicolon

Single semicolons are used to sequence expressions.

A function that prints its input value and then returns it:

let hogwash x =
  Printf.printf "hogwash %d\n" x; x

Both Printf.printf "hogwash %d\n" x and x are expressions and so is the sequence of them. The value of a sequence of expressions is the value of its last subexpression.

In a sequence of expressions all but the last should evaluate to the special unit value, which is analogous to void or nil in other languages. The compiler warns when this is not the case, and build setups that turn warnings into errors will reject the code.

Expression sequence that draws the warning:

1 + 3; 4

The expression 1 + 3 is not the last subexpression and does not evaluate to the unit value. If intentional, the ignore function can be used to throw away the result and return the unit value in its place. In this example it's totally bonkers to do so, but valid cases exist, such as when a function is called for its side effects and its return value is inconsequential.

Valid expression sequence:

ignore (1 + 3); 4
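A less bonkers use of ignore is sketched below. The names counter, bump and bump_twice are made up for this illustration and do not come from any library; bump has a side effect and also returns a value that the caller does not care about.

```ocaml
(* counter, bump and bump_twice are hypothetical names for illustration *)
let counter = ref 0

(* bump increments the counter (side effect) and returns the new count *)
let bump () =
  incr counter;
  !counter

(* The counts returned by bump are not needed here, so ignore discards
   them and gives the sequence the unit values it expects *)
let bump_twice () =
  ignore (bump ());
  ignore (bump ());
  !counter
```

After one call to bump_twice the counter holds 2.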

Parentheses

Parentheses are used for different purposes than in C-like languages. These are what I think are the most common uses of them.

Expression Delimiter

In ignore (1 + 3) the parentheses are not the argument list boundaries in the call to the ignore function; their purpose is to delimit the expression 1 + 3.

Examples to illustrate this:

let twice x = x * 2    (* Function that doubles its input *)

let x = twice 2 + 3    (* x is 7:  twice 2 -> 4 + 3 = 7 *)
let x = twice (2 + 3)  (* x is 10: twice (2 + 3) -> twice 5 -> 10 *)

Expression Sequence Delimiter

The if expression has the form if condition then expr1 else expr2. The less common if condition then expr is allowed when expr evaluates to the unit value. In both cases a branch must be delimited using parentheses (or the alternative begin and end) if it is an expression sequence.

if (pred c) then (
  save_char ctx c;
  consume ()
) else get_token_datum ctx

Or alternatively:

if (pred c) then
  begin
    save_char ctx c;
    consume ()
  end
else get_token_datum ctx
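What happens when the delimiters are left out can be seen in the following sketch. The helpers events and record are invented for this example; they only collect strings so that the effect becomes visible. Without parentheses the if expression ends at the first semicolon.

```ocaml
(* events and record are hypothetical helpers for illustration *)
let events : string list ref = ref []
let record s = events := s :: !events

(* No delimiters: parsed as (if flag then record "a"); record "b",
   so record "b" runs regardless of flag *)
let without_parens flag =
  if flag then record "a"; record "b"

(* With delimiters both calls belong to the then branch *)
let with_parens flag =
  if flag then (record "a"; record "b")
```

Calling without_parens false still records "b", while with_parens false records nothing.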

Unit Value

Empty parentheses, (), are the unit value; its type is named unit.

The following function takes one parameter (the unused placeholder _) and returns the unit value:

let ignore _ = ()

A function without parameters must also be written to accept the unit value.

let something () = 10   (* Function that returns 10 *)
let x = something ()    (* x is 10: something is passed the unit value *)

Omitting the unit value, something would be the value 10 instead of a function returning 10.

let something = 10  (* something is 10 *)
let x = something   (* x is also 10 *)
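The difference matters most when the body has side effects. In the sketch below (calls, next_id and first_id are names I made up for the illustration) the function body runs on every application, while the plain value is computed once, when it is defined.

```ocaml
(* calls is a hypothetical counter used to observe evaluation *)
let calls = ref 0

(* A function: the body runs each time it is applied to () *)
let next_id () =
  incr calls;
  !calls

(* A plain value: next_id () is evaluated once, right here *)
let first_id = next_id ()   (* first_id is 1 *)

let a = next_id ()          (* a is 2 *)
let b = next_id ()          (* b is 3 *)
let c = first_id            (* c is still 1: nothing is re-evaluated *)
```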

Tuples

Parentheses are used to define tuples and to destruct (unpack) values out of them. Sometimes the parentheses can be omitted.

let frobz (a : int * int) =   (* Function accepting a tuple of two integers *)
  let x, y = a in x * y       (* Destruct a into x and y and multiply them *)

let x = frobz (2, 3)          (* x is 6: frobz is given the tuple (2, 3) *)
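A small sketch of where the parentheses can and cannot be left out. The names pair, frobz2, z and w are only for illustration; frobz2 is a variant of frobz above that destructs the tuple directly in its parameter.

```ocaml
let pair = 2, 3           (* same as (2, 3): parentheses omitted *)
let a, b = pair           (* a is 2, b is 3: parentheses omitted *)

(* Destructing in the parameter position needs the parentheses *)
let frobz2 (x, y) = x * y

let z = frobz2 pair       (* z is 6 *)
let w = frobz2 (4, 5)     (* w is 20 *)
(* frobz2 4, 5 would be read as the tuple (frobz2 4), 5 and be rejected *)
```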

The following types are not identical (disregarding the constructor names X and Y).

type foo = X of int * int
type bar = Y of (int * int)

Only the last one has a constructor accepting a single argument that is a tuple of two integers; X takes two separate integer arguments.
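The difference shows up when a tuple already exists as a value. A minimal sketch, repeating the two type definitions so that it stands alone (pair, b, f, sum_bar and sum_foo are illustrative names):

```ocaml
type foo = X of int * int      (* X takes two integer arguments *)
type bar = Y of (int * int)    (* Y takes one argument, a tuple *)

let pair = (2, 3)

let b = Y pair                 (* accepted: the tuple is passed as is *)
let f = X (2, 3)               (* accepted: both arguments are written out *)
(* let f = X pair                 rejected: X expects 2 arguments *)

let sum_bar (Y p) = fst p + snd p   (* the whole tuple can be bound to p *)
let sum_foo (X (m, n)) = m + n      (* the two arguments are matched separately *)
```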

Array Accessors

Parentheses are used for setting and getting array elements.

let x = [|1;2;3|]          (* x is an array with values: 1, 2, 3 *)
let y = x.(1)              (* y is 2: index 1 of x *)
let y = x.(1) <- 3; x.(1)  (* y is 3: index 1 of x is set to 3 (evaluates to unit);
                                      index 1 of x is retrieved *)

Module Scope

Inside Module.(...), the expression in place of the three dots is evaluated with Module opened, so its contents can be used without the Module. prefix.

In the following example both concat and length are functions in the String module.

String.(concat "" ["a";"b";"c"] |> length)
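For comparison, here is the same computation written in three ways; a, b and c are just names for the results. The let open ... in form is the long-hand spelling of the same local open.

```ocaml
(* Local open with the Module.( ... ) shorthand *)
let a = String.(concat "" ["a"; "b"; "c"] |> length)

(* Local open written out with let open ... in *)
let b = let open String in concat "" ["a"; "b"; "c"] |> length

(* No open at all: every name is fully qualified *)
let c = String.length (String.concat "" ["a"; "b"; "c"])
```

All three evaluate to 3.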

Type Annotations

Function parameter type annotations require parentheses around them or they would be ambiguous with type annotations for the return value. See Functions: Type Inference and Annotations.
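A short sketch of the ambiguity; show_a, show_b and show_c are made-up names. The parameter x is an integer in all three, but only the parenthesized annotation talks about x.

```ocaml
(* Parenthesized: the annotation belongs to the parameter x *)
let show_a (x : int) = string_of_int x

(* Not parenthesized: the annotation belongs to the return value,
   the type of x is still inferred *)
let show_b x : string = string_of_int x

(* Both the parameter and the return value annotated *)
let show_c (x : int) : string = string_of_int x
```

All three have the type signature int -> string.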

Functions

The syntax of function definitions is very different from C-like languages.

Signatures

Arrows are used to describe function signatures. A type with an arrow in it is a function taking one or more arguments and returning a value of the last type in the chain of arrows. The type signature int -> int is a function accepting an integer and returning an integer. Similarly int -> int -> int is a function accepting two integers and returning one.
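A few small functions with the signatures OCaml infers for them, as a sketch; negate, add and is_positive are names invented for this example.

```ocaml
let negate x = - x           (* int -> int *)
let add x y = x + y          (* int -> int -> int *)
let is_positive x = x > 0    (* int -> bool *)
```

Entering any of these in the toplevel prints the signature, in the same way as for twice earlier.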

Sooner or later you will come across something that looks like ('a -> 'b) -> 'a list -> 'b list (this is the type signature of List.map). Following the arrows we first have ('a -> 'b). The parentheses with an arrow inside are key here, meaning that the first parameter is itself a function taking an argument of type 'a and returning a value of type 'b. The second parameter is 'a list, which is a list of values of type 'a. Finally the function returns a list of values of type 'b.

The types 'a and 'b are polymorphic, meaning that values of any type are accepted. The significance of 'a and 'b is that they may be different types (as the type names are different). In this particular example, the first argument is a function accepting a value of any type and returning a value of any type.

The following is an example of how to use List.map that has been dissected above. The first argument is an anonymous function with a parameter a that is inferred to be an integer based on its use. The type signature for string_of_int is int -> string. The second argument is a list of integers. List.map applies its first argument (the map function) to each element of the input list, producing a new output list.

let x = List.map (fun a -> string_of_int a) [1;2;3]

The type signature of x is string list.

Note that polymorphic type names like 'a and 'b have nothing to do with function parameter names. They are only names for type placeholders.
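To see the placeholders being filled in, here is List.map used with three different map functions. This is only a sketch; lengths, doubled and evens are names picked for the example.

```ocaml
(* 'a is string and 'b is int *)
let lengths = List.map String.length ["a"; "bb"; "ccc"]   (* [1; 2; 3] *)

(* 'a and 'b may also stand for the same type, here int *)
let doubled = List.map (fun n -> n * 2) [1; 2; 3]         (* [2; 4; 6] *)

(* 'a is int and 'b is bool *)
let evens = List.map (fun n -> n mod 2 = 0) [1; 2; 3]     (* [false; true; false] *)
```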

Partial Application

To execute a function is to apply it. Partial application is to only give a function some of its required arguments, thereby producing a new function with some of its parameters bound to specific arguments.

Example:

let int_list_stringer = List.map (fun a -> string_of_int a)
let x = int_list_stringer [1;2;3]

The function int_list_stringer has the first parameter of List.map bound to the anonymous function (fun a -> string_of_int a). Since the type signature of that anonymous function is int -> string, the type signature of int_list_stringer is no longer polymorphic like that of List.map but rather int list -> string list.

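The arrow syntax makes more sense once partial application is taken into account: each arrow marks a point where the function can be given one more argument and return the rest as a new function.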
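A sketch of how the arrows line up with partial application; add, add3, eight, also_eight and add_nested are illustrative names. Giving add one argument consumes the first arrow and leaves a function of type int -> int.

```ocaml
let add x y = x + y           (* int -> int -> int *)
let add3 = add 3              (* int -> int: first parameter bound to 3 *)
let eight = add3 5            (* int: 8 *)
let also_eight = (add 3) 5    (* the same thing in one go *)

(* The same function written as nested one-parameter functions *)
let add_nested = fun x -> fun y -> x + y   (* int -> int -> int *)
```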

Type Inference and Annotations

OCaml infers the types of let bindings and function parameters from their use. The type signature of let forward x = x is 'a -> 'a, i.e. polymorphic, because nothing restricts the use of x to a specific type. When coding mistakes are made while writing a function, type errors may be reported in confusing ways or at confusing locations because of the type inference. A way around this is to annotate parameters with the intended types, which gives more precise error messages.
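A small sketch of inference at work; forward_int, n and s are names added for the illustration. The unannotated function accepts anything, the annotated one is pinned to integers.

```ocaml
let forward x = x                (* 'a -> 'a *)
let n = forward 42               (* n is an int *)
let s = forward "text"           (* s is a string *)

let forward_int (x : int) = x    (* int -> int because of the annotation *)
(* forward_int "text" would be rejected, with the error reported
   at the offending argument *)
```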

Example of a function with all parameters and return value annotated:

let foo (x : int) (y : int) : int = x * y

If foo is to be exported, an alternative is to put its type signature in a .mli file (analogous to a header file in other languages).

val foo: int -> int -> int

Application and Passing as Value

OCaml has first-class functions, i.e. functions can return functions and functions can be passed around as values. Functions bind any used variables from the environment as seen when they are created. There is no special syntax to differentiate when a function is passed as a value as opposed to when it's applied. A function is applied when all required arguments are given; otherwise the expression evaluates to a function that can, for example, be passed as a value.

Function passed as value:

(* foo takes two integers and multiplies them *)
let foo x y = x * y

(* bar applies function f to value 7 *)
let bar f = f 7

(* x is 21: bar is given a partially applied foo with its first
   parameter bound to the value 3 *)
let x = bar (foo 3)
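Functions returning functions, and functions binding variables from their environment, can be sketched like this. All names (make_scaler, triple, base, add_base and so on) are hypothetical and exist only for the example.

```ocaml
(* A function that returns a function; factor is bound when it is created *)
let make_scaler factor = fun x -> x * factor

let triple = make_scaler 3
let nine = triple 3            (* nine is 9 *)

(* add_base binds the base that exists when add_base is defined *)
let base = 10
let add_base x = x + base

(* This is a new binding named base; add_base still uses 10 *)
let base = 100
let eleven = add_base 1        (* eleven is 11 *)
```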

Let Expressions

It's very common to bind a value to a name that will be used in another expression. For some reason this is not highlighted much in the OCaml manual. The syntax could be read like: let this in that.

Examples:

let humpty x =
  let dumpty y = x * y in
  dumpty 10

let hubert x =
  let a = 3 in
  let b = 7 in
  x * a * b

let x = hubert 2  (* x is 42 *)
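Each let ... in is itself an expression whose value is that of the part after in, so a chain of them nests. A sketch with invented names (hubert_explicit and shadow) that spells out the nesting and shows that a later let can reuse a name:

```ocaml
(* The same computation as hubert, with the nesting made explicit *)
let hubert_explicit x =
  (let a = 3 in
   (let b = 7 in
    x * a * b))

(* A later binding may reuse a name; the right-hand side still
   sees the earlier binding *)
let shadow () =
  let v = 1 in
  let v = v + 1 in
  v * 10

let y = hubert_explicit 2   (* y is 42 *)
let z = shadow ()           (* z is 20 *)
```

Note that a let at the top level of a file, such as the definitions of y and z, has no in part.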

Published: 2019-07-10