OCaml for Aliens: Structure
Structural elements of an OCaml program (or how to parse it as a human).
Part of OCaml for Aliens.
Semicolons!
What does the single and double semicolon mean and when should they be used? This was my first bafflement with the language and I will try to explain it the best I can.
Double Semicolons
Double semicolons (;;) are used in the toplevel to tell it to evaluate what has been entered so far.
They are normally not needed when writing OCaml source code in files that are to be compiled.
There is one practical, temporary use for them in source files.
When a function contains a syntax error, double semicolons can be inserted after it to end the definition there, so the error is reported near its real location instead of further down the file.
Toplevel double semicolon example:
$ utop
utop # let twice x = x * 2 ;;
val twice : int -> int = <fun>
utop # twice 42 ;;
- : int = 84
Single Semicolon
Single semicolons are used to sequence expressions.
A function that prints its input value and then returns it:
let hogwash x = Printf.printf "hogwash %d\n" x; x
Both Printf.printf "hogwash %d\n" x and x are expressions, and so is the sequence of them.
The value of a sequence of expressions is the value of its last subexpression.
All but the last subexpression should evaluate to the special unit value, which is analogous to void or nil in other languages.
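Invalid expression sequence:
1 + 3; 4
The expression 1 + 3 is not the last subexpression and does not evaluate to the unit value, so the compiler complains about it (a warning by default, an error when compiling with -strict-sequence).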
If intentional, the ignore function can be used to throw away the result and return the unit value in its place.
In this example it's totally bonkers to do so, but valid cases exist, such as when a function is called for its side effects and its return value is inconsequential.
Valid expression sequence:
ignore (1 + 3); 4
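A less bonkers case might look like the following sketch. The helper drop_oldest is hypothetical and only there for illustration; Queue.pop removes the oldest element and returns it, and here only the removal is wanted.

```ocaml
(* Hypothetical helper: discard the oldest element of a queue and
   report how many elements remain. *)
let drop_oldest (q : int Queue.t) : int =
  (* Queue.pop returns the removed element; only the removal matters
     here, so the result is explicitly thrown away. *)
  ignore (Queue.pop q);
  Queue.length q

let remaining =
  let q = Queue.create () in
  Queue.add 1 q;
  Queue.add 2 q;
  Queue.add 3 q;
  drop_oldest q
```

After the three additions and one removal, remaining is 2.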
Parentheses
Parentheses are used for different purposes than in C-like languages. These are what I think are the most common uses of them.
Expression Delimiter
In ignore (1 + 3) the parentheses are not the argument list boundaries in the call to the ignore function; their purpose is to delimit the expression 1 + 3.
Examples to illustrate this:
let twice x = x * 2      (* Function that doubles its input *)
let x = twice 2 + 3      (* x is 7: (twice 2) + 3 -> 4 + 3 -> 7 *)
let x = twice (2 + 3)    (* x is 10: twice (2 + 3) -> twice 5 -> 10 *)
Expression Sequence Delimiter
The if expression is of the form if condition then expr1 else expr2.
The less common form if condition then expr is allowed if expr evaluates to the unit value.
In both cases a branch must be delimited using parentheses (or the alternative begin and end) if it's an expression sequence.
if (pred c) then (
  save_char ctx c;
  consume ()
) else
  get_token_datum ctx
Or alternatively:
if (pred c) then begin
  save_char ctx c;
  consume ()
end else
  get_token_datum ctx
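Why the delimiter matters can be seen in a small sketch. The names trace, undelimited and delimited are made up for this illustration; a buffer records which expressions actually ran.

```ocaml
(* Hypothetical example: a buffer records which expressions ran. *)
let trace = Buffer.create 16

(* Without a delimiter only the first expression belongs to `then`;
   the second one runs no matter what the condition is. *)
let undelimited flag =
  if flag then Buffer.add_string trace "a";
  Buffer.add_string trace "b"

(* With parentheses the whole sequence belongs to `then`. *)
let delimited flag =
  if flag then (
    Buffer.add_string trace "c";
    Buffer.add_string trace "d"
  )

let () =
  undelimited false;
  delimited false

let recorded = Buffer.contents trace
```

With both conditions false, only "b" ends up in the buffer: the undelimited version ran its second expression anyway.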
Unit Value
Empty parentheses, (), are the unit value, and unit is its type.
The following function takes one parameter (the unused placeholder _) and returns the unit value:
let ignore _ = ()
A function without parameters must also be written to accept the unit value.
let something () = 10    (* Function that returns 10 *)
let x = something ()     (* x is 10: something is passed the unit value *)
Omitting the unit value, something would be the value 10 instead of a function returning 10.
let something = 10       (* something is 10 *)
let x = something        (* x is also 10 *)
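The difference becomes visible once the body has a side effect. In this sketch (counter and next are hypothetical names) the function body runs on every call, while a plain value is computed once when it is defined.

```ocaml
let counter = ref 0

(* A function taking unit: the body runs on every call. *)
let next () =
  incr counter;
  !counter

(* A plain value: the right-hand side runs once, right here. *)
let first = next ()

(* Each further call runs the body again. *)
let second = next ()
let third = next ()
```

Here first, second and third are 1, 2 and 3, and using first again later never touches the counter.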
Tuples
Parentheses are used to define tuples and to destructure (unpack) values out of them. Sometimes the parentheses can be omitted.
let frobz (a : int * int) =    (* Function accepting a tuple of two integers *)
  let x, y = a in              (* Destructure a into x and y... *)
  x * y                        (* ...and multiply them *)
let x = frobz (2, 3)           (* x is 6: frobz is given the tuple (2, 3) *)
The following types are not identical (disregarding the constructor names X and Y).
type foo = X of int * int
type bar = Y of (int * int)
Only the last one is a constructor accepting a single tuple of two integers; X takes two separate integer arguments.
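A sketch of what the difference means in practice, reusing the two types from above. The names p, sum_foo and sum_bar are made up for the illustration.

```ocaml
type foo = X of int * int
type bar = Y of (int * int)

let p = (2, 3)

(* Y takes one argument that is a tuple, so an existing tuple fits. *)
let y = Y p

(* X takes two arguments; `X p` would be rejected by the compiler,
   so the components have to be written out. *)
let x = X (2, 3)

(* The same holds for patterns: Y can bind the tuple as a whole,
   X has to name both components. *)
let sum_bar (Y pair) = fst pair + snd pair
let sum_foo (X (a, b)) = a + b

let total = sum_foo x + sum_bar y
```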
Array Accessors
Parentheses are used for setting and getting array elements.
let x = [|1;2;3|]            (* x is an array with values: 1, 2, 3 *)
let y = x.(1)                (* y is 2: index 1 of x *)
let y = x.(1) <- 3; x.(1)    (* y is 3: index 1 of x is set to 3 (evaluates to unit); index 1 of x is retrieved *)
Module Scope
The expression written in place of the three dots inside Module.(...) is evaluated with Module opened.
In the following example both concat and length are functions in the String module.
String.(concat "" ["a";"b";"c"] |> length)
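For comparison, here is the same computation written three ways. This is only a sketch; the let open form is another way to open a module locally and is not otherwise covered here.

```ocaml
(* Module opened for the parenthesised expression only. *)
let n1 = String.(concat "" ["a"; "b"; "c"] |> length)

(* The same local opening written with `let open ... in`. *)
let n2 =
  let open String in
  concat "" ["a"; "b"; "c"] |> length

(* No opening at all: every function is fully qualified. *)
let n3 = String.length (String.concat "" ["a"; "b"; "c"])
```

All three evaluate to 3, the length of "abc".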
Type Annotations
Function parameter type annotations require parentheses around them or they would be ambiguous with type annotations for the return value. See Functions: Type Inference and Annotations.
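A small sketch of the two positions an annotation can have. The function names are hypothetical.

```ocaml
(* The parenthesised annotation belongs to the parameter x. *)
let to_text (x : int) = string_of_int x

(* Without parentheses the annotation belongs to the return value;
   the type of the parameter x is still inferred. *)
let to_text' x : string = string_of_int x
```

Both functions end up with the type int -> string, but the annotation constrains a different part of it in each case.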
Functions
The syntax of function definitions is very different from C-like languages.
Signatures
A lot of arrows are used to describe function signatures.
Anything that has an arrow in it means that it’s a function taking one or more arguments and returning a value of the last type in the chain of arrows.
The type signature int -> int is a function accepting an integer and returning an integer.
Similarly int -> int -> int is a function accepting two integers and returning one.
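To connect the two signatures to code, the sketch below writes them out explicitly as annotations on the bindings. The names are arbitrary; the compiler would infer the same types without the annotations.

```ocaml
(* One arrow: takes an integer, returns an integer. *)
let twice : int -> int = fun x -> x * 2

(* Two arrows: takes two integers, returns an integer. *)
let add : int -> int -> int = fun x y -> x + y

let result = add (twice 4) 1
```

Here result is 9.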
Sooner or later you will come across something that looks like ('a -> 'b) -> 'a list -> 'b list (this is the type signature of List.map).
Following the arrows we first have ('a -> 'b).
The parentheses with an arrow inside are key here, meaning that the first parameter is itself a function taking an argument of type 'a and returning a value of type 'b.
The second parameter is 'a list, which is a list of values of type 'a.
Finally the function returns a list of values of type 'b.
The types 'a and 'b are polymorphic, meaning that values of any type are accepted.
The significance of 'a and 'b is that they may be different types (as the type names are different).
In this particular example, the first argument is a function accepting a value of any type and returning a value of any type.
The following is an example of how to use List.map, which was dissected above.
The first argument is an anonymous function with a parameter a that is inferred to be an integer based on its use.
The type signature for string_of_int is int -> string.
The second argument is a list of integers.
List.map applies its first argument (the map function) to each element of the input list, producing a new output list.
let x = List.map (fun a -> string_of_int a) [1;2;3]
The type of x is string list.
Note that polymorphic type names like 'a and 'b have nothing to do with function parameter names.
They are only named as type placeholders.
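Writing a map function by hand may make the signature easier to read. The following my_map is a sketch for illustration, not the standard library's implementation, with the type of each parameter annotated to match the signature piece by piece.

```ocaml
(* ('a -> 'b) -> 'a list -> 'b list, spelled out on the parameters. *)
let rec my_map (f : 'a -> 'b) (xs : 'a list) : 'b list =
  match xs with
  | [] -> []                            (* nothing left to convert *)
  | x :: rest -> f x :: my_map f rest   (* convert the head, recurse on the tail *)

(* 'a is int and 'b is string here... *)
let strings = my_map string_of_int [1; 2; 3]

(* ...and 'a is string and 'b is int here. *)
let lengths = my_map String.length ["a"; "bb"; "ccc"]
```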
Partial Application
To execute a function is to apply it. Partial application is to only give a function some of its required arguments, thereby producing a new function with some of its parameters bound to specific arguments.
Example:
let int_list_stringer = List.map (fun a -> string_of_int a)
let x = int_list_stringer [1;2;3]
The function int_list_stringer has the first parameter of List.map bound to the anonymous function (fun a -> string_of_int a).
Since the type of that anonymous function is int -> string, the type signature of int_list_stringer is no longer polymorphic like that of List.map but rather int list -> string list.
The arrow syntax makes more sense in light of this feature: int -> int -> int can be read as a function that takes an integer and returns a function of type int -> int.
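The same thing with a plain two-argument function, as a minimal sketch (add and add_five are made-up names):

```ocaml
(* int -> int -> int *)
let add x y = x + y

(* Giving add only its first argument yields a function int -> int. *)
let add_five = add 5

(* Supplying the remaining argument finally computes a value. *)
let eight = add_five 3

(* A partially applied function can be passed along directly. *)
let shifted = List.map (add 10) [1; 2; 3]
```

Here eight is 8 and shifted is [11; 12; 13].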
Type Inference and Annotations
OCaml infers the types of let bindings and function parameters from their use.
The type signature of let forward x = x is 'a -> 'a, i.e. polymorphic, because there is nothing that restricts the use of x to a specific type.
When coding mistakes are made while writing a function, it's possible that type errors are reported in confusing ways because of the type inference.
A way around this is to annotate parameters with the intended types, which will give more precise type errors.
Example of a function with all parameters and return value annotated:
let foo (x : int) (y : int) : int = x * y
If foo is to be exported, an alternative is to put its type signature in a .mli file (analogous to a header file in other languages).
val foo: int -> int -> int
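The effect of such a signature can be tried out in a single file with an inline module signature. This is only a sketch: the module name Foo and the internal value scale are hypothetical, and a real project would keep the val line in its own .mli file.

```ocaml
module Foo : sig
  (* Plays the role of the .mli file: only foo is exported. *)
  val foo : int -> int -> int
end = struct
  (* Internal detail, not visible outside the module. *)
  let scale = 1

  let foo x y = scale * x * y
end

let answer = Foo.foo 6 7
```

Referring to Foo.scale from outside the module would be rejected, because the signature does not mention it.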
Application and Passing as Value
OCaml has first-class functions, i.e. functions can return functions and functions can be passed around as values. Functions capture any variables they use from the environment as it is when they are created. There is no special syntax to differentiate passing a function as a value from applying it. A function is applied when all its required arguments are given; otherwise the expression evaluates to a function that can, for example, be passed as a value.
Function passed as value:
(* foo takes two integers and multiplies them *)
let foo x y = x * y
(* bar applies function f to value 7 *)
let bar f = f 7
(* x is 21: bar is given a partially applied foo with its first parameter bound to the value 3 *)
let x = bar (foo 3)
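Returning a function and capturing a variable from the environment could be sketched like this. The names make_scaler, triple and apply_twice are made up for the example.

```ocaml
(* Returns a new function that remembers the factor it was created with. *)
let make_scaler factor = fun x -> x * factor

(* triple is a function int -> int with factor bound to 3. *)
let triple = make_scaler 3

(* Takes a function as a value and applies it two times. *)
let apply_twice f x = f (f x)

let result = apply_twice triple 2
```

Here result is 18: triple 2 is 6, and triple 6 is 18.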
Let Expressions
It’s very common to bind a value to a name that will be used in another expression.
For some reason this is not highlighted much in the OCaml manual.
The syntax could be read like: let this in that.
Examples:
let humpty x =
  let dumpty y = x * y in
  dumpty 10
let hubert x =
  let a = 3 in
  let b = 7 in
  x * a * b
let x = hubert 2 (* x is 42 *)
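One more sketch, with a hypothetical function describe, to show that each name bound with let ... in is available only in the expression after its in, and that a later binding can use an earlier one.

```ocaml
let describe n =
  (* doubled is visible in everything after this `in`. *)
  let doubled = n * 2 in
  (* label can use doubled because it is bound after it. *)
  let label = if doubled > 10 then "big" else "small" in
  Printf.sprintf "%d is %s" doubled label

let small = describe 3
let big = describe 7
```

Here small is "6 is small" and big is "14 is big". Neither doubled nor label exists outside describe.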