COM2010: Functional Programming
Lecture Notes, 2nd part
P Green, M Gheorghe, M Mendler
17. Recognisers and Translators
17.1 Finite State Machine (FSM)
17.2 Translator
17.3 Parser
There are some specific mechanisms for recognising words or sentences (regular expressions, context-free grammars) or for translating them into other things. We shall present some of these mechanisms and how they may be codified in Haskell.
For a given set S we may define sequences of symbols over S. More precisely, for
S={s_1, …, s_n}
x=x_1…x_p is a sequence of symbols over S if x_k is from S for any k=1..p. We denote by Seq(S) the
set of all sequences over S.
For example if Letters is the Latin alphabet, {'a'..'z', 'A'..'Z'}, then the sequences
John   home   word
are sequences of symbols over Letters (belong to Seq(Letters)), whereas
word1   long_sentence   aren't
don't belong to Seq(Letters), since '1', '_' and the apostrophe are not letters.
In general not all sequences are of interest: within the set Seq(S) one identifies a (proper) subset, called a language, which has some specific properties. We shall address those languages defined by syntactical rules. When S is an alphabet, a specific language is a vocabulary associated to S together with some rules defining words. A given vocabulary V may in turn be considered as a set S, and in this case a specific language may be considered as the set of sentences over V constructed according to some rules.
17.1 Finite State Machine (FSM)
A FSM has a heterogeneous structure containing states, labels, and transitions. From now on we deal only with deterministic FSMs, called simply FSMs. By using the polymorphic
type SetOf a = [a]
we may define a FSM thus:
data Automaton
  = FSM (SetOf State)      -- set of states
        (SetOf Label)      -- set of labels
        (SetOf Transition) -- set of transitions
        InitialState       -- initial state
        (SetOf State)      -- set of final states

type State        = Int
type Label        = Char
data Transition   = Move State Label State
type InitialState = State

Note. Automaton is an algebraic data type.
Com2010 - Functional programming; 2002
Consider an example automaton (given as a transition diagram in the original notes) where 0 is the initial state and 3, 7 are final states. This is defined in Haskell as
automatonEx = FSM [0..7] ['a','b','c']
                  [Move 0 'a' 1, Move 1 'b' 2,
                   Move 2 'c' 1, Move 2 'a' 3,
                   Move 3 'b' 3, Move 0 'b' 4,
                   Move 4 'a' 5, Move 5 'c' 7,
                   Move 4 'c' 6, Move 6 'a' 7]
                  0 [3,7]
In order to match a string against a FSM it is required to start from the initial state and then find a path leading to a final state. For example "abcbab" is recognised by the above FSM: we may start from 0 by recognising 'a', then go to 1 where 'b' is recognised, and so on until arriving in 3 where the last 'b' is recognised; the state where the path stops is still 3, which is a final state.
Various components of a FSM are obtained by using some select functions:
tr :: Automaton -> SetOf Transition
-- all transitions of an automaton
tr (FSM _ _ t _ _) = t
and for transitions
inState :: Transition -> State
-- input transition state
inState (Move s _ _) = s
outState :: Transition -> State
-- output transition state
outState (Move _ _ s) = s
label :: Transition -> Label
-- transition label
label (Move _ x _ ) = x
With the function below we may get all the transitions emerging from a state s and labelled with the
same given symbol x:
oneMove :: Automaton -> State -> Label -> SetOf Transition
oneMove a s x = [t| t <- tr a, inState t == s, label t == x]
where a list comprehension is used, with conditions imposed on the transitions t of the FSM a.
A recogniser that matches an input string against a FSM, starting from a state s, is recursively defined:
recogniser :: Automaton -> State -> String -> State
recogniser a s xs
  -- 0 or > 1 transitions: return a dummy state
  | length ts /= 1 = -1
  -- no further inputs: return the next state
  | tail_xs == []  = os
  -- still inputs to be processed
  | otherwise      = recogniser a os tail_xs
  where ts      = oneMove a s (head xs)
        tail_xs = tail xs
        os      = outState (head ts)
The next function shows how a string is recognised by a FSM following a path from the initial state to
a final state
acceptor :: Automaton -> String -> Bool
acceptor a xs = isFinal a (recogniser a (inS a) xs)
isFinal :: Automaton -> State -> Bool
-- check whether or not a state is final
isFinal a s = s `elem` fs a
fs :: Automaton -> FinalStates
-- all final states
fs (FSM _ _ _ _ f) = f
inS :: Automaton -> InitialState
-- initial state
inS (FSM _ _ _ s _) = s
If we consider the automaton defined above we get
acceptor automatonEx "abcbab"  True
which says that automatonEx recognises the input string “abcbab” by traversing a path starting
from the initial state and stopping in a final state.
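Collecting the definitions of this section into one self-contained module (nothing new here, just the pieces above assembled so they can be loaded and run as given):

```haskell
type SetOf a      = [a]
type State        = Int
type Label        = Char
type InitialState = State

data Transition = Move State Label State
data Automaton  = FSM (SetOf State) (SetOf Label)
                      (SetOf Transition) InitialState (SetOf State)

-- select functions
tr :: Automaton -> SetOf Transition
tr (FSM _ _ t _ _) = t

inS :: Automaton -> InitialState
inS (FSM _ _ _ s _) = s

fs :: Automaton -> SetOf State
fs (FSM _ _ _ _ f) = f

inState, outState :: Transition -> State
inState  (Move s _ _) = s
outState (Move _ _ s) = s

label :: Transition -> Label
label (Move _ x _) = x

oneMove :: Automaton -> State -> Label -> SetOf Transition
oneMove a s x = [t | t <- tr a, inState t == s, label t == x]

recogniser :: Automaton -> State -> String -> State
recogniser a s xs
  | length ts /= 1 = -1                        -- 0 or >1 transitions: dummy state
  | null (tail xs) = os                        -- input exhausted
  | otherwise      = recogniser a os (tail xs)
  where ts = oneMove a s (head xs)
        os = outState (head ts)

isFinal :: Automaton -> State -> Bool
isFinal a s = s `elem` fs a

acceptor :: Automaton -> String -> Bool
acceptor a xs = isFinal a (recogniser a (inS a) xs)

automatonEx :: Automaton
automatonEx = FSM [0..7] ['a','b','c']
  [Move 0 'a' 1, Move 1 'b' 2, Move 2 'c' 1, Move 2 'a' 3,
   Move 3 'b' 3, Move 0 'b' 4, Move 4 'a' 5, Move 5 'c' 7,
   Move 4 'c' 6, Move 6 'a' 7]
  0 [3,7]
```

For example, acceptor automatonEx "bac" also yields True (path 0-4-5-7), while acceptor automatonEx "ab" yields False, since state 2 is not final.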
17.2 Translator
Any recogniser may be transformed into a translator by adding a mechanism for emitting output symbols. The output symbols may be associated with the inputs, so that for every input symbol recognised a suitable output symbol is sent out. For the automaton defined in 17.1 we may associate a translator (the corresponding diagram, which attaches an output label to each transition, is given in the original notes). Consequently, for the input "abcbab", which is accepted by this automaton, a corresponding output is produced, namely "xyzyxy".
The automaton with outputs may be defined by extending the definition of a FSM with suitable output labels:
data AutomatonO
  = FSMO (SetOf State)
         (SetOf InputLabel)
         (SetOf OutputLabel)
         (SetOf Transition)
         InitialState
         (SetOf State)  -- set of final states

type InputLabel  = Char
type OutputLabel = Char
data Transition  = Move State InputLabel OutputLabel State
A translator may thus be defined:
translator ::
  AutomatonO -> (State,OutString) -> InString -> (State,OutString)
where InString and OutString are defined as String and denote the input and output strings, respectively. In this case any of the equations defining translator contains tuples instead of states. The tuples are of the form (state,outSymbols), where outSymbols is a string collecting the output labels of the transitions taken so far.
Exercise. Define translator and the associated select functions.
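One possible solution sketch for this exercise (the representation mirrors Automaton; the selector names and the example automaton with outputs x for a, y for b, z for c are our own choices, chosen so that "abcbab" yields the "xyzyxy" claimed above):

```haskell
type SetOf a      = [a]
type State        = Int
type InputLabel   = Char
type OutputLabel  = Char
type InitialState = State
type InString     = String
type OutString    = String

data Transition = Move State InputLabel OutputLabel State
data AutomatonO = FSMO (SetOf State) (SetOf InputLabel) (SetOf OutputLabel)
                       (SetOf Transition) InitialState (SetOf State)

trO :: AutomatonO -> SetOf Transition
trO (FSMO _ _ _ t _ _) = t

inState, outState :: Transition -> State
inState  (Move s _ _ _) = s
outState (Move _ _ _ s) = s

inLabel :: Transition -> InputLabel
inLabel (Move _ x _ _) = x

outLabel :: Transition -> OutputLabel
outLabel (Move _ _ y _) = y

-- the translator threads a (state, output collected so far) tuple
translator :: AutomatonO -> (State, OutString) -> InString -> (State, OutString)
translator _ (s, out) [] = (s, out)
translator a (s, out) (x:xs) =
  case [t | t <- trO a, inState t == s, inLabel t == x] of
    [t] -> translator a (outState t, out ++ [outLabel t]) xs
    _   -> (-1, out)          -- stuck: dummy state, as in recogniser

-- the automaton of 17.1, with output x for input a, y for b, z for c
automatonOEx :: AutomatonO
automatonOEx = FSMO [0..7] "abc" "xyz"
  [Move 0 'a' 'x' 1, Move 1 'b' 'y' 2, Move 2 'c' 'z' 1, Move 2 'a' 'x' 3,
   Move 3 'b' 'y' 3, Move 0 'b' 'y' 4, Move 4 'a' 'x' 5, Move 5 'c' 'z' 7,
   Move 4 'c' 'z' 6, Move 6 'a' 'x' 7]
  0 [3,7]
```

With this, translator automatonOEx (0,"") "abcbab" evaluates to (3,"xyzyxy"), reproducing the output described in the text.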
Another type of translator is defined by aggregating some inputs and sending them out in certain states. These translators are largely used to recognise lexical units, or tokens, of programming languages and are in this case called lexical analysers.
The next example shows a FSM which iterates through a sequence of characters and identifies in the final state 1 the identifiers (sequences of letters and digits with the first symbol being a letter) and in 2 the integer numbers (sequences of digits). These are the lexical units, and they are delimited by a space character ' '.
letter is any of ‘a’..’z’ or ‘A’..’Z’ and digit is any of ‘0’..’9’; letterDigit is
either letter or digit.
(the transition diagram of this lexical automaton is given in the original notes; its ' ' transitions lead from the final states back to the initial state)
For a string like "ident 453 Id7t" the above automaton translates it into the following lexical units: ident and Id7t, which are recognised in the final state 1, and 453, recognised in state 2.
When a comment is recognised in final state 6 (a sequence starting with '{-', ending with '-}' and containing any characters in between), it is discarded. For example the string "34 {-comment-}" produces only one token, 34.
Important! In order to ease the process of recognising lexical units, assume the tokens are always separated by spaces (' '); consequently, from every final state we should have a transition to the initial state labelled by ' '.
The following definition is an extension of that given for a FSM and defines an extended automaton:
data ExtAutomaton
  = EFSM (SetOf State)
         (SetOf Label)
         (SetOf Transition)
         (SetOf State)
         (SetOf FinalStateType) -- new!!

type FinalStateType = (State, TokenUnit)
type TokenUnit      = (Int, String)
It follows that the last line in the definition of ExtAutomaton contains a list of tuples (state,
tokenUnit), with state being a final state where a lexical unit is recognised and sent out in
tokenUnit. Every tokenUnit is a tuple where the first component is a code (an integer value used
by the parser) and the last part is the lexical unit itself.
In our example only states 1 and 2 occur in the list of FinalStateType. The final state 6 is not in this list, and it follows that the units recognised in this state are discarded (these units correspond to comments).
The translator, called in this case a lexical analyser, will use a translation function defined thus:
translation ::
  ExtAutomaton -> (State, SetOf TokenUnit) -> InputSequence
  -> String -> (State, SetOf TokenUnit)
translation takes
 an extended automaton
 a tuple whose first component is a state (in general the initial state) and whose second part is a list of token units (in general empty)
 an input sequence of characters
 a string where the current lexical unit is collected; initially it is empty
and produces the last state, where the translation process stops, together with the sequence of token units.
Example. Let us assume that identCode (= 1) is defined for identifiers and noCode (= 2) for numbers. If extAutomaton is the extended automaton corresponding to the last figure and the input is
"ident {-comment-} 346 lastIdent" then
translation extAutomaton (0,[]) "ident {-comment-} 346 lastIdent" []
   (1, [(identCode,"ident"), (noCode,"346"), (identCode,"lastIdent")])
So the translation stops in state 1, which is a final state where lastIdent has been recognised, and produces the following token units:
(1,"ident") (2,"346") (1,"lastIdent")
The translation function is defined by the following algorithm:
 when the input string is empty it stops, producing the current state and the list of token units
 otherwise (the input string is not empty):
o if the character at the front of the input string is not ' ', then it is added to the string collecting the current lexical unit, and translation resumes from the next state, the current string collected, and the rest of the input string
o if the current character is ' ' and the previous state is in the list of FinalStateType, then a token unit is recognised and added to the list of token units, and translation resumes from the next state, with an empty string where the next lexical unit will be collected, and the rest of the input string
o otherwise (the previous state is not in that list) the collected unit is discarded and translation resumes from the next state, with an empty string where the next lexical unit will be collected, and the rest of the input string
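The algorithm above can be sketched as follows. To keep the sketch short we represent the transition part of the extended automaton directly as a step function State -> Char -> State rather than as a Move list, and we also flush the unit being collected when the input runs out (so the last token is not lost); lexStep, lexEx and their state numbering (1: identifier, 2: number, 6: end of comment, echoing the notes) are our own illustration:

```haskell
import Data.Char (isAlpha, isDigit)

type SetOf a   = [a]
type State     = Int
type TokenUnit = (Int, String)

-- simplified extended automaton: a step function, the initial state, and
-- the (final state, token code) pairs playing the role of FinalStateType
data ExtAutomaton = EFSM (State -> Char -> State) State (SetOf (State, Int))

translation :: ExtAutomaton -> (State, SetOf TokenUnit) -> String -> String
            -> (State, SetOf TokenUnit)
translation e@(EFSM step s0 finals) (s, tus) inp u = case inp of
  []     -> (s, tus ++ flush s u)                              -- input empty: stop
  (x:xs)
    | x /= ' '  -> translation e (step s x, tus) xs (u ++ [x]) -- collect character
    | otherwise -> translation e (s0, tus ++ flush s u) xs ""  -- ' ': back to start
  where
    flush st unit = case lookup st finals of
      Just code -> [(code, unit)]    -- listed final state: emit the token unit
      Nothing   -> []                -- not listed (e.g. comment): discard it

-- a hypothetical lexical step function for identifiers, numbers and comments
lexStep :: State -> Char -> State
lexStep 0 c | isAlpha c = 1              -- start of an identifier
            | isDigit c = 2              -- start of a number
            | c == '{'  = 3              -- possible start of a comment
lexStep 1 c | isAlpha c || isDigit c = 1 -- inside an identifier
lexStep 2 c | isDigit c = 2              -- inside a number
lexStep 3 '-' = 4                        -- saw "{-"
lexStep 4 '-' = 5                        -- inside a comment, saw '-'
lexStep 4 _   = 4
lexStep 5 '}' = 6                        -- saw "-}": comment complete
lexStep 5 _   = 4
lexStep _ _   = -1                       -- stuck: dummy state

lexEx :: ExtAutomaton
lexEx = EFSM lexStep 0 [(1, identCode), (2, noCode)]
  where identCode = 1
        noCode    = 2
```

With these assumptions, translation lexEx (0,[]) "ident {-comment-} 346 lastIdent" "" yields (1,[(1,"ident"),(2,"346"),(1,"lastIdent")]), matching the example in the notes.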
17.3 Parser
Parsing a program means passing through the text of the program and checking whether the rules
defining the syntax of the programming language are correctly applied. In fact parsing comes
immediately after lexical analysis and consequently processes a sequence of token units rather than the
initial sequence of characters defining the program.
The syntax rules may be given in various forms: context-free rules, EBNF notation or syntax diagrams. All these notations are equivalent, but the last two are more concise than the first.
Let us consider a very rudimentary imperative programming language, called SA (Sequence of Assignments), consisting only of assignment statements delimited by ';'. Each assignment also has a very simple form (identifier = number or identifier = identifier).
We also assume that every program should end with a specific lexical unit called 'eop' (the lexical analyser will be responsible for adding this bit).
We may define the syntax of SA with the following set of syntax diagrams (the diagrams themselves are given in the original notes; rules 1 and 2 are also spelled out in the parsing functions of this section):
1. Program ::= StmtList Eop
2. StmtList ::= Assign {Delim Assign}
3. Assign ::= (sequence diagram)
4. RestAss ::= (sequence diagram)
5. Exp ::= (alternative diagram: ExpId | ExpNo)
6. Trm ::= (diagram)
7. Operator ::= (diagram)
8. Eop ::= eop
9. Delim ::= ;
10. Identifier ::= ident
11. AssSymb ::= =
12. Number ::= no
13. AddOp ::= +
14. MinOp ::= -
15. LHandS ::= ident
 three main diagram types may be distinguished: sequence (1, 3, 4), alternative (5) and iteration (2)
 any of these diagrams has two (non-terminal) symbols
 the last diagrams (6 to 11) are sequence diagrams but with only one (terminal) symbol, corresponding to the main lexical units (in this case ident, no, ;, =, eop)
 a simpler specification may be obtained (try and find it!), but this is a kind of "normal form" which will ease writing the parsing functions.
The following, more general case with four diagram types could be addressed (figure given in the original notes):
In order to be able to write a deterministic parser (without backtracking) the corresponding equivalent
grammar should be LL(1), which means that the diagrams for alternation and iteration should possess
the following properties:
 (alternation) X and Y should derive disjoint sets of terminals on the first position – for SA
(diagram 5), ExpId derives {Ident} and ExpNo derives {No}
 (iteration) Y and the non-terminal that follows after Iteration should derive disjoint sets of
terminals on the first position – for SA (diagram 2), Delim derives {;} and the nonterminal after
StmtList is Eop which derives {eop}
Any function f involved in parsing is defined as
f:: SetOf TokenUnit -> SetOf TokenUnit
and will refer to the top element in the list of token units.
The parsing function for Sequence diagram
seqOf ::
(SetOf TokenUnit -> SetOf TokenUnit) ->
(SetOf TokenUnit -> SetOf TokenUnit) ->
SetOf TokenUnit -> SetOf TokenUnit
-- seqOf fX fY processes ->X -> Y->
seqOf fX fY = fY.fX -- composition
The parsing function for the Alternation diagram
altOf ::
  (SetOf TokenUnit -> SetOf TokenUnit) ->
  SetOf TokenUnit ->
  (SetOf TokenUnit -> SetOf TokenUnit) ->
  SetOf TokenUnit ->
  SetOf TokenUnit -> SetOf TokenUnit
-- altOf fX fXTUs fY fYTUs processes X or Y
altOf _ _ _ _ [] =
  error ("Input: empty/ Alternative ")
altOf fX fXTUs fY fYTUs ts@(t:ts')
  | fst t `elem` map fst fXTUs = fX ts
  | fst t `elem` map fst fYTUs = fY ts
  | otherwise = error ("Input: " ++ show t
      ++ "/ Expected: " ++ show (head fXTUs)
      ++ " or " ++ show (head fYTUs))
where fXTUs and fYTUs represent the sets of token units that derive from X and Y respectively; ts@(t:ts') is an as-pattern and allows referring to t:ts' as ts (as-patterns will be addressed later on).
The parsing functions for the Iteration and Term diagrams
iterOf ::
  (SetOf TokenUnit -> SetOf TokenUnit) ->
  (SetOf TokenUnit -> SetOf TokenUnit) ->
  SetOf TokenUnit ->
  SetOf TokenUnit -> SetOf TokenUnit
-- iterOf fX fY processes X and
-- 'seqOf fY fX' 0 or many times
iterOf fX fY fYTUs ts =
  iterationOf fX fY fYTUs (fX ts)

iterationOf ::
  (SetOf TokenUnit -> SetOf TokenUnit) ->
  (SetOf TokenUnit -> SetOf TokenUnit) ->
  SetOf TokenUnit ->
  SetOf TokenUnit -> SetOf TokenUnit
iterationOf _ _ _ [] =
  error ("Input: empty/ Iteration ")
iterationOf fX fY fYTUs ts@(t:ts')
  | fst t `elem` map fst fYTUs =
      iterationOf fX fY fYTUs (seqOf fY fX ts)
  | otherwise = ts
fTerm :: TokenUnit -> SetOf TokenUnit ->
  SetOf TokenUnit
-- fTerm processes the terminal t
fTerm t [] =
  error ("Input: empty/ Expected : " ++ show t)
fTerm t (y:ts)
  | fst t /= fst y =
      error ("Input: " ++ show y ++ "/ Expected: "
        ++ show t)
  | otherwise = ts
fTerm checks whether or not the terminal t matches the top element of the token unit list.
A recursive descent parser contains
parser :: SetOf TokenUnit -> Bool
parser ts = (fProgram ts == [])
which transforms a sequence of token units into a Boolean value and uses fProgram, which recursively invokes the parsing functions.
The parsing function associated to rule 1 (sequence)
fProgram ::
SetOf TokenUnit -> SetOf TokenUnit
--1 Program :: StmtList Eop
fProgram = seqOf fStmtList fEop
The parsing function associated to rule 2 (iteration)
fStmtList ::
SetOf TokenUnit -> SetOf TokenUnit
--2 StmtList :: Assign {Delim Assign}
fStmtList =
iterOf fAssign fDelim [(sc, ";")]
The parsing function associated to rule 11 (a terminal rule)
fAssSymb ::
SetOf TokenUnit -> SetOf TokenUnit
--11 AssSymb :: =
fAssSymb = fTerm (ass, "=")
Example. If we consider the program "a = 1" then
translation extAutomaton (0,[]) "a = 1" []
   (2, [(identCode,"a"), (ass,"="), (noCode,"1"), (eop,"eop")])
parser [(identCode,"a"), (ass,"="), (noCode,"1"), (eop,"eop")]  True
17.3.1 Empty variant. Parser output
There are alternative or iterative rules requiring empty variants. The empty (null) variant may be considered as the identity function:
fEmpty :: SetOf TokenUnit -> SetOf TokenUnit
fEmpty pu = pu
The empty alternative must be rewritten (simulating the use of fEmpty):
altOfEmpty :: (SetOf TokenUnit -> SetOf TokenUnit) ->
  (SetOf TokenUnit -> SetOf TokenUnit) -> SetOf TokenUnit ->
  SetOf TokenUnit -> SetOf TokenUnit
-- altOfEmpty Empty g :: Empty or g
altOfEmpty _ _ _ [] = error ("Input: empty/ Alternative ")
altOfEmpty f g gTUs ts@(t:ts')
  | fst t `elem` map fst gTUs = g ts
  | otherwise = f ts
where f (the function in the first position) is always fEmpty.
An empty alternative might occur when a statement has an empty variant (BNF notation):
Null_stmt ::= null | Empty
fNull_stmt :: SetOf TokenUnit -> SetOf TokenUnit
fNull_stmt = altOfEmpty fEmpty fNull [(null, "")]
Iterative rules with an empty variant may be written using iterOf and fEmpty. For example, if ';' is part of the Assign statement, then StmtList is written as
2. StmtList ::= Assign {Empty Assign}
which may be written as
fStmtList :: SetOf TokenUnit -> SetOf TokenUnit
--StmtList ::= Assign {Empty Assign}
fStmtList = iterOf fAssign fEmpty [(ident,"")]
Parsing is not only a verification step; almost always an output is expected. In the case of arithmetic expressions the output is expected to be in a format suitable for direct evaluation. It is well-known that an expression like 1+3*2 requires first the multiplication and then the addition, and this should be achieved through a more suitable format of this expression. Such a format derives from the so-called Polish notation (operands followed by suitable operators) and takes into account the priority rules that apply to the operators (* is evaluated before +). For the example above the expected output is 132*+. This format allows the expression to be evaluated in one pass using a stack of operands and (partial) results. A very simple algorithm to evaluate such an expression works as follows:
 if the current symbol is an operand then push it onto the stack
 if the current symbol is an operator then extract the two top elements, apply the operator accordingly and push the result onto the stack
 if the input is empty then the top of the stack contains the result
The next problem is to change the parser presented above so as to produce an output in Polish notation. The changes required:
(1) suitable data structures: SetOf TokenUnit is transformed into ParserUnit where
type ParserUnit = (SetOf TokenUnit,(SetOf Internal,SetOf Output))
type Internal = TokenUnit -- contains an internal temporary value
type Output = TokenUnit -- contains an output value
(2) replace SetOf TokenUnit by ParserUnit in all definitions (including the rules)
(3) modify altOf, altOfEmpty, iterationOf and fTerm where explicit reference to a set
of tokens must be rewritten as a reference to ParserUnit. For example
fTerm :: TokenUnit ->ParserUnit -> ParserUnit
fTerm y ([],_) = error ("Input: empty/ Expected : "++show y)
fTerm y ((t:ts),x)
| fst y /= fst t =
error ("Input: "++show t++"/ Expected: "
++show y)
| otherwise = (ts,x)
(4) add auxiliary functions and change some terminal functions according to the following rules:
 identifier occurring on the left hand side is sent out
 ‘=’ and the operators ‘+’ and ‘-‘ are pushed onto stack
 every operand (either identifier or number) is sent out followed by the operator on the top of
the stack, if any; the operator is also discarded from the stack
 when ‘;’ or ‘eop’ occurs then ‘=’ which occurs on the top of the stack is sent out and the stack
is discarded
(4.1) auxiliary functions:
pushOp::ParserUnit -> ParserUnit
-- the current operator is kept as an internal value
outToken::ParserUnit -> ParserUnit
-- the current token, left hand side of an assign stmt, is sent out
outOpd::ParserUnit -> ParserUnit
-- the current operand is sent out; if an operator is kept in the stack, it is also sent out and
-- the stack is discarded
outAssSymb::ParserUnit -> ParserUnit
-- '=' is sent when either ';' or 'eop' is reached; ‘=’ is discarded from the stack
(4.2) change some terminal rules
fAddOp :: ParserUnit -> ParserUnit
-- 13 AddOp ::= +
fAddOp = seqOf pushOp (fTerm (pls, "+"))
(fAddOp, which checked that '+' occurs at the right position, now becomes a sequence of functions (seqOf) of which the first pushes the current value onto the internal stack (pushOp) and the second is responsible for checking the validity of the current token unit (fTerm (pls,"+"))).
fLHandS :: ParserUnit -> ParserUnit
-- 15 LHandS ::= ident
fLHandS = seqOf outToken (fTerm (ident, ""))
fIdentifier :: ParserUnit -> ParserUnit
-- 10 Identifier ::= ident
fIdentifier = seqOf outOpd (fTerm (ident, ""))
fDelim :: ParserUnit -> ParserUnit
-- 9 Delim ::= ;
fDelim = seqOf outAssSymb (fTerm (sc, ";"))
Evaluate an expression in Polish notation
-- evaluate an expression in Polish notation
type InpExp = [String]
type Stack  = [Int]

eval :: (InpExp, Stack) -> (InpExp, Stack)
eval ([], x)    = ([], x)
eval (v:vs, []) = eval (vs, [stringToInt v])
eval (v:vs, t2:s)
  | v `elem` ["+","-","*","/"] = eval (vs, r : tail s)
  | otherwise                  = eval (vs, stringToInt v : (t2:s))
  where t1 = head s
        r  = case v of
               "+" -> t1 + t2
               "-" -> t1 - t2
               "*" -> t1 * t2
               "/" -> t1 `div` t2

-- ord comes from Data.Char
stringToInt :: String -> Int
stringToInt [] = 0
stringToInt xs = (ord (last xs) - ord '0') + 10 * stringToInt (init xs)

eval (["3","2","2","*","+","1","-"], [])  ([],[6])
18. Higher-Order Functions and Computation Patterns
18.1 The function type a->b
18.2 Arity and infix
18.3 Iteration and primitive recursion
18.4 Efficiency and recursion patterns
18.5 Partial functions and errors
18.6 More higher-order on lists
In any functional programming language that deserves its name, functions are first-class citizens. They
form a data type and thus can be passed as arguments and returned as values.
Functions of type (a->b) -> c take functions of type a->b as input and produce a result of type c.
Functions of type a -> (b->c) take values of type a as arguments and produce functions of type b -> c as results.
Definition. A function is higher-order if it takes a function as an argument or returns a function as a
result, or both.
Higher-order functions that we have seen before in Chapter 16.2 include
map :: (a->b)->[a]->[b]
filter :: (a->Bool)->[a]->[a]
foldr :: (a->b->b)->b->[a]->b
merge :: (a->a->Bool)->[a]->[a]->[a]
mergesort :: (a->a->Bool)->[a]->[a]
Do you remember what they do? If not look them up and find out!
Higher-order functions and polymorphism are two abstraction mechanisms which are extremely useful
for conciseness of program code, and to achieve a high degree of program reuse.
18.1 The function type a->b
Objects of type a->b are constructed by lambda abstraction \x->e and used in function
f e’.
Lambda abstraction
if e has type b and x is a variable of type a then \x->e has type a->b
Function application
if f has type a->b and e’ has type a then f e’ has type b
Expressions such as \x->e are also called lambda expressions, or anonymous functions, in contrast
to functions that are declared and bound to a name by definition equations.
The function definition
double x = 2*x
defines the behaviour of double point-wise, i.e. for every argument x the equation double x = 2*x specifies what double returns (viz. 2*x) when applied to x.
This definition has the same effect as
double = \x->2*x
which defines double wholesale. The anonymous function \x->2*x is an expression for the complete function.
We can have functions of type a->(b->(c->d)):
f is a function which
  takes an element of type a, and
  returns a function which
    takes an element of type b, and
    returns a function which
      takes an element of type c, and
      returns an element of type d
-> associates to the right. We may write a->b->c->d instead of a->(b->(c->d)). Similarly, we may write
\x -> \y -> \z -> x + y + z
rather than
\x -> (\y -> (\z -> x + y + z))
or, simpler,
\x y z -> x + y + z
addThree :: Num a => a->(a->(a->a))
addThree x y z = x+y+z
addThree1 :: Num a => a->(a->(a->a))
addThree1 x = (\y z -> x+y+z)
addThree2 :: Num a => a->(a->(a->a))
addThree2 = (\x y z -> x+y+z)
Class Num a allows numeric computations with elements of type a (+, -, *,…).
In a functional programming language we can define operations on functions (i.e. “functions on
functions'') and use them to construct new functions from old ones. This is done in much the same way
as we use arithmetic operations on numbers.
A simple example of an operator on functions is function composition:
fcomp g f x=g (f x)
Function composition is useful enough to have its own operator symbol in Haskell Prelude:
(g.f) x=g (f x)
(g.f)=(\x->g (f x))
This is called an infix operator definition which shall be presented in the next section.
We can use function composition to define multiplication by 4:
timesfour = fcomp double double
Notice, we have not given a point-wise definition of timesfour; we used an operation on functions instead.
18.2 Arity and infix
18.2.1 Types and arity
Now that we deal with higher-order functions we must be slightly more precise about type definitions
in Haskell. A type definition for a function is of the form
foo :: a1->a2->…->an->t
where the ai and t are type expressions. It tells us how the function is to be defined and how it is to be used in expressions.
The types in the list a1 a2 … an refer to the parameters. Their number n is the so-called arity of foo. The arity determines how many formal arguments must be given in any definitional equation for foo. So, with the above type definition, a typical function definition for foo consists of guarded equations of the form:
foo p1 p2 … pn
  | g1 = b1
  …
  | gk = bk
where pi is a pattern of type ai and the bj are expressions of type t. The gj are arbitrary guards.
If we had used the type definition
foo :: a1->(a2->…->an->t)
instead, which gives the same type but a different arity, the function definition would have to look like
foo p1
  | g1 = b1
  …
  | gk = bk
where the bj have type a2->…->an->t. The arity of this version of foo is 1.
An extreme case arises by putting
foo :: (a1->a2->…->an->t)
where the arity becomes 0, and defining equations are of the form
foo = b
where the expression b constructs foo wholesale.
Note that guards here are pointless since there are no arguments on which the different choices could depend.
Examples. Addition could be introduced with 3 different type definitions:
add2 :: Int -> Int -> Int     -- arity 2
add1 :: Int -> (Int -> Int)   -- arity 1
add0 :: (Int -> Int -> Int)   -- arity 0
In each case the function definition must use a different number of argument patterns on the lhs of =, and a different type for the expression on the rhs of =:
add2 x y = x+y
-- 2 args x,y::Int; x+y::Int
add1 x = (\y->x+y)
-- 1 arg x::Int; \y->x+y::Int->Int
add0 = (\x->\y->x+y)
-- no arg; \x->\y->x+y::Int->Int->Int
Although their definitions have different shapes, all three versions of addition can be used in exactly
the same way in expressions.
The types Int->Int->Int, Int->(Int->Int),
and (Int->Int->Int) are equivalent for expressions. The difference in arity is relevant ONLY for
function definitions.
Don't confuse add2::Int->Int->Int with a function
add (x,y) = x+y
that has ONE argument that is a pair of integers (a tuple), while add2 has TWO arguments of type Int.
For instance, we write add (5,7) but add2 5 7.
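The two styles are interconvertible via the Prelude functions curry and uncurry (this pairing is standard Haskell, mentioned here as a side note rather than taken from the notes):

```haskell
add :: (Int, Int) -> Int
add (x, y) = x + y          -- ONE argument: a pair

add2 :: Int -> Int -> Int
add2 x y = x + y            -- TWO curried arguments

-- curry turns a pair-consuming function into a curried one;
-- uncurry goes the other way round
add2' :: Int -> Int -> Int
add2' = curry add

add' :: (Int, Int) -> Int
add' = uncurry add2
```

So add (5,7), add2 5 7, add2' 5 7 and add' (5,7) all evaluate to 12.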
A type with zero arity does not permit pattern matching in the corresponding function definitions. We must use case or if expressions instead. Also, guards must be realised using if.
Let us consider the definitions
-- take first n elements
-- take 2 [1,2,3,4,5,6,7]  [1,2]
take _ []   = []
take 0 _    = []
take n (first:rest)
  | n > 0     = first : take (n-1) rest
  | otherwise = error "not a natural number"

-- take, via anonymous function;
-- using case
take = (\n -> \list ->
  case (n, list) of
    (_, []) -> []
    (0, _)  -> []
    (n, first:rest) ->
      if n > 0
        then first : take (n-1) rest
        else error "not a natural number")

-- take, via anonymous function;
-- using if
take = (\n list ->
  if n == 0 || length list == 0
    then []
    else if n > 0
      then head list : take (n-1) (tail list)
      else error "not a natural number")
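All three definitions are interchangeable in use; for instance, the case version can be checked as follows (we hide the Prelude's own take to avoid the name clash):

```haskell
import Prelude hiding (take)

-- the case-expression version from above (arity 0)
take :: Int -> [a] -> [a]
take = \n -> \list ->
  case (n, list) of
    (_, [])         -> []
    (0, _)          -> []
    (m, first:rest) ->
      if m > 0
        then first : take (m - 1) rest
        else error "not a natural number"
```

take 2 [1,2,3,4,5,6,7] yields [1,2], and take 5 [1,2] simply yields [1,2].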
Arity n type definition
foo :: a1->a2->…->an->t
Function definition
foo p1 p2 … pn
  | g1 = b1
  …
  | gk = bk
Arity 1 type definition
foo :: a1->(a2->…->an->t)
Function definition
foo p1
  | g1 = b1
  …
  | gk = bk
Arity 0 type definition
foo :: (a1->a2->…->an->t)
Function definition
foo = b
18.2.2 Infix operators
For functions of arity 2 we can use infix notation. For instance, it is more convenient to define function composition fcomp as a right-associative infix operator:
(.) g f x = g (f x)
which is usually written as (g.f) in infix notation.
We may then write
eightTimes :: (Int -> Int)
eightTimes = double . double . double
as abbreviation for
(.) double ((.) double double)
i.e. for double . (double . double).
The brackets (.) around the infix operator make the compiler forget the infix status and make the operator a prefix. They are necessary when you refer to the operator itself, i.e. when you don't put it between its arguments.
In the Prelude you may find the keyword infixr associated with this operator:
infixr 9 .
which makes it a right-associative infix operator with the highest binding strength, 9.
There is also infixl for left-associative infix operators.
Infix syntax is possible only for functions of arity 2.
There are many infix operators in Haskell:
left associative:  !!  *  /  `rem`  `div`  `mod`  +  -
right associative: .  ^  **  ++  &&  ||
non-associative:   ==  /=  <  <=  >  >=  `elem`
‘Non-associative’ means that the operator cannot be iterated. Expressions such as x<y<z produce a
compiler (interpreter) error; such expressions should be rewritten as x<y && y<z.
Note that all these infix operators can be used as functions and passed as arguments. For instance, we
may write (<) 70 8 instead of 70 < 8.
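A few sketches of operators used in prefix position and passed to higher-order functions (the foldr examples echo 16.1.4):

```haskell
-- an infix operator in prefix position
smaller :: Bool
smaller = (<) 70 8                -- the same as 70 < 8, i.e. False

-- operators passed as arguments to higher-order functions
total :: Int
total = foldr (+) 0 [1,2,3,4]     -- 1+2+3+4

conjunction :: Bool
conjunction = foldr (&&) True [True, True, False]
```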
Any function of arity 2 may be used in both prefix and infix notation. Let us consider
add2 :: Int->Int->Int
add2 x y = x+y
which may be used either as
add2 2 3
or as
2 `add2` 3
18.2.3 Partial applications
The function ++ concatenates two strings:
(++) "Name=" "Bill"  "Name=Bill"
Its type is String -> String -> String, and in the Prelude it is declared as a right-associative infix operator:
infixr 5 ++
According to the rules of function application we can also apply it to only one argument. The result (++) "Name=" is a one-argument function of type String -> String.
We may use this partial application ``on-the-fly'' as in the following:
-- prefix all elements in a list of
-- strings
prefAll names = map ((++) "Name=") names
prefAll ["Bill","John","Tony"] 
["Name=Bill","Name=John","Name=Tony"]
In 16.1.4 we have used foldr with (+) and (*) also partially applied.
1. Define a function that suffixes a string with "is the name." by specialisation of (++).
2. Define a function subtractAll that subtracts all the integer values in a given list from a specified value:
subtractAll 4 [2,3,5]  [2,1,-1]
3. Consider also the other way round:
subtractAll' 4 [2,3,5]  [-2,-1,1]
4. Define a function remove that removes from a list of strings all the strings equal to a given value (of type String):
remove "John" ["John","Thomas","John","Will"]  ["Thomas","Will"]
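Possible solutions, for reference (one sketch among several; the name suffixName for exercise 1 is our own, and we add a leading space before "is the name." for readability):

```haskell
-- 1. suffix by specialising (++); 'suffixName' is our own name for it
suffixName :: String -> String
suffixName = (++ " is the name.")

-- 2. subtract every list element FROM the given value
subtractAll :: Int -> [Int] -> [Int]
subtractAll n = map (n -)

-- 3. the other way round: subtract the given value from every element
subtractAll' :: Int -> [Int] -> [Int]
subtractAll' n = map (subtract n)    -- (subtract n) x = x - n

-- 4. keep only the strings different from the given one
remove :: String -> [String] -> [String]
remove s = filter (/= s)
```

Note the use of operator sections and partial application throughout: each definition specialises a two-argument function and lets map or filter supply the list.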
18.3 Iteration and primitive recursion
The composition operator, . in Haskell, is a simple example of how higher-order functions can capture general computational patterns. We have seen how . as an operator on functions permits compact definitions of functions, such as eightTimes.
We are going to explore a number of computational abstractions to illustrate the conciseness of higher-order programming.
18.3.1 Iteration
The function exp2 :: Int->Int could be mathematically defined (using double) as follows:
exp2(n) = 2^n = 2*(2^(n-1)) = 2*exp2(n-1) = double(exp2(n-1)),  n>0
We may define exponentiation with base 2 by a recursive computation
exp2 n
  | n==0 = 1
  | n>0  = double (exp2 (n-1))
(the last line may also be written as
  | n>0 = 2*(exp2 (n-1))
or
  | n>0 = (2*)(exp2 (n-1))
or
  | n>0 = (*) 2 (exp2 (n-1))
where * is an infix operator used either as an operator or as a function).
For every (positive) n the expression exp2 n iterates
the function double (or (*)2) n times, starting from initial value 1.
The process of iterating a function is very general: myIter iterates a function of type a->a,
starting with an initial value of type a, for a given number of steps (type Int):
myIter :: (a -> a) -> a -> Int -> a
-- 1st param: iteration function
-- 2nd param: initial value
-- 3rd param: iteration length
myIter f x n
  | n == 0 = x
  | n > 0  = f (myIter f x (n-1))
The polymorphic higher-order function myIter captures the abstract process of iteration.
Exponentiation, then, is obtained as a special case:
myExp2 = myIter double 1
Consider the problem of computing the exponent b^n for arbitrary b. The mathematical definition of
such a function is
exp b n = b^n, n > 0;  exp b 0 = 1  -- two parameters
All we do is replace double by the anonymous function \x->b*x which multiplies by b:
myExp b = myIter (\x->b*x) 1
Note the partial application: myIter defined above has 3 parameters. To get myExp b from it we
specialise two of them (f and x, the first two).
myIter is polymorphic. It can be used for other types, too. Suppose, we want to construct lists
['c',...,'c'] that contain one and the same character 'c' repeatedly. We obtain this as a
specialisation of myIter:
repChar c = myIter (\x->c:x) []
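Putting the pieces of this subsection together gives a small self-contained sketch that can be loaded and tried directly:

```haskell
-- General iteration: apply f to x, n times.
myIter :: (a -> a) -> a -> Int -> a
myIter f x n
  | n == 0 = x
  | n > 0  = f (myIter f x (n-1))

double :: Int -> Int
double x = 2 * x

myExp2 :: Int -> Int
myExp2 = myIter double 1           -- 2^n by iterated doubling

myExp :: Int -> Int -> Int
myExp b = myIter (\x -> b*x) 1     -- b^n by iterated multiplication

repChar :: Char -> Int -> String
repChar c = myIter (\x -> c:x) []  -- n copies of c
```

For example, myExp2 5 ⇒ 32, myExp 3 4 ⇒ 81 and repChar 'a' 3 ⇒ "aaa".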
Observation. The function myIter captures the repeated invocation of one and the same function:
multiplication by 2 or by b, or adding the same element 'c' to a list.
18.3.2 Primitive recursion
Often we do not want to construct n-fold iterations of one and the same function but of different
functions f_1, ..., f_n, one for each stage.
To capture this case conveniently we need a more general computation pattern.
Examples. 1. The factorial function fact(n)=n! may be obtained as fact(n) = fact(n-1) * n.
2. The list [1, 2, ..., n] consisting of the first n natural numbers may be obtained as
natList(n) = natList(n-1) ++ [n].
Notice, at stage k we multiply by k, in the first example, or append k to the end of the list, in the
second case. Thus, the operation of each stage is different.
fact :: Int -> Int
-- construct n!
fact n
| n == 0 = 1
| n > 0 = fact(n-1) * n
natList :: Int -> [Int]
-- constructs initial seq of naturals
natList n
| n == 0 = []
| n > 0 = natList (n-1) ++ [n]
The general pattern that we extract from this is called primitive recursion:
primRec :: (Int -> a -> a) -> a -> Int -> a
-- primitive recursion
-- 1st param: iteration function,
--            depending on iteration stage
-- 2nd param: initial value
-- 3rd param: iteration length
primRec f x n
  | n == 0 = x
  | n > 0  = f n (primRec f x (n-1))
primRec f x n = f n (primRec f x (n-1)) = ...
  = f n (f (n-1) (... (f 1 (primRec f x 0)) ...))
  = f n (f (n-1) (... (f 1 x) ...))
which may be viewed as expressing an iteration with different functions.
Our fact and natList examples have the following instantiations (partial applications) of primRec:
myFact :: Int ->Int
myFact = primRec (\n x -> x*n) 1
myNatList ::Int -> [Int]
myNatList = primRec (\n x -> x++[n]) []
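The two instantiations can be checked with a self-contained sketch:

```haskell
-- Primitive recursion: stage-dependent iteration.
primRec :: (Int -> a -> a) -> a -> Int -> a
primRec f x n
  | n == 0 = x
  | n > 0  = f n (primRec f x (n-1))

myFact :: Int -> Int
myFact = primRec (\n x -> x*n) 1        -- n!

myNatList :: Int -> [Int]
myNatList = primRec (\n x -> x++[n]) [] -- [1..n]
```

For instance, myFact 5 ⇒ 120 and myNatList 4 ⇒ [1,2,3,4].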
Exercise 5. Write the function successor (successor x = x+1) using primRec.
Most functions on natural numbers that you will come across are primitive recursive. Thus, most
functions can be defined in principle using the simple computational pattern primRec (though it may
not be easy to find). There are functions which are not primitive recursive. The following is an
example
fAck :: Int -> Int -> Int
fAck 0 y = y + 1
fAck x 0 = fAck (x-1) 1
fAck x y = fAck (x-1) (fAck x (y-1))
called Ackermann’s function.
18.4 Efficiency of Recursion Patterns
In practice, finding the most efficient recursion pattern is
non-trivial and requires creative insights. Let us look at two examples next.
18.4.1 Example 1. Exponentiation
Exponentiation myExp b :: Int -> Int as defined before by iteration has linear time
complexity. The number of iterations of the basic operation * equals the argument n in myExp b n.
There is a more efficient way of doing exponentiation using the idea of successive squaring. For
instance, instead of computing
b^8 as b * b * b * b * b * b * b * b
we can do with just three multiplications:
b^2 = b * b;  b^4 = b^2 * b^2;  b^8 = b^4 * b^4
For arbitrary exponents we can use the recursive laws:
b^n = b^(n/2) * b^(n/2),  n is even
b^n = b * b^(n-1),        n is odd
Using the Prelude function even our efficient exponentiation is
fastExp :: Int -> Int -> Int
fastExp b n
  | n == 0    = 1
  | even n    = y * y
  | otherwise = b * fastExp b (n-1)
  where y = fastExp b (n `div` 2)
Note. It is essential to define y in the where clause: the shared definition means that
fastExp b (n `div` 2) is computed only once, not twice, in y * y.
Compare these figures:
fastExp 2 100 (138 reductions, 194 cells);
myExp2 100 (2119 reductions, 2524 cells);
myExp 2 100 (2221 reductions, 2727 cells)
As you can see, in contrast to simple iteration or primitive recursion fastExp reduces the recursion
variable n much faster in the recursive step. Primitive recursion or iteration only decrements n while
fastExp halves it (in most cases).
It follows that fastExp has only logarithmic time complexity. The number of multiplications done in
fastExp b n is bounded by 2*log2(n).
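A quick way to convince ourselves that fastExp is correct is to compare it against the Prelude operator (^) on a range of small inputs (the property check below is ours, not part of the original notes):

```haskell
fastExp :: Int -> Int -> Int
fastExp b n
  | n == 0    = 1
  | even n    = y * y
  | otherwise = b * fastExp b (n-1)
  where y = fastExp b (n `div` 2)

-- fastExp should agree with the Prelude (^) wherever
-- the result fits into an Int
agreesWithPrelude :: Bool
agreesWithPrelude =
  and [ fastExp b n == b ^ n | b <- [0..5], n <- [0..10] ]
```

Evaluating agreesWithPrelude ⇒ True.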
18.4.2 Example 2. Fibonacci sequence
Another source for potential inefficiency of recursive programs is that they may compute one and the
same result several times.
As an example take the Fibonacci sequence:
fib :: Int -> Int
fib n
  | n == 0 = 0
  | n == 1 = 1
  | n > 1  = fib(n-2) + fib(n-1)
This program implements the recursive definition directly. Since fib n depends on both fib(n-1)
and fib(n-2), we must calculate both before we can compute fib n. In the recursive call for
fib(n-1), then, we are computing fib(n-2) and fib(n-3). Thus, we are computing fib(n-2) twice.
The computation pattern for fib 4, therefore, looks as
fib 4
├── fib 3
│   ├── fib 2
│   │   ├── fib 1
│   │   └── fib 0
│   └── fib 1
└── fib 2
    ├── fib 1
    └── fib 0
fib 3 is computed 1 time, fib 2 2 times, fib 1 3 times, fib 0 2 times. The size of the
computation tree grows exponentially in n.
It would be more efficient if we could share computation nodes, i.e. not recompute results:
fib 4 ── fib 3 ── fib 2 ── fib 1 ── fib 0
where each value is computed once and shared by both of the nodes that need it.
This computation tree now only grows linearly in size with the argument n.
How do we actually implement this?
The iterated function fibNextStep is
fibNextStep :: (Int, Int) -> (Int, Int)
fibNextStep (x, y) = (y, x + y)
from which we get a fast version of fib specialising the iteration pattern:
fastFib :: Int -> Int
-- fast Fibonacci sequence
fastFib = fst . myIter fibNextStep (0,1)
Compare these figures!
fib 20
(399209 reductions, 471650 cells, 1 garbage collection)
fastFib 20
(417 reductions, 565 cells)
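The linear-time Fibonacci can be packaged as a self-contained sketch (myIter is repeated from 18.3.1 so the fragment stands alone):

```haskell
myIter :: (a -> a) -> a -> Int -> a
myIter f x n
  | n == 0 = x
  | n > 0  = f (myIter f x (n-1))

-- one step: (fib k, fib (k+1)) -> (fib (k+1), fib (k+2))
fibNextStep :: (Int, Int) -> (Int, Int)
fibNextStep (x, y) = (y, x + y)

fastFib :: Int -> Int
fastFib = fst . myIter fibNextStep (0, 1)
```

For example, fastFib 10 ⇒ 55 and fastFib 20 ⇒ 6765.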
18.5 Partial Functions and Errors
More often than not functions are only partially defined. In practice, only a few functions are meant
to be applied to all values of their input type. There are often some input values that ought not to
occur, for which the function's result is not defined or sensible.
Simple examples include
 attempts to divide by 0, to take the square root of a negative number, or the head of an empty list
 applying a function defined by (primitive) recursion on natural numbers to negative numbers
 applying a function to an input value that is not caught by any pattern or guard in any of the
function's definition equations (non-exhaustive patterns or guards).
Such exceptional situations may be handled in a number of different ways:
1. Infinite loops
2. Run-time errors
3. Program abortion
4. Dummy values
5. Exceptions
18.5.1 Infinite loops
Assuming that it is only applied to natural numbers (i.e. positive integers) the Fibonacci function might
be defined thus:
naiveFib :: Int -> Int
-- may loop
naiveFib 0 = 0
naiveFib 1 = 1
naiveFib n = naiveFib(n-1) + naiveFib(n-2)
When we try to execute naiveFib with argument -3
naiveFib (-3) ⇒ ERROR - Control stack overflow
... the machine loops and eventually runs out of memory (control stack overflow).
Obviously, this is not acceptable. Even if we are sure that we will never use naiveFib on negative
numbers we must make provisions. Who knows who else is going to use our function ...
Infinite loops typically also occur in generally recursive functions, or in computations on lazy
infinite lists.
18.5.2 Run-time Errors: missing conditions
The next best solution is to introduce guards that perform a "health check" on the input. In this way
we can use the run-time system to detect if our function is used on unhealthy input. This leads to the
"standard" implementation of Fibonacci:
fib :: Int -> Int
-- may show run-time error
fib 0 = 0
fib 1 = 1
fib n
| n > 1 = fib(n-1) + fib(n-2)
Now if we apply fib outside its intended domain we get
fib (-3) ⇒ Program execution error: {fib (-3)}
The disadvantage with this is that the error message is not specific to the function fib. It is generated
by the run-time system without any knowledge about the semantics of our program.
18.5.3 Program Abortion
Regarding the error messages we can do better using the built-in function error :: String->a
We proceed as follows:
newFib :: Int -> Int
-- may produce error message
newFib 0 = 0
newFib 1 = 1
newFib n
| n > 1 = newFib(n-1) + newFib(n-2)
| otherwise =
    error ("\nError in newFib: Fibonacci "
           ++ "function cannot \nbe applied to"
           ++ " negative integer " ++ show n ++ "\n")
Now we receive a more specific abort message:
newFib (-3) ⇒ Program execution error: Error in newFib: Fibonacci function
cannot be applied to negative integer -3
This is a good deal more useful since it reveals some information about the program situation in
which the error occurred.
However, this solution is not always ideal. The problem is that the error immediately aborts the user
program. We escape the program and pass control to the run-time system to handle the errors.
The user program may want to handle the exceptional inputs itself, and thus have a chance to recover.
Two possibilities are discussed in the next sections.
18.5.4 Dummy Values
Sometimes the exceptional inputs can be covered by
defining natural dummy results.
Consider the Fibonacci sequence
0, 1, 1, 2, 3, 5, 8, …
again. It would appear natural to extend the sequence into the negative indices by repeating 0:
…, 0, 0, 0, 0, 1, 1, 2, 3, 5, 8, …
i.e. we define
fib n = 0
for negative n. So, 0 would be the dummy result for negative inputs.
This would give us the following implementation:
extFib :: Int -> Int
-- extends 0's leftwards
extFib 0 = 0
extFib 1 = 1
extFib n
  | n > 1     = extFib(n-1) + extFib(n-2)
  | otherwise = 0
Now extFib is completely defined for all its inputs. No hard program error will ever occur from
applying extFib.
Whether or not this extension of fib is a good one depends on the circumstances. Although we do not
get a hard program error, the program might still go astray. The difference is just that now we may not
notice this immediately.
If the application depended on the characteristic recursive property of the Fibonacci sequence, i.e. that
the equation
fib(n) = fib(n-1) + fib(n-2), n ∈ Z    (1)
held across all inputs, extFib would not be the right extension; let us consider extFib 1:
1 = extFib 1 ≠ extFib 0 + extFib (-1) = 0+0 = 0.
The "right" extension, which does satisfy (1), would be
…, -8, 5, -3, 2, -1, 1, 0, 1, 1, 2, 3, 5, 8, …
which is coded as follows:
symFib :: Int -> Int
-- satisfies recursion law (1)
symFib 0 = 0
symFib 1 = 1
symFib n
  | n > 1     = symFib(n-1) + symFib(n-2)
  | otherwise = symFib(n+2) - symFib(n+1)
18.5.5 Exception Handling
To trap and process errors the user program may employ an explicit exception handling technique
based on error types defined using algebraic types (paragraph 14.2). In paragraph 14.3 the
polymorphic enumerated type Maybe a has been defined as being
data Maybe a = Nothing | Just a
deriving (Eq, Ord, Read, Show)
The type Maybe a is simply the type a extended by an error value Nothing, that is used when an
error is detected. The result type of such a function is then not the original intended type a (a for
my_nth or String for pget, see 14.3) but Maybe a instead.
Any function g that uses the result of a function like my_nth must be transformed so it accepts an
argument of type Maybe a rather than a. This is where the error handling occurs. We can
 transmit the error through g, pass it on to the next function up
 trap the error within g.
If we wish to transmit the error value we can use the function mapMaybe. It lifts function
g :: a -> b
to a function
mapMaybe g :: Maybe a -> Maybe b,
so that it operates on the type Maybe a:
Transmitting (mapping) an error: mapMaybe g :: Maybe a -> Maybe b
mapMaybe :: (a->b) -> Maybe a -> Maybe b
mapMaybe g Nothing  = Nothing
mapMaybe g (Just x) = Just (g x)
mapMaybe (*3) (my_nth 5 [1,2,3]) ⇒ Nothing
mapMaybe (*3) (my_nth 2 [1,2,3]) ⇒ Just 9
The function (*3)::Int->Int has been lifted to
mapMaybe (*3)::Maybe Int -> Maybe Int.
If, however, we lift the function g :: a -> b to a function of type Maybe a -> b then we are
trapping the error. We are providing a dummy output value dummy of type b for the error input
Nothing.
Trapping an error: trapMaybe dummy g :: Maybe a -> b
trapMaybe :: b -> (a->b) -> Maybe a -> b
trapMaybe dummy g Nothing  = dummy
trapMaybe dummy g (Just x) = g x
Typically, we combine both mapping and trapping. With mapMaybe we pass up the error, from the
place where it occurred, to some outer-level function, where it is trapped using trapMaybe.
If dummyInt of type Int has the value 999999999 then:
trapMaybe dummyInt (1+) (mapMaybe (*3) (my_nth 5 [1,2,3]))
⇒ my_nth returns error (Nothing):
trapMaybe dummyInt (1+) (mapMaybe (*3) Nothing)
⇒ error passed up by mapMaybe (*3):
trapMaybe dummyInt (1+) Nothing
⇒ trapped by trapMaybe, resulting in dummyInt
When no error occurs then it follows:
trapMaybe dummyInt (1+) (mapMaybe (*3) (my_nth 2 [1,2,3]))
⇒ my_nth returns proper result (Just 3):
trapMaybe dummyInt (1+) (mapMaybe (*3) (Just 3))
⇒ multiplication under Just:
trapMaybe dummyInt (1+) (Just 9)
⇒ exit from error handling:
(1+) 9 ⇒ 10
The advantage of this approach is that we have full control over error handling. We may enter a
controlled failure mode or take recovery measures if possible.
18.6 More Higher-Order on Lists
Apart from iteration and primitive recursion, generally useful and reusable computational patterns on
integers are difficult to identify. The data type of numbers is simply too rich. Each problem requires
its own new recursion pattern.
On lists, however, a host of polymorphic functions exist that can be fruitfully reused in many
applications. We introduce a few more of them next.
18.6.1 Functions zip, unzip, zipWith
The built-in functions zip and unzip convert between pairs of lists and lists of pairs:
zip :: ([a],[b]) -> [(a,b)]
-- zip together two lists
unzip :: [(a,b)] -> ([a],[b])
-- unzip a list of pairs
zip ([85,3,0], ["VW","Rover","Lada"]) ⇒ [(85,"VW"),(3,"Rover"),(0,"Lada")]
zip ([1,2,3], ['d']) ⇒ [(1,'d')]
unzip [("Mark",39),("David",24),("Rob",54)]
⇒ (["Mark","David","Rob"],[39,24,54])
 our zip function defined above is a slightly modified version of the one you may find in the Prelude!!
 zip drops overhanging elements
 zip and unzip are "inverse":
zip (unzip lp) = lp
unzip (zip pl) = pl
provided both lists in pl are of equal length.
The recursive definitions of zip and unzip are as follows:
zip :: ([a],[b]) -> [(a,b)]
zip ([], _)      = []
zip (_, [])      = []
zip (x:xs, y:ys) = (x, y) : zip (xs, ys)
unzip :: [(a,b)] -> ([a],[b])
unzip []          = ([], [])
unzip ((x, y):ps) = (x : fst (unzip ps), y : snd (unzip ps))
Question 1. Can you do zip with only 2 patterns?
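The recursive definitions can be tested directly; since the Prelude already exports (curried) zip and unzip, the uncurried versions of these notes must hide them first:

```haskell
import Prelude hiding (zip, unzip)

-- uncurried zip, as defined in these notes
zip :: ([a], [b]) -> [(a, b)]
zip ([], _)      = []
zip (_, [])      = []
zip (x:xs, y:ys) = (x, y) : zip (xs, ys)

-- uncurried unzip, as defined in these notes
unzip :: [(a, b)] -> ([a], [b])
unzip []          = ([], [])
unzip ((x, y):ps) = (x : fst (unzip ps), y : snd (unzip ps))
```

For example, zip ([85,3,0], ["VW","Rover","Lada"]) ⇒ [(85,"VW"),(3,"Rover"),(0,"Lada")] and unzip [(1,'a'),(2,'b')] ⇒ ([1,2],"ab").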
zipWith is a generalisation of zip that zips together the elements of two lists using an arbitrary
function:
zipWith :: ((a, b) -> c) -> ([a],[b]) -> [c]
zipWith f ([], _)      = []
zipWith f (_, [])      = []
zipWith f (x:xs, y:ys) = f (x,y) : zipWith f (xs,ys)
Note. zipWith defined above is not exactly the version you may find in Prelude.
Exercise 6. Show how to define zipWith from zip and map!
18.6.2 Functions takeWhile and dropWhile
Recall the list selection functions !! and filter
(!!) :: [a]-> Int -> a
-- select indexed element; first element has index 0
filter :: (a -> Bool)-> [a] -> [a]
-- selects sub-list of elements satisfying given predicate
[5,8,3,7] !! 2 ⇒ 3
filter isEven [5,8,3,7,4] ⇒ [8,4]
where isEven = \n -> n `mod` 2 == 0
The Haskell built-ins
takeWhile :: (a -> Bool) ->[a] -> [a]
dropWhile :: (a -> Bool) ->[a] -> [a]
provide two further variants of list selection; takeWhile pred list starts at the beginning of
list and takes elements from it while the selection predicate pred is true. For instance,
takeWhile isEven [2,4,6,7,2,2] ⇒ [2,4,6]
takeWhile isEven [1,4,5] ⇒ []
Its recursive definition is
takeWhile :: (a -> Bool) -> [a] -> [a]
-- take elements while predicate is true
takeWhile p [] = []
takeWhile p (x:xs)
  | p x       = x : takeWhile p xs
  | otherwise = []
dropWhile is similar, except that it drops rather than picks elements:
dropWhile isEven [2,4,6,7,2,2] ⇒ [7,2,2]
dropWhile isEven [1,4,5] ⇒ [1,4,5]
dropWhile isEven [2,8,6] ⇒ []
Here is its recursive definition:
dropWhile :: (a -> Bool) -> [a] -> [a]
-- drop elements while predicate is true
dropWhile p [] = []
dropWhile p xs@(x:xs')
  | p x       = dropWhile p xs'
  | otherwise = xs
where @ is read as 'as' and identifies xs and x:xs' as being the same.
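A useful law connecting the two functions (easily checked from the definitions) is that taking and then dropping reconstructs the original list; the helper name splitLaw is ours:

```haskell
-- takeWhile p xs ++ dropWhile p xs == xs, for every p and xs
splitLaw :: Eq a => (a -> Bool) -> [a] -> Bool
splitLaw p xs = takeWhile p xs ++ dropWhile p xs == xs

isEven :: Int -> Bool
isEven n = n `mod` 2 == 0
```

For instance, splitLaw isEven [2,4,6,7,2,2] ⇒ True (the list splits into [2,4,6] and [7,2,2]).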
19 Algebraic Data Types
19.1 What is an algebraic type?
19.2 Algebraic Types, More Systematically
19.2.1 Enumeration
19.2.2 Product
19.2.3 Nested
19.2.4 Recursive
19.2.5 Polymorphic
19.3 General syntax
We have seen many built-in data types:
primitive data types:
Int, Float, Bool, Char, ….
composite data types
(Int, String), [Int], String …
In typed functional programming languages a large class of other complex user-defined data types
can be constructed. These are called algebraic data types. Remember that in chapter 14 two
algebraic data types were used, an enumerated type and the polymorphic enumerated type Maybe a,
and in chapter 17 RegExp, Automaton and others were introduced. Tuples, lists and strings are
other examples of algebraic data types. Algebraic types are introduced by the keyword data, followed
by the name of the type, =, and then the constructor(s). The type name and the constructor(s) must
start with an upper case letter.
19.1 What is an algebraic type?
We define the structure of our data type by specifying how its elements are constructed in terms of a
finite number of rules.
19.1.1 Construction
Consider the example
data Pres = Result String | Fail
Elements of this type:
Fail :: Pres
Result "Green" :: Pres
Result "m.gheorghe" :: Pres
Definition (roughly): A type is algebraic if every element can be constructed and deconstructed
uniquely using a finite number of predefined constructors.
The type definition
data Pres = Result String | Fail
introduces the following constructors
Result :: String -> Pres
Fail :: Pres
Result is like a function but with no equation definition.
In the next example:
map Result ["a","b"] ⇒ [Result "a", Result "b"]
Result is used as a function which is mapped over the list.
The type Pres is defined by the following rules:
(1) if expr has type String then Result expr has type Pres
(2) Fail has type Pres
(3) all elements of type Pres are obtained by (1) and (2)
Since every element of Pres is built up from the constructors
Result :: String -> Pres and Fail :: Pres
it can also be deconstructed again. This makes it possible to define functions
f :: Pres -> X
for arbitrary type X by structural analysis of f's function arguments. This is called the ...
19.1.2 Structural Decomposition Principle
To define f x, where x :: Pres, deconstruct x into its components, and define f x from these
(simpler) components.
Rule of Thumb: One equation definition for each constructor.
print' :: Pres -> String
print' (Result x) = "Result " ++ x
-- equation 1 (pattern Result x)
print' Fail = "Fail"
-- equation 2 (pattern Fail)
For every element of type Pres exactly one deconstruction pattern matches.
Consequently equations 1 + 2 define a unique result value print’ res for all elements res
of Pres.
19.1.3 Patterns
Patterns are used for deconstructing elements of an algebraic type. They are just like elements of this
type, i.e. the same typing rules apply, but constructed from
 basic values: these are all constants of types String, Bool, Char, Int, Float
 variables: identifiers starting with lower case letters
 wildcard _: this is an anonymous variable for a sub-expression
 as-patterns: they occur in the form var@pattern
Here are some example patterns for type Pres:
Result "m.gheorghe"
Result x
Result _
v@(Result x)
How is the as-pattern v@(Result x) matched against a value val?
(1) match Result x against val
(2) if successful, i.e. if val is of the form Result s for some s, bind variable x to s and v to the
whole value val.
Let us consider the definition below for the function onlyResult which shows only patterns of the
form Result s.
onlyResult :: Pres -> String
onlyResult v@(Result x) = show v
onlyResult Fail = error "not Result pattern introduced"
When we use
onlyResult (Result "string")
then Result x is matched against Result "string" and, the match being successful, x is bound to
"string" and v to the whole value Result "string", which is shown. Please note that in
onlyResult we may use Result _ instead of Result x.
Here are some more examples…
Result x (constructor pattern with a variable) matches
Result "21843"
Result "m.gheorghe"
with bindings x = "21843" and x = "m.gheorghe" respectively;
it does not match Fail.
x (variable) matches
Result "21843"
Result "m.gheorghe"
Fail
with bindings x = Result "21843", x = Result "m.gheorghe" and x = Fail respectively.
_ (wildcard) matches anything, with NO bindings.
Result "m.gheorghe" (basic value) matches only
Result "m.gheorghe" and nothing else.
v@(Result x) (as-pattern) matches
Result "21843"
Result "m.gheorghe"
with bindings x = "21843", v = Result "21843" and x = "m.gheorghe",
v = Result "m.gheorghe" respectively;
it does not match Fail.
How do we evaluate a function application f e where function f is defined by the equations
f pattern_1 = body_1
...
f pattern_n = body_n ?
(1) Evaluate e as far as necessary for the following:
(2) find the first pattern_i that matches the value of e. This generates instantiations (bindings) for
the variables occurring in pattern_i.
(3) evaluate the definition body body_i with the bindings produced by the match.
Note. Patterns may be
1. overlapping - they are evaluated in order. The first pattern that matches is taken:
isOK :: Pres -> Bool
isOK (Result _) = True
-- matched first
isOK _ = False
-- only when pattern ‘Result _’ fails
2. non-exhaustive - if no pattern matches, then we get a run time error:
prop :: Pres -> String
prop (Result x) = x
-- no pattern for constructor `Fail'
Then we get:
prop Fail
-- does not match
⇒ Program execution error: {prop Pres_Fail}
Caveat: don't forget the brackets in prop (Result x)!
An algebraic type defines a collection of data that are formed according to the same set of structural
rules. They are
 generated by a fixed and finite set of constructors
 deconstructed by pattern matching
 permit function definition by structural decomposition.
19.2 Algebraic Types, More Systematically
Let us look at a number of examples of algebraic data types. We will become familiar with
 alternative
 compound
 nested
 recursive
 polymorphic
structure, and from this work towards a general construction scheme.
19.2.1 Alternatives: Enumeration Types
In an enumeration type all constructors are constants, i.e. don't depend on parameters...
Type definition
data Temp = Cold | Hot
data Season = Spring | Summer | Autumn | Winter
Type constructors:
Cold, Hot :: Temp
Spring, Summer, Autumn, Winter :: Season
We can define
weather :: Season -> Temp
weather Summer = Hot
weather _      = Cold
-- ordering important!
isEqual :: Temp -> Temp -> Bool
isEqual Cold Cold = True
isEqual Hot Hot   = True
isEqual _ _       = False
-- last pattern subsumes all remaining
-- cases; again, ordering important!
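Instead of writing isEqual by hand we could also let Haskell derive equality; a sketch (the deriving clauses are our addition to the definitions above):

```haskell
-- deriving Eq generates (==) with exactly the
-- behaviour of the hand-written isEqual
data Temp   = Cold | Hot
              deriving (Eq, Show)
data Season = Spring | Summer | Autumn | Winter
              deriving (Eq, Show)

weather :: Season -> Temp
weather Summer = Hot
weather _      = Cold
```

Now Cold == Cold ⇒ True and weather Summer == Hot ⇒ True.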
19.2.2 Compound: Product types
Product types are algebraic data types with one constructor that has many parameters.
Type definition
data People = Person String Int Int
An element of this type is
aPerson :: People
aPerson = Person "M Gheorghe" 111 21843
Person is the constructor of this type:
An alternative way of defining the People type uses type synonyms for the components:
type Name = String
type Office = Int
type TelNo = Int
data People = Person Name Office TelNo
Like for the Pres type, the constructors introduced by an algebraic type definition can be used as
functions; consequently Person st o t is the result of applying the function Person to the
arguments st, o and t:
Person :: Name -> Office -> TelNo -> People
An alternative definition of type People is given by the type synonym
type People = (Name, Office, TelNo)
There are some advantages of using algebraic data types:
 Each object of the type carries an explicit label, in the above cases Person
 It is not possible to accidentally treat an arbitrary string and two integers as a person; a person
must be constructed using the constructor Person
 The type will appear in any error message due to mistyping
There are also advantages of using a tuple type, with a synonym declaration:
 The definition is more compact and so definitions will be shorter and easier to manipulate
 Using a tuple, especially a pair, allows us to reuse many polymorphic functions such as fst,
snd and unzip over tuple types; this will not be the case with the algebraic types
The examples of types given here are special cases of what we look at next…
19.2.3 Nested Algebraic Data Types
Type constructions can be nested within each other (remember also Automaton and Transition in
chapter 17):
Type definition
data Employees = Employee Name Gender Dates
type Name = String
data Dates = Date Day Month Year
type Day = Int
data Month = Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec
type Year = Int
data Gender = Male | Female
Elements of type Employees look like:
anEmployee :: Employees
anEmployee = Employee "Simon" Male (Date 1975 Jun 14)
We can access their components through nested patterns and sub-patterns:
inJune :: [Employees] -> [Dates]
-- returns all male birthday dates in June
inJune [] = []
inJune (Employee _ Male d@(Date _ Jun _) : es) = d : inJune es
inJune (_:es) = inJune es
Caveat. If you want to show the results obtained you must add deriving Show to both Dates and
Month:
inJune [anEmployee] ⇒ [Date 1975 Jun 14]
How is the as-pattern
d@(Date _ Jun _)
matched against a value someDate :: Dates?
Match someDate against Date _ Jun _;
if the second component has the value Jun, bind the variable d to the whole of someDate.
For example, matching
Date 1975 Jun 14
against
d@(Date _ Jun _)
binds d to Date 1975 Jun 14.
Now observe the nesting of the patterns in our example. When inJune is applied to the list
[Employee "Simon" Male (Date 1975 Jun 14)]
the second equation is chosen:
inJune (Employee _ Male d@(Date _ Jun _) : es) = d : inJune es
Employee "Simon" Male (Date 1975 Jun 14)
matches against
Employee _ Male d@(Date _ Jun _)
and Date 1975 Jun 14 is sub-pattern matched against
d@(Date _ Jun _)
We have seen so far three construction mechanisms: alternatives, compounds and nesting.
What about loops, where a Type is defined in terms of Type itself? In this case we would have
something like
data Type = C_cons Type_1 Type_2
data Type_1 = A_cons Type | B_cons Type
19.2.4 Recursive Types
-- recursive type of simple expressions
data Exp = Lit Int| Add Exp Exp| Sub Exp Exp
Construction rules
(1) if n has type Int then Lit n has type Exp
(2) if e_1 and e_2 have type Exp then Add e_1 e_2 has type Exp
(3) if e_1 and e_2 have type Exp then Sub e_1 e_2 has type Exp
(4) all elements of Exp are obtained by (1)-(3)
Examples of expressions:
2        Lit 2
2+3      Add (Lit 2) (Lit 3)
(3-1)+4  Add (Sub (Lit 3) (Lit 1)) (Lit 4)
We define functions on type Exp by recursive pattern matching. Consider the example:
eval :: Exp -> Int -- evaluate expressions
eval (Lit n)     = n
eval (Add e1 e2) = (eval e1) + (eval e2)
eval (Sub e1 e2) = (eval e1) - (eval e2)
Each time eval calls itself the expression has been deconstructed further. Given that e is finite,
eval must eventually bottom out, when it has completely decomposed its argument. Could e, in
eval e, be an infinite expression?
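The expression type and its evaluator form a complete little program; with a Show instance added for printing (our addition), a sketch is:

```haskell
data Exp = Lit Int | Add Exp Exp | Sub Exp Exp
           deriving Show

eval :: Exp -> Int
eval (Lit n)     = n
eval (Add e1 e2) = eval e1 + eval e2
eval (Sub e1 e2) = eval e1 - eval e2

-- the expression (3-1)+4 from above
example :: Exp
example = Add (Sub (Lit 3) (Lit 1)) (Lit 4)
```

eval example ⇒ 6.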
19.2.5 Polymorphic types
The standard example of a recursive polymorphic type is the type of lists. In Chapter 15 another
important (recursive) polymorphic type was introduced, the binary search tree.
Type definition
data Tree a = Empty | Leaf a | Node a (Tree a) (Tree a)
Tree a is a family of types, parameterised by type variable a. Specific instances are Tree Int,
Tree String, or Tree (Tree Int).
Here is an example of a Tree String (look at the order of elements in the binary tree):
strTree :: Tree String
strTree = Node "leaf"
            (Node "butterfly"
              (Leaf "apple")
              (Node "face" (Leaf "clown") Empty))
            (Node "pumpkin"
              (Node "mouth" Empty (Leaf "party"))
              (Leaf "sponge"))
And a tree of integers:
intTree :: Tree Int
intTree = Node 5
            (Node 2 (Leaf 1)
                    (Node 4 (Leaf 3) Empty))
            (Node 8 (Node 6 Empty (Leaf 7))
                    (Leaf 9))
Many useful polymorphic functions can be defined for Tree a uniformly for all a, just by recursion
on the tree structure. An example introduced in Chapter 15 is
traverse :: Tree a -> [a]
-- traverse intTree = [1,2,3,4,5,6,7,8,9]
traverse Empty = []
traverse (Leaf x) = [x]
traverse (Node x left right) =traverse left ++ [x] ++ traverse right
Another polymorphic function on binary searching trees is
removeLast :: Tree a -> (a, Tree a)
-- split off last element from a nonempty tree
removeLast (Leaf x) = (x, Empty)
removeLast (Node y t_1 Empty) = (y, t_1)
removeLast (Node y t_1 t_2) = (x, Node y t_1 t_3)
  where (x, t_3) = removeLast t_2
We can join binary searching trees, keeping balance and order:
joinTree :: Tree a -> Tree a -> Tree a
joinTree Empty t = t
joinTree (Leaf x) t = Node x Empty t
joinTree t_1 t_2 = Node y t_3 t_2
  where (y, t_3) = removeLast t_1
The idea is that the last node of t_1, if any, is removed and becomes the root of the tree having as
left sub-tree the second component of the removeLast result and as right sub-tree the second tree, t_2.
Could you explain why traverse (joinTree t_1 t_2) = traverse t_1 ++ traverse t_2?
Other polymorphic functions impose constraints on the type variable a. From Chapter 15 we have:
tree_member :: Ord a => a->Tree a -> Bool
tree_insert :: Ord a => a->Tree a -> Tree a
For searching, the following polymorphic function is useful:
listToTree :: Ord a => [a] -> Tree a
-- turns a list into an ordered search tree
listToTree [] = Empty
listToTree (x:xs) = tree_insert x (listToTree xs)
treeSort :: Ord a => [a] -> [a]
-- sorts a list via an ordered tree
treeSort xs = traverse (listToTree xs)
treeSort [2,91,7,35,28] ⇒ [2,7,28,35,91]
treeSort ['a','r','k',' ','9','i'] ⇒ [' ','9','a','i','k','r']
treeSort [[4,1],[3,9,5],[3],[9,1,0]] ⇒ [[3],[3,9,5],[4,1],[9,1,0]]
This illustrates the power of polymorphic data types and polymorphic programming.
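treeSort can be assembled into a self-contained sketch. tree_insert is assumed from Chapter 15; the version below is a plausible reconstruction (note that, as written, it silently drops duplicate elements). traverse is hidden from the Prelude to avoid a name clash:

```haskell
import Prelude hiding (traverse)

data Tree a = Empty | Leaf a | Node a (Tree a) (Tree a)

-- assumed reconstruction of tree_insert from Chapter 15
tree_insert :: Ord a => a -> Tree a -> Tree a
tree_insert x Empty = Leaf x
tree_insert x t@(Leaf y)
  | x < y     = Node y (Leaf x) Empty
  | x > y     = Node y Empty (Leaf x)
  | otherwise = t                      -- duplicate: dropped
tree_insert x t@(Node y l r)
  | x < y     = Node y (tree_insert x l) r
  | x > y     = Node y l (tree_insert x r)
  | otherwise = t                      -- duplicate: dropped

traverse :: Tree a -> [a]
traverse Empty        = []
traverse (Leaf x)     = [x]
traverse (Node x l r) = traverse l ++ [x] ++ traverse r

listToTree :: Ord a => [a] -> Tree a
listToTree []     = Empty
listToTree (x:xs) = tree_insert x (listToTree xs)

treeSort :: Ord a => [a] -> [a]
treeSort xs = traverse (listToTree xs)
```

treeSort [2,91,7,35,28] ⇒ [2,7,28,35,91].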
19.3 General Syntax for Algebraic Data Types
The general definition of an algebraic type has the form:
data TypeName a_1 a_2 … a_n =
ConstructName_1 T_(1,1) T_(1,2)…T_(1,k1)
|ConstructName_2 T_(2,1) T_(2,2)…T_(2,k2)
|ConstructName_m T_(m,1) T_(m,2)…T_(m,km)
where TypeName is the name of the new polymorphic algebraic type defined with n (n0) type
parameters a_1, a_2, … a_n. The elements of this type are built from m (m1) constructors
named ConstructName_1, ConstructName_2, … ConstructName_m.
ConstructName_i takes ki (ki0)arguments of types T_(i,1), T_(i,2), …T_(i,ki).
The type expressions T_(i,j) may contain arbitrary predefined types, as well as the type variables
a_1, a_2, … a_n, and the type TypeName itself.
• All type variables occurring in any of the types T_(i,j) must be listed among a_1, a_2, … a_n
• All constructor names ConstructName_i must be different
• Constructor names must start in upper case
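As a concrete instance of this schema (a standard example, not specific to this chapter): TypeName = Tree, one type parameter a, and two constructors, with the type occurring recursively in its own definition:

```haskell
data Tree a = Empty                      -- constructor with 0 arguments
            | Node a (Tree a) (Tree a)   -- 3 arguments; Tree a occurs recursively

-- a function defined by the usual recursion over the two constructors
size :: Tree a -> Int
size Empty        = 0
size (Node _ l r) = 1 + size l + size r
```
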
20 Lazy Programming
20.1 Lazy Evaluation
20.2 Constructing Infinite Lists
20.3 List Comprehensions
20.4 List Comprehensions. Examples
20.5 Application: Regular Expressions
In this part we will say something about the evaluation strategy used in Haskell. Haskell is a
lazy functional programming language: arguments of a function are only evaluated when
they are needed to calculate the result of the function. This is in contrast to eager functional
languages, which evaluate every argument of a function before the function is applied. An
example of an eager language is ML, which was originally developed at the University of
Edinburgh.
The laziness of Haskell affects the programming style. It permits extensive use of infinite
data structures.
20.1 Lazy Evaluation
In functional languages an expression e is evaluated by successively rewriting it using equations until
the result (i.e. a value) v is obtained:
e ⇒ e_1  -- evaluation step 1
  ⇒ e_2  -- evaluation step 2
  …
  ⇒ v    -- final value
The equations can be
• definition equations of functions, or
• beta-reductions (= uniform substitution)
The difference between lazy and eager evaluation is the strategy according to which equations are applied.
Let us use =l for lazy and =e for eager evaluation.
Consider a switch function generalising the conditional if statement: switch n e_1 e_2 returns
e_1 if n > 0, and e_2 otherwise:
switch :: Int -> a -> a -> a
switch n x y
  | n > 0     = x
  | otherwise = y
Lazy evaluation exploits the fact that the result of switch n e_1 e_2 only depends on one of the
two arguments e_1, e_2:
switch (5-2) (2+7) (7^11)
-- evaluate the condition argument
=l switch 3 (2+7) (7^11)
-- 3 > 0, so the 1st input argument is chosen
=l 2+7 =l 9
In switch n e_1 e_2 we must always evaluate n, but only one of the arguments e_1, e_2 needs to be
evaluated. Eager evaluation always evaluates the arguments regardless of whether they are needed.
switch (5-2) (2+7) (7^11)
-- evaluate all arguments
=e switch 3 9 1977326743
-- evaluate switch function
=e 9
Thus =e does more work than necessary!
Similarly, in lazy evaluation we do not always fully evaluate all parts of the data structure of an
argument. We only evaluate those parts that are needed.
When we compute the head of a long list we do not need to evaluate all its elements:
head [1^1,2^2,3^3,4^4,5^5]
-- identify the 1st element of the list
=l head (1^1:[2^2,3^3,4^4,5^5])
-- extract it
=l 1^1
-- evaluate
=l 1
In eager languages we compute the argument fully before applying the function:
head [1^1,2^2,3^3,4^4,5^5]
-- fully evaluate the argument
=e head (1:4:9:16:25:[])
-- apply head
=e 1
A drastic example of this difference arises when you have (for whatever reason) a non-terminating sub-computation. Consider
loop :: Int -> Int
loop n = loop (n+1)
Evaluating loop n, eagerly or lazily, sets off an infinite computation:
loop 0 =l loop (0+1) =l loop (0+1+1) =l loop (0+1+1+1) =l …
loop 0 =e loop 1     =e loop 2       =e loop 3         =e …
However, in lazy systems loop may exist as a sub-expression without forcing the overall computation
to diverge:
head [3-1, loop 0]
=l head ((3-1):[loop 0])
=l 3-1
=l 2
-- terminates

head [3-1, loop 0]
=e head (2:(loop 1):[])
=e head (2:(loop 2):[])
=e head (2:(loop 3):[]) …
-- loops
Now, we do not usually have non-terminating computations, but we may have infinite data
structures. Every recursive algebraic data type in Haskell admits infinite objects. Here we will study
the use of infinite lists.
Example. In Haskell the infinite list of square numbers
[n^2, (n+1)^2, (n+2)^2, (n+3)^2, …]
starting at a given index n, can be defined thus:
squares:: Int -> [Int]
squares n = n^2: squares (n+1)
When we evaluate squares we get a continuous print out of square numbers:
squares 1 ⇒ [1,4,9,16,25,36,…
until at some point we reach the maximal representable integer number
… 2147210244,2147302921,2147395600,-2147479015,
where things get a bit out of hand (for 46340^2, 2147395600 is returned, but for 46341^2,
-2147479015 is returned (!!)).
Disregarding the finiteness of our number representation, we may think of squares 1 as a process
that generates the list of all square numbers, as many as we like.
Most of the time we need only a finite number of them, anyway.
The following Haskell function is useful. It allows us to extract the first n elements of a (possibly
infinite) list:
myTake :: Int -> [a] -> [a]
-- take the first n elements
myTake 0 _ = []
myTake n (x:xs) = x:myTake (n-1) xs
myTake _ [] = []
Note 1. If n < 0 then myTake n xs returns xs.
Note 2. In Prelude you may find a slightly different version of this function.
We can use myTake to extract the first n square numbers from squares.
myTake 5 (squares 1)
=l [1,4,9,16,25]
myTake 9 (squares 6)
=l [36,49,64,81,100,121,144,169,196]
Here is how Haskell evaluates myTake 2 (squares 1):
myTake 2 (squares 1)
-- evaluate the 2nd argument just far enough to decide whether n is 0
-- and to match the pattern x:xs
⇒ myTake 2 (1^2 : squares (1+1))
-- expand the function definition
⇒ 1^2 : myTake (2-1) (squares (1+1))
⇒ 1 : myTake 1 ((1+1)^2 : squares (1+1+1))
⇒ 1 : (1+1)^2 : myTake (1-1) (squares (1+1+1))
⇒ 1 : 4 : myTake 0 (squares (1+1+1))
-- no need to expand the 2nd argument any further
⇒ [1,4]
20.2 Constructing Infinite Lists
Let us look at a few built-in functions to construct infinite lists. Haskell has convenient syntactic
abbreviations for arithmetic series:
[1..] ⇒ [1,2,3,4,5,6,7,8,9,10,…
[1,3..] ⇒ [1,3,5,7,9,11,13,15,17,19,…
[80,60..] ⇒ [80,60,40,20,0,-20,-40,-60,-80,-100,…
We may give upper bounds to make the lists finite:
[1..7] ⇒ [1,2,3,4,5,6,7]
[1,3..16] ⇒ [1,3,5,7,9,11,13,15]
[80,60..1] ⇒ [80,60,40,20]
Note. Both upper and lower bounds may be arbitrary expressions.
Another method of constructing general series is the higher-order built-in Haskell function
iterate :: (a -> a) -> a -> [a]
iterate f x = x : iterate f (f x)
iterate iterates a given function starting from an initial value, and lists all values produced by this
process:
iterate ((+)1) 3 ⇒ [3,4,5,6,7,8,9,…
iterate ((*)2) 1 ⇒ [1,2,4,8,16,32,64,…
iterate (\x->x `div` 10) 56789 ⇒ [56789,5678,567,56,5,0,0,…
iterate (\x->1) 0 ⇒ [0,1,1,1,1,1,1,…
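Because iterate produces an infinite list, it is typically paired with a consumer that keeps only a finite prefix. As a small illustration (the name suffixes is ours, not from the notes), the div-by-10 example can be cut off as soon as it reaches 0:

```haskell
-- the decimal "suffixes" of a positive number, via iterate and takeWhile
suffixes :: Int -> [Int]
suffixes n = takeWhile (> 0) (iterate (`div` 10) n)
```

For instance, suffixes 56789 yields [56789,5678,567,56,5]: takeWhile consumes the infinite list produced by iterate only as far as it needs to.
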
To construct new lists from existing ones the functions
map    :: (a -> b) -> [a] -> [b]
filter :: (a -> Bool) -> [a] -> [a]
may be used. They may be applied to infinite lists too.
Suppose we wish to construct the graph of a (total) function f, i.e. the list of all pairs (x, f x).
This is also called a function table for f. When f is defined over positive integers we can enumerate its
domain with [0..]:
mkGraph :: (Int -> a) -> [(Int,a)]
-- constructs function table
mkGraph f = map (\n -> (n, f n)) [0..]
mkGraph id ⇒ [(0,0),(1,1),(2,2),(3,3),…
where id x = x, is the identity function.
All the even positive numbers may be obtained from the positive integers by using the filter function:
filter (\n -> n `mod` 2 == 0) [0..]
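Both map and filter keep the result lazy, so we can name the filtered infinite list and then observe any finite prefix of it. A minimal sketch (the names evens and firstEvens are ours):

```haskell
-- an infinite, lazily produced list of the even naturals
evens :: [Int]
evens = filter (\n -> n `mod` 2 == 0) [0..]

-- observe only a finite prefix of the infinite list
firstEvens :: Int -> [Int]
firstEvens k = take k evens
```
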
In chapter 17 we came across a very neat way of constructing lists through list comprehensions.
20.3 List Comprehensions
Using list comprehension the list of all Pythagorean triples can be defined in Haskell as follows:
pythagTriples :: [(Int,Int,Int)]
pythagTriples =
[(x,y,z) | z<-[2..], y<-[2..z-1], x<-[2..y-1], x*x+y*y==z*z]
Note 1. The triples (x,y,z) in the above relation also satisfy x < y and y < z, which ensures the
uniqueness of these triples in the infinite list.
Note 2. The notation for list comprehension is quite analogous to the standard mathematical notation
for defining (finite as well as infinite) sets:
pythagTriples = {(x,y,z) | z ∈ {2,3,…}, y ∈ {2,3,…,z-1}, x ∈ {2,3,…,y-1}, z² = x² + y²}
Following this analogy we may read the expression:
pythagTriples =
[(x,y,z) | z<-[2..], y<-[2..z-1], x<-[2..y-1], x*x+y*y==z*z]
as follows: "The list of all triples (x,y,z), where z is taken from [2..], y from [2..z-1] and x from [2..y-1], such that x*x+y*y==z*z is true".
Please note that this (infinite) list is based on an infinite list [2..], and two finite ones [2..z-1],
and [2..y-1].
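Since pythagTriples is infinite, it can only ever be observed through a finite prefix, for example with take:

```haskell
pythagTriples :: [(Int, Int, Int)]
pythagTriples =
  [(x,y,z) | z <- [2..], y <- [2..z-1], x <- [2..y-1], x*x + y*y == z*z]

-- take 3 pythagTriples yields [(3,4,5),(6,8,10),(5,12,13)]
```

Laziness ensures that only as many values of z, y and x are generated as are needed to produce the requested prefix.
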
General syntax:
The typical form of a comprehension expression is
[t | x_1<-e_1, x_2<-e_2, …,x_n<-e_n, cond_1,…cond_k]
which denotes the list of all elements of the form t, where x_i is taken from e_i, for all
i=1..n such that cond_j are true for all j=1..k.
pythagTriples may be rewritten as
[(x,y,z) | z<-[2..], y<-[2..z-1],
x<-[2..z-1], x<y, x*x+y*y==z*z]
• Each x_i is a variable and each e_i is a list expression
• t is an arbitrary expression that may contain the x_i as variables
• Each cond_j is called a guard – it is a Boolean predicate – which may contain the variables x_i
20.4 List Comprehensions. Examples
List comprehension has been defined in the context of infinite lists. The concept applies equally to
finite lists too.
[(x,y) | x<-[1,2,3], y<-[1,2]] ⇒ [(1,1),(1,2),(2,1),(2,2),(3,1),(3,2)]
It follows from this example that the variables x and y behave like nested loop variables in an
imperative programming language.
20.4.1 The Graph of a Function
Using comprehension notation the graph of a function is defined in a much clearer way:
mkGraph' :: (Int -> a) -> [(Int,a)]
-- constructs function table
-- this time via comprehension
mkGraph' f = [(x, f x) | x <- [0..]]
Compare with
mkGraph :: (Int -> a) -> [(Int,a)]
-- constructs function table
mkGraph f = map(\n->(n, f n))[0..]
mkGraph' id ⇒ [(0,0),(1,1),(2,2),(3,3),…
20.4.2 Zipping streams
Haskell has a built-in function zipWith which zips two lists through a function f that combines
corresponding values of these lists:
zipWith :: (a->b->c) -> [a] -> [b] -> [c]
zipWith f (a:as) (b:bs) = f a b : zipWith f as bs
zipWith _ _      _      = []
For two lists
[a_0,a_1,a_2,…] [b_0,b_1,b_2,…]
and a function f defined for corresponding values of these lists f a_i b_i, zipWith produces
[f a_0 b_0, f a_1 b_1, …]
Comprehension makes it easy to define this function:
myZipWith :: (a->b->c) -> [a]->[b]->[c]
-- zips together two lists using a given function
myZipWith f as bs = [ f (as!!n) (bs!!n) | n<-[0..]]
For instance,
myZipWith (-) [2..] [1..] ⇒ [1,1,1,1,…
myZipWith (+) [2..] [1..] ⇒ [3,5,7,9,…
What is the result returned by
myZipWith (*) [1..] [1..]
20.4.3 Restructuring Streams
Suppose we have a list
[a_0,a_1,a_2,a_3,…]
and we want to restructure it forming internal pairs:
[(a_0,a_1),(a_2,a_3),…]
This is done as follows:
pairUp :: [a] -> [(a,a)]
pairUp as=myZipWith (\x y->(x,y))[as!!n|n<-[0,2..]][as!!n|n<-[1,3..]]
or, equivalently:
pairUp :: [a] -> [(a,a)]
pairUp as = [(as!!n, as!!(n+1)) | n<-[0,2..]]
For instance
pairUp [1..] ⇒ [(1,2),(3,4),(5,6),(7,8),…
Note. pairUp does not work for finite lists!
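If a version for finite lists is wanted, direct pattern matching gives one (this variant is ours, a sketch rather than part of the notes; a trailing unpaired element is silently dropped):

```haskell
-- pair up consecutive elements; works for finite and infinite lists alike
pairUpFin :: [a] -> [(a,a)]
pairUpFin (x:y:rest) = (x,y) : pairUpFin rest
pairUpFin _          = []   -- empty or singleton remainder: stop
```
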
20.5 Application: Regular Expressions
In chapter 17 we presented some mechanisms for recognising words – i.e. sequences of symbols over a
given alphabet - (finite state machines) or for translating them into other things (translators based on
finite state machines or context-free grammars). We shall see now how to specify words over a given
alphabet by using regular expressions. We shall also give a Haskell definition of a regular expression
specification and show how to match a word against a regular expression specification.
20.5.1 Regular Expressions
A regular expression (RE) is a pattern which can be used to describe words of various kinds, such as
 the identifiers of a programming language – words containing alphanumeric characters which
begin with an alphabetic character
 the numbers – integer or real – of a programming language; and so on
A RE contains symbols of a given alphabet and some other meta-symbols.
There are five sorts of REs:
• ε – the Greek character epsilon, which matches the empty word
• x – where x is any symbol; this matches the symbol itself
• (r1|r2) – where r1 and r2 are regular expressions; meaning 'or'
• (r1 r2) – where r1 and r2 are regular expressions; meaning '++', i.e. r1 then r2
• (r)* – where r is a regular expression; meaning repetition, i.e. r taken 0 or many times
Examples of REs include
1. ('a'|('b''a'))
2. (('b''a')|(ε|('a')*)),
if we consider them over a set of symbols containing 'a' and 'b'. The set of words (sequences of
symbols) defined by these REs may be read as the set of:
1. sequences containing one 'a', or one 'b' followed by one 'a'
2. sequences with one 'b' followed by one 'a', or the empty word, or sequences containing zero or
many occurrences of 'a'.
REs may be also used to specify a set of words that are recognised by a FSM. The example below
presents a FSM introduced in chapter 17:
The language accepted by this FSM may be obtained by following all the paths starting from the initial
state, 0, and ending in a final state, either 3 or 7. Mathematically this can be defined as the set
{'a''b'('c''b')^n'a'('b')^m | n, m ≥ 0} ∪ {'b''a''c'} ∪ {'b''c''a'}
This language may be specified by using the following equivalent notation, which is a RE:
(('a''b'('c''b')*'a'('b')*)|('b'(('a''c')|('c''a'))))
20.5.2 Haskell Definition of a RE
A Haskell type representing REs over the set of all characters is given by
data RegExp = Epsilon
            | Literal Char
            | Or RegExp RegExp
            | Then RegExp RegExp
            | Star RegExp
where Epsilon stands for ε; Literal Char stands for any single character (i.e. 'a', '8'); Or
RegExp RegExp represents (r1|r2), for any REs r1 and r2; Then RegExp RegExp means
(r1 r2); whereas Star RegExp means (r)*.
Please note that RegExp is a recursive algebraic type.
Let us also consider the following declarations
a :: RegExp
b :: RegExp
c :: RegExp
a = Literal 'a'
b = Literal 'b'
c = Literal 'c'
Then the RE re1 denoting ('a'|('b''c')) may be represented using the above definition thus
re1 = Or a (Then b c)
The RE that specifies the language accepted by the FSM defined in chapter 17, namely the one given
above, may be written as
re2 = Or (Then a (Then b (Then
          (Star (Then c b))
          (Then a (Star b)) ) ) )
         (Then b
          (Or (Then a c) (Then c a)) )
Functions over the type of REs are defined by recursion over the structure of the expression. For example:
literals :: RegExp -> [Char]
literals Epsilon      = []
literals (Literal ch) = [ch]
literals (Or r1 r2)   = literals r1 ++ literals r2
literals (Then r1 r2) = literals r1 ++ literals r2
literals (Star r)     = literals r
which collects the list of the literals (characters) occurring in a RE. For example
literals re1 ⇒ "abc"
where the result is a string showing the literals occurring in re1.
showRE :: RegExp -> [Char]
showRE Epsilon = "@"
showRE (Literal ch) = [ch]
showRE (Or x y) = "("++showRE x++"|"++showRE y++")"
showRE (Then x y) = "("++showRE x++showRE y++")"
showRE (Star x) = "("++showRE x++")*"
which shows the usual mathematical form of a RE. Note that '@' is used to represent epsilon in the output. For example, showRE re1 ⇒ "(a|(bc))".
20.5.3 Matching REs
REs are patterns and we may ask which words w match against each RE:
• w will match ε if w is the empty word
• w will match a literal x, where x is an arbitrary ASCII character, if w is the one-symbol word x
• w will match (r1|r2) if w matches either r1 or r2 (or both)
• w will match (r1 r2) if w can be split into two subwords w1 and w2, w = w1++w2, so that
w1 matches r1 and w2 matches r2
• w will match (r)* if w can be split into zero or more subwords, w = w1++w2++…++wn, each
of which matches r. The zero case implies that the empty string will match (r)* for any regular
expression r
The words will be represented as strings over the set of all ASCII characters. The first three cases are a
simple transliteration of the definitions above, namely
matches :: RegExp->String->Bool
matches Epsilon st = (st=="")
matches (Literal ch) st = (st==[ch])
matches (Or x y) st = matches x st || matches y st
In the case of juxtaposition, we need an auxiliary function which gives the list containing all the
possible ways of splitting up a list
splits :: String->[(String,String)]
splits st = [(take n st, drop n st)| n<-[0..length st]]
Using a list comprehension we define a list of tuples whose components are given by the built-in
Haskell functions take and drop, which take and drop, respectively, the first n elements of a list st.
When splits is applied to "123" it gives the following list of tuples
[("","123"),("1","23"),("12","3"), ("123","")]
A string st will match (Then r1 r2) if at least one of the splits gives strings st1 and st2 which
match r1 and r2, respectively. We thus get the next equation
matches (Then x y) st =
  foldr (||) False [matches x st1 && matches y st2 |
                    (st1,st2) <- splits st]
The built-in Haskell function foldr (do you remember this one?) is used to fold the list containing
Boolean values and iteratively apply the Boolean or (||) operator.
The final case is that of Star r. The case (a)* may be interpreted as (ε|(a)+) where (a)+
means a one or more times. In this way we allow the string st to be matched against ε only once,
thus avoiding an infinite loop:
matches (Star r) st = matches Epsilon st || foldr (||) False
  [matches r st1 && matches (Star r) st2 | (st1,st2) <- splits st]
matches (Or Epsilon (Then a (Then b c))) "abc" ⇒ True
matches (Star (Or Epsilon b)) "b" ⇒ ERROR - Control stack overflow
The problem is that once the empty word has been matched (by the first disjunct), splits that leave the
whole string in the second component, i.e. tuples ("",st), must be excluded from further matching.
The next version handles this case:
matches (Star r) st = matches Epsilon st ||foldr (||) False
[matches r st1 && matches (Star r) st2|(st1,st2)<-frontSplits st]
frontSplits :: String -> [(String,String)]
frontSplits st =[(take n st,drop n st)|n<-[1..length st]]
matches (Star (Or Epsilon b)) "b" ⇒ True
In this case an infinite loop has been avoided and the string "b" has been successfully matched against
the RE (ε|'b')*.
21 Abstract Data Types
21.1 Representing Rationals
21.2 Haskell modules
Data types as we are studying them here for Haskell provide quite a powerful method of representing
real-world phenomena within a program. They are abstract in the sense that the programmer does not
need to care about how they are implemented by the compiler and how they are internally represented.
A data type supplies a set of data values that share a common structure, and therefore can be used in
similar ways. Their implementation is abstracted and hidden from the user. All that the programmer
needs to know are the generic operations for constructing and manipulating elements of the data type at hand.
Data abstraction is a very important design principle, exploited heavily in modern programming
languages, which consists in separating the definition or representation of a data type from its use.
A typical program may have hundreds or thousands of source lines. To make it manageable we need to
split it into smaller components, called modules. A module has a name and will contain a collection of
Haskell definitions. To introduce a module called M we begin the program text in the file thus:
module M where
A module may import definitions from other modules. These modules may be part of the Haskell
environment or may be written by the user. To show that module M imports some definitions from the
module IM, we write:
import IM
The module M contains the definitions of all the data types used as well as of all the operations for
constructing and manipulating elements for these data types.
21.1 Representing Rationals
Suppose we are given the task of designing a system to perform simple arithmetic operations with
rational numbers.
The natural idea to represent rationals is to use pairs of integers. Every rational number r is of the form
r = n/d with n the numerator and d the denominator.
Thus, we denote the type of rational numbers:
type Rat = (Int,Int)
This definition nicely packages up the two parts of a rational number. When we now go on to define
our arithmetic operations on the type Rat, we have to make sure that we always implement the
operation on both the numerator and the denominator part together.
In order to implement addition and multiplication of rational numbers with respect to the usual
priority rules of these operations we write:
infixl 7 `rmult`
infixl 6 `radd`
rmult :: Rat -> Rat -> Rat
rmult (n_1,d_1) (n_2,d_2) = (n_1*n_2,d_1*d_2)
radd :: Rat -> Rat -> Rat
radd (n_1,d_1) (n_2,d_2) = (n_1*d_2 + n_2*d_1, d_1*d_2)
Now we can multiply and add rational numbers with one operation for each case:
rmult (5,7) (29,4) ⇒ (145,28)
radd (5,7) (29,4) ⇒ (223,28)
Note that radd and rmult have been defined as left associative too. Consequently,
(1,2) `radd` (2,3) `radd` (1,4)
is evaluated as
((1,2) `radd` (2,3)) `radd` (1,4)
The expression
(1,2) `radd` (2,3) `rmult` (1,4)
is evaluated, according to the priorities declared above, as
(1,2) `radd` ((2,3) `rmult` (1,4))
Moreover, this expression may be equivalently written:
radd (1,2) (rmult (2,3) (1,4))
A module for rational numbers must also include an operation converting two integers into a rational
number, a function that tests whether two rationals are equal, and a function computing the inverse of a
non-null rational number:
mkrat :: Int -> Int -> Rat
mkrat _ 0 = error "denominator 0"
mkrat n d = (n,d)
infix 4 `requ`
requ :: Rat -> Rat -> Bool
requ (n_1,d_1) (n_2,d_2) = (n_1*d_2==n_2*d_1)
rinv :: Rat -> Rat
rinv (0,_) = error "no inverse"
rinv (n,d) = (d,n)
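These operations can be exercised directly. In particular, requ identifies different pairs that denote the same rational number, which anticipates the redundancy issue discussed below:

```haskell
type Rat = (Int, Int)

mkrat :: Int -> Int -> Rat
mkrat _ 0 = error "denominator 0"
mkrat n d = (n, d)

-- cross-multiplication test: n1/d1 == n2/d2 iff n1*d2 == n2*d1
requ :: Rat -> Rat -> Bool
requ (n1,d1) (n2,d2) = n1*d2 == n2*d1

rinv :: Rat -> Rat
rinv (0,_) = error "no inverse"
rinv (n,d) = (d,n)
```

For example, mkrat 1 2 `requ` mkrat 2 4 holds even though the two pairs differ as tuples.
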
Consequently a module to define a data type Rat and the operations mkrat, radd, rmult,
requ, and rinv will have the following layout:
module Rat where
type Rat …
mkrat …
This module may also contain functions to subtract and divide rational numbers.
infixl 7 `rdiv`
infixl 6 `rdiff`
rdiv :: Rat -> Rat -> Rat
rdiv x y = x `rmult` (rinv y)
rdiff :: Rat -> Rat -> Rat
rdiff x y=x `radd` (mkrat (-1) 1) `rmult` y
Note that:
1. rmult and rdiv on the one hand and radd and rdiff on the other hand have the same
priority level
2. all the operations radd, rdiff, rmult, rdiv are left associative
3. rdiv and rdiff are defined without referring to the specific representation of the type
(mkrat 1 2) `rdiff` (mkrat 1 4) ⇒ (2,8)
(1,2) `rdiv` (1,2) ⇒ (2,2)
Our encoding of rational numbers is not an exact representation. There are two deficiencies:
1. the representation contains improper elements; the pairs (n,0) do not correspond to any
rational number, but some operations do not care about them (radd, rmult, requ)!
2. the representation is redundant; one and the same number, say 1/3, has infinitely many
representations, i.e. all the pairs (n, 3*n)!
To remove redundant representatives a function reduce may be used:
reduce :: Rat -> Rat
reduce (_,0) = error "denominator 0"
reduce (x,y) = (x `div` d, y `div` d)
where d= gcd x y
gcd is a built-in function that computes the greatest common divisor for two integers.
If reduce is applied to the result of every operation involving rational numbers, a unique
representation is then obtained.
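One way to apply reduce uniformly is to build it into the arithmetic operations themselves, so every result comes out in lowest terms. A sketch (not the notes' definitions, which keep reduce separate):

```haskell
type Rat = (Int, Int)

reduce :: Rat -> Rat
reduce (_,0) = error "denominator 0"
reduce (x,y) = (x `div` d, y `div` d)
  where d = gcd x y   -- gcd is built in

-- the same formulas as before, but normalised on the way out
radd, rmult :: Rat -> Rat -> Rat
radd  (n1,d1) (n2,d2) = reduce (n1*d2 + n2*d1, d1*d2)
rmult (n1,d1) (n2,d2) = reduce (n1*n2, d1*d2)
```

With these definitions, radd (1,2) (1,6) returns (2,3) rather than the unreduced (8,12).
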
Once defined, the operations on rational numbers, provided by the module Rat, may be imported by
other modules in order to make use of them. If for example we consider a module Application that
computes linear combinations, then we may write
module Application where
import Rat
-- Haskell definition for functions
-- providing linear combination
Given k integer numbers n_1, … n_k and k rational numbers r_1, … r_k , the following sum
n_1 * r_1 + … + n_k * r_k
is called a linear combination of the rational numbers.
The following function computes a linear combination taking as input a list of integer and rational
numbers and returning a rational number:
linComb :: [(Int,Rat)] -> Rat
linComb = foldl raddIntRat (mkrat 0 1)
where raddIntRat, given below, adds a rational and the product of an integer with a rational
raddIntRat :: Rat -> (Int,Rat) -> Rat
raddIntRat x (n,y) =radd x (rmult (mkrat n 1) y)
For example, the following linear combination, 1*1/2 + 5*3/5 + 3*5/3 + 1*1/2 = 9, may be computed:
linComb [(1,(1,2)),(5,(3,5)),(3,(5,3)),(1,(1,2))] ⇒ (9,1)
(assuming reduce is applied to the results, as discussed above; otherwise the unreduced pair (540,60) is obtained).
When module Application imports Rat, all the definitions made in this module are visible and
usable in Application. On the other hand, the details of all the data types defined in Rat may also be
used in Application. For example, a rational number may equally be used as a Rat element or as
a pair (Int,Int). The latter form obviously depends on the implementation chosen to represent Rat.
How can we tackle these problems? The solution is to treat Rat as an abstract data type and to hide the
rest of the details.
21.2 Haskell modules
The Haskell module system allows definitions of data types and functions to be visible or hidden when
a module is imported in another.
A module layout is split into two parts:
• a visible part that is exported and which gives all the definitions that may be used outside of the
module
• a hidden part that implements the exported types and functions, plus some other objects which
are not visible
In this way we may hide not only the algorithms implementing various operations but also the details
of implementing (representing) various data types.
For example in the case of Rat we may decide to export from it the data type Rat and the operations
radd, rdiff, rmult, rdiv, requ and mkrat. In this case the module header is
module Rat
  ( Rat
  , radd   -- Rat -> Rat -> Rat
  , rdiff  -- Rat -> Rat -> Rat
  , rmult  -- Rat -> Rat -> Rat
  , rdiv   -- Rat -> Rat -> Rat
  , requ   -- Rat -> Rat -> Bool
  , mkrat  -- Int -> Int -> Rat
  ) where
The module Rat provides a limited interface to the type Rat by means of a specified set of operations.
The data type Rat is called an Abstract Data Type.
Please also note that the functions rinv and reduce have not been exported and consequently cannot
be used outside of Rat.
If we try to use now rinv in the module Application then the error message
ERROR - Undefined variable "rinv"
will be issued.
Using abstract data types, any application may be split into a visible part, also called signature or
interface, and a hidden part, also named implementation.
We can modify the implementation without having any effect on the user.
For example the data type Rat may be represented as an algebraic type
data Rat = ConR Int Int
or as a real type
type Rat = Float
In both cases the interface will be kept the same and only the implementation part will be changed. The
module Application will remain also unchanged.
If we use for Rat the implementation based on algebraic data types, and in the module
Application the function linComb is modified so as to refer to the constant ConR 0 1,
corresponding to the rational number 0/1,
linComb :: [(Int,Rat)] -> Rat
linComb = foldl raddIntRat (ConR 0 1)
then an error message is issued
Undefined constructor function "ConR"
saying that the details of defining the data type Rat are no longer available in Application.
-- implementation part of the module Rat when the
-- data type Rat is Float
type Rat = Float
infixl 7 `rmult`
infixl 6 `radd`
infix 4 `requ`
rmult :: Rat -> Rat -> Rat
rmult x y = x*y
radd :: Rat -> Rat -> Rat
radd x y = x+y
requ :: Rat -> Rat -> Bool
requ x y = (x==y)
mkrat :: Int -> Int -> Rat
mkrat _ 0 = error "denominator 0"
mkrat n d = fromIntegral n / fromIntegral d
rinv :: Rat -> Rat
rinv 0.0 = error "no inverse"
rinv x = 1.0 / x
showrat :: Rat -> String
showrat x = show x
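With this Float-based implementation the same interface produces floating-point results; a minimal check of the interface (a sketch using the standard fromIntegral conversion, where the 2002 Hugs notes use fromInt):

```haskell
type Rat = Float

mkrat :: Int -> Int -> Rat
mkrat _ 0 = error "denominator 0"
mkrat n d = fromIntegral n / fromIntegral d

-- under this representation the arithmetic is just Float arithmetic
radd, rmult :: Rat -> Rat -> Rat
radd  = (+)
rmult = (*)

requ :: Rat -> Rat -> Bool
requ = (==)
```

For example, radd (mkrat 1 2) (mkrat 1 4) yields 0.75, and client modules such as Application are unaffected by the change of representation.
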