Consider a simple recursive function that sum
s the first n
positive integers (TailRecursion.elm
):
sum n =
if n <= 0
then 0
else n + sum (n-1)
For example:
> sum 100
5050 : number
> sum 1000
500500 : number
> sum 10000
50005000 : number
> sum 100000
RangeError: Maximum call stack size exceeded
100000
isn’t very large, and yet we still ran out of “call stack” space.
The number 100000
itself isn’t the problem…
> 1000000
1000000 : number
nor is a list with 1000000
(or more) numbers…
> List.range 1 100000
[ ... ] : List Int
> List.range 1 1000000
[ ... ] : List Int
> List.range 1 10000000
[ ... ] : List Int
> List.range 1 100000000
FATAL ERROR: JS Allocation failed - process out of memory
Okay, well eventually it is. But it is reasonable that at some point (between the last two sample interactions) that the list of numbers may require more memory (also known as “heap” space, though unrelated to the heap data structures we have studied) than we can, or are allowed to, use.
But the call stack error is something very different than this heap error. The call stack is used to track, well, the stack of currently executing functions. The call stack grows and shrinks as the program executes. The relative size of the memory dedicated to the call stack is, in general, much less than that dedicated to the heap. This makes sense because the code for a computation is typically far smaller than the data it operates on.
So a call stack error means that we have too many “outstanding” function calls, which are waiting for their callee functions to return before continuing. Recursively defined functions, naturally, pile up a lot of “stack frames,” so running out of stack space is a serious concern, especially in functional languages that encourage (or force, in the case of purely functional languages) programmers to use recursion heavily.
The way that languages (especially functional ones) deal with this issue is to make the following pact: if the programmer writes a recursive function that is tail recursive, then the language compiler promises to evaluate the function in constant stack space (rather than linear in the number of recursive calls).
A function f
is tail recursive if all recursve calls to f
(if any) appear in tail position, meaning the last thing that the current function invocation does is simply to return the result of the recursive invocation.
Many recursive functions, but not all, can be re-written to be tail recursive by defining a helper function to take an extra argument that will store the “accumulated” or “running” result, to be updated in each recursive call. The value of this extra argument is then returned as the final result when no more recursive calls are required.
For example, consider the following tail recursive helper function to sum the first n
positive numbers.
sum_tr_ : Int -> Int -> Int
sum_tr_ acc n =
if n <= 0
then acc
else sum_tr_ (n+acc) (n-1)
The last thing to do is implement a function that kicks off the recursion by supplying the initial value of the accumulating result.
sum_tr = sum_tr_ 0
Languages that enter into such pacts do so because a tail recursive function can be translated into a loop in a target language with imperative features.
For example, assuming a target language that contains support for mutable variables or references (created by var
), dereferencing variables (denoted by the !
operator), updating mutable variables (denoted by the :=
operator), and loops, the sum_tr_
function above might be translated to something that resembles the following.
sum_tr_ acc n =
var i := n;
var res := acc;
while (not (!i <= 0)) {
res := !i + !res;
i := !i - 1;
}
!res
Notice that the mutable variables i
and res
are initialized with the (immutable) values n
and acc
, respectively, and the while
loop iteratively updates these variables as long as the current value of i
is non-negative. Executing a loop does not add any frames to the call stack, so the translated version does not suffer the possibility of running out of stack space to keep track of outstanding recursive calls.
Tail call elimination is crucial to most functional programming languages.
> sum_tr 1000000
500000500000 : Int
> sum_tr 10000000
50000005000000 : Int
> sum_tr 100000000
5000000050000000 : Int
The following is a canonical definition for defining folding from the left in many functional programming languages.
foldl : (a -> b -> b) -> b -> List a -> b
foldl f acc xs = case xs of
[] -> acc
x::xs_ -> foldl f (f x acc) xs_
Because foldl
is tail recursive, languages that perform tail call elimination would rewrite it to something that resembles the following.
foldl f acc xs =
var ys := xs;
var res := acc;
while (not (!ys = [])) {
let (x::xs_) = !ys in
res := f x !res;
ys := xs_;
}
!res
For example:
> foldl (+) 0 (List.range 1 1000)
500500 : Int
> foldl (+) 0 (List.range 1 10000)
50005000 : Int
> foldl (+) 0 (List.range 1 100000)
5000050000 : Int
> foldl (+) 0 (List.range 1 1000000)
500000500000 : Int
> foldl (+) 0 (List.range 1 10000000)
50000005000000 : Int
So, is the library function List.foldl
a tail recursive Elm function?
> List.foldl (+) 0 (List.range 1 10000000)
50000050000000 : Int
Check out the implementation in List.elm
.
The following canonical definition of the foldr
function is not tail recursive.
foldr : (a -> b -> b) -> b -> List a -> b
foldr f acc xs = case xs of
[] -> acc
x::xs_ -> f x (foldr f acc xs_)
For example:
> foldr (+) 0 (List.range 1 100)
5050 : number
> foldr (+) 0 (List.range 1 1000)
500500 : number
> foldr (+) 0 (List.range 1 10000)
RangeError: Maximum call stack size exceeded
Wait, but:
> List.foldr (+) 0 (List.range 1 10000)
50005000 : Int
It turns out that List.foldr
is implemented (in List.elm
) as a native JavaScript function (defined in List.js
) that, not surprisingly, uses a loop!
Optional Exercise — Write a version of foldr
(in Elm) such that any recursion within runs in constant stack space by making use of foldl
.