Lazy Lists

NOTE: Updated 5/11 and 5/16.

A common data structure that incorporates laziness is a lazy list (a.k.a. stream). Having worked through laziness in Elm in detail using the previous examples, our discussion of streams here will be brief, mainly focusing on picking the right representation.

First Attempt — NotSoLazyList.elm

One possibility for representing LazyLists is the following type.

type LazyList a
  = Nil
  | Cons (Lazy a) (LazyList a)

This datatype describes lists that are not very lazy, however: only the elements are delayed, while the chain of Cons cells is built eagerly. If we define a function range : Int -> Int -> LazyList Int for this type, we can see that a LazyList of n elements immediately builds n Cons cells.

> range 1 10
Cons (Lazy <function>)
 (Cons (Lazy <function>)
  (Cons (Lazy <function>)
   (Cons (Lazy <function>)
    (Cons (Lazy <function>)
     (Cons (Lazy <function>)
      (Cons (Lazy <function>)
       (Cons (Lazy <function>)
        (Cons (Lazy <function>)
         (Cons (Lazy <function>) Nil)))))))))
    : NotSoLazyList.LazyList Int
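The notes do not show range for this representation. Below is a minimal sketch of what it could look like, assuming hand-rolled thunks in place of the Lazy library (the `Lazy` type, `lazy`, and `force` here are stand-ins that do not memoize); `toList` and `cellCount` are hypothetical helpers for inspection. The recursive call range (i+1) j sits outside any thunk, so every Cons cell is built as soon as range is called.

```elm
-- Stand-ins for the Lazy library (assumption: hand-rolled, no memoization).
type Lazy a = Lazy (() -> a)

force : Lazy a -> a
force (Lazy f) = f ()

lazy : (() -> a) -> Lazy a
lazy = Lazy

-- First attempt: each element is delayed, but the spine is not.
type LazyList a
  = Nil
  | Cons (Lazy a) (LazyList a)

-- Hypothetical range: the recursive call is evaluated eagerly,
-- so all n Cons cells exist before range returns.
range : Int -> Int -> LazyList Int
range i j =
  if i > j then Nil
  else Cons (lazy (\_ -> i)) (range (i+1) j)

-- Hypothetical helper: counts Cons cells without forcing any element.
cellCount : LazyList a -> Int
cellCount xs =
  case xs of
    Nil -> 0
    Cons _ rest -> 1 + cellCount rest

-- Hypothetical helper: forces every element.
toList : LazyList a -> List a
toList xs =
  case xs of
    Nil -> []
    Cons x rest -> force x :: toList rest
```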

Second Attempt — PrettyLazyList.elm

Another option is the following.

type LazyList a
  = Nil
  | Cons a (Lazy (LazyList a))

This is pretty good, but notice that a non-Nil list must have its first value evaluated. Consider what the representation of a range of Ints looks like.

> range 1 10
Cons 1 (Lazy <function>) : PrettyLazyList.LazyList Int
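Again the notes leave range implicit. Here is a minimal sketch under the same assumption (a hand-rolled, non-memoizing `Lazy` stand-in; `head` and `toList` are hypothetical helpers). The first element is stored directly, so it can be read without forcing anything, while the rest of the list waits inside a thunk.

```elm
-- Stand-ins for the Lazy library (assumption: hand-rolled, no memoization).
type Lazy a = Lazy (() -> a)

force : Lazy a -> a
force (Lazy f) = f ()

lazy : (() -> a) -> Lazy a
lazy = Lazy

-- Second attempt: the head is evaluated, the tail is delayed.
type LazyList a
  = Nil
  | Cons a (Lazy (LazyList a))

-- Hypothetical range: the comparison and the first element are computed
-- eagerly; only the recursive call is suspended.
range : Int -> Int -> LazyList Int
range i j =
  if i > j then Nil
  else Cons i (lazy (\_ -> range (i+1) j))

-- Hypothetical helper: no force is needed to read the first element.
head : LazyList a -> Maybe a
head xs =
  case xs of
    Nil -> Nothing
    Cons x _ -> Just x

-- Hypothetical helper: forces the whole spine.
toList : LazyList a -> List a
toList xs =
  case xs of
    Nil -> []
    Cons x rest -> x :: toList (force rest)
```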

Final Attempt — LazyList.elm

What we really want is for all elements in the list, including the first, to be delayed until needed. We can achieve this as follows.

type alias LazyList a = Lazy (LazyListCell a)

type LazyListCell a
  = Nil
  | Cons a (LazyList a)
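With this representation, even asking whether a list is empty requires a force. A minimal sketch, again assuming a hand-rolled, non-memoizing `Lazy` stand-in; `head` and `example` are hypothetical names used only for illustration.

```elm
-- Stand-ins for the Lazy library (assumption: hand-rolled, no memoization).
type Lazy a = Lazy (() -> a)

force : Lazy a -> a
force (Lazy f) = f ()

lazy : (() -> a) -> Lazy a
lazy = Lazy

type alias LazyList a = Lazy (LazyListCell a)

type LazyListCell a
  = Nil
  | Cons a (LazyList a)

-- Hypothetical helper: forces exactly one cell to find the first element.
head : LazyList a -> Maybe a
head xs =
  case force xs of
    Nil -> Nothing
    Cons x _ -> Just x

-- A two-element list built by hand; no cell is evaluated until forced.
example : LazyList Int
example =
  lazy (\_ -> Cons 1 (lazy (\_ -> Cons 2 (lazy (\_ -> Nil)))))
```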

Thought Exercise: Why didn’t we use a similar strategy in defining the lazy Nats before?

range

The range function is incremental. Notice the trivial suspension lazy (\_ -> Nil).

range : Int -> Int -> LazyList Int
range i j =
  if i > j
    then lazy (\_ -> Nil)
    else lazy (\_ -> Cons i (range (i+1) j))

The comparison i > j is cheap, so we evaluate it right away rather than delaying it inside the thunk.
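For comparison, here is a hypothetical variant, `rangeDelayed` (not from the notes), that moves the comparison inside the thunk as well, so that calling it does no work at all. Both produce the same lists; they differ only in when the cheap test runs. The sketch assumes the same hand-rolled, non-memoizing `Lazy` stand-in, and `toList` is a tail-recursive helper.

```elm
-- Stand-ins for the Lazy library (assumption: hand-rolled, no memoization).
type Lazy a = Lazy (() -> a)

force : Lazy a -> a
force (Lazy f) = f ()

lazy : (() -> a) -> Lazy a
lazy = Lazy

type alias LazyList a = Lazy (LazyListCell a)

type LazyListCell a
  = Nil
  | Cons a (LazyList a)

-- range as in the notes: the comparison runs when range is called.
range : Int -> Int -> LazyList Int
range i j =
  if i > j then lazy (\_ -> Nil)
  else lazy (\_ -> Cons i (range (i+1) j))

-- Hypothetical variant: the comparison is deferred until the list is forced.
rangeDelayed : Int -> Int -> LazyList Int
rangeDelayed i j =
  lazy <| \_ ->
    if i > j then Nil
    else Cons i (rangeDelayed (i+1) j)

-- Tail-recursive conversion to an ordinary List.
toList : LazyList a -> List a
toList xs =
  let
    go acc ys =
      case force ys of
        Nil -> List.reverse acc
        Cons y rest -> go (y :: acc) rest
  in
  go [] xs
```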

We can also define a “debug” version to emphasize when list items get forced to evaluate:

range_ : Int -> Int -> LazyList Int
range_ i j =
  if i > j then lazy (\_ -> Nil)
  else lazy <| \_ ->
    let _ = Debug.log "force" i in
    Cons i (range_ (i+1) j)

toList

Converting a stream to a List is monolithic:

toList : LazyList a -> List a
toList xs =
  let
    -- foo accumulates elements in reverse order and is tail-recursive
    foo acc ys =
      case force ys of
        Nil        -> acc
        Cons y ys_ -> foo (y :: acc) ys_
  in
  List.reverse <| foo [] xs

Now we can force the incremental range function to do its work:

> range_ 1 5 |> toList
force: 1
force: 2
force: 3
force: 4
force: 5
[1,2,3,4,5]
    : List Int
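toList is not the only monolithic consumer. A hypothetical `length` (not defined in the notes) has the same shape: its result is a plain Int, so the whole spine must be forced, and the best we can do is keep the traversal tail-recursive. A sketch, assuming the hand-rolled, non-memoizing `Lazy` stand-in and including a copy of range for testing:

```elm
-- Stand-ins for the Lazy library (assumption: hand-rolled, no memoization).
type Lazy a = Lazy (() -> a)

force : Lazy a -> a
force (Lazy f) = f ()

lazy : (() -> a) -> Lazy a
lazy = Lazy

type alias LazyList a = Lazy (LazyListCell a)

type LazyListCell a
  = Nil
  | Cons a (LazyList a)

range : Int -> Int -> LazyList Int
range i j =
  if i > j then lazy (\_ -> Nil)
  else lazy (\_ -> Cons i (range (i+1) j))

-- Hypothetical monolithic consumer: forcing the result means forcing
-- every cell of the input. The loop is tail-recursive.
length : LazyList a -> Int
length xs =
  let
    go n ys =
      case force ys of
        Nil -> n
        Cons _ rest -> go (n + 1) rest
  in
  go 0 xs
```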

infinite

We can also describe infinite streams.

infinite : Int -> LazyList Int
infinite i = lazy (\_ -> Cons i (infinite (i+1)))

Let’s define a debug version again:

infinite_ : Int -> LazyList Int
infinite_ i = lazy <| \_ ->
  let _ = Debug.log "force" i in
  Cons i (infinite_ (i+1))

Not surprisingly, we don’t have enough memory to represent all positive integers:

> infinite_ 1 |> toList
FATAL ERROR: JS Allocation failed - process out of memory
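infinite is one instance of a more general pattern: repeatedly applying a step function. A sketch with a hypothetical `iterate` (the name is borrowed from Haskell's `Data.List.iterate`; it is not part of the notes), assuming the hand-rolled, non-memoizing `Lazy` stand-in and the lazier take that appears later in these notes. Because take bounds the result, toList terminates on these infinite streams.

```elm
-- Stand-ins for the Lazy library (assumption: hand-rolled, no memoization).
type Lazy a = Lazy (() -> a)

force : Lazy a -> a
force (Lazy f) = f ()

lazy : (() -> a) -> Lazy a
lazy = Lazy

type alias LazyList a = Lazy (LazyListCell a)

type LazyListCell a
  = Nil
  | Cons a (LazyList a)

-- Hypothetical generalization: x, f x, f (f x), ...
-- f x is computed only when the cell holding x is forced.
iterate : (a -> a) -> a -> LazyList a
iterate f x =
  lazy (\_ -> Cons x (iterate f (f x)))

infinite : Int -> LazyList Int
infinite =
  iterate (\n -> n + 1)

take : Int -> LazyList a -> LazyList a
take k xs =
  if k <= 0 then lazy (\_ -> Nil)
  else
    lazy <| \_ ->
      case force xs of
        Nil -> Nil
        Cons x xs_ -> Cons x (take (k-1) xs_)

toList : LazyList a -> List a
toList xs =
  let
    go acc ys =
      case force ys of
        Nil -> List.reverse acc
        Cons y rest -> go (y :: acc) rest
  in
  go [] xs
```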

take

The take function is incremental.

take : Int -> LazyList a -> LazyList a
take k xs =
  case (k, force xs) of
    (0, _)          -> lazy (\_ -> Nil)
    (_, Nil)        -> lazy (\_ -> Nil)
    (_, Cons x xs_) -> lazy (\_ -> Cons x (take (k-1) xs_))

Incremental function in action:

> infinite 1
Lazy <function> : Lazy.Lazy (LazyList.LazyListCell Int)

> infinite 1 |> take 10
Lazy <function> : Lazy.Lazy (LazyList.LazyListCell Int)

> infinite 1 |> take 10 |> toList
[1,2,3,4,5,6,7,8,9,10] : List Int

But there is still some unnecessary work; take forces the input list even if no elements are taken:

> infinite_ 1 |> take 0 |> toList
force: 1
[]
    : List Int

A slightly lazier version of take:

take k xs =
  if k <= 0 then lazy (\_ -> Nil)
  else
    case force xs of
      Nil        -> lazy (\_ -> Nil)
      Cons x xs_ -> lazy (\_ -> Cons x (take (k-1) xs_))

This no longer forces the list when zero elements are taken…

> infinite_ 1 |> take 0 |> toList
[] : List Int

> infinite_ 1 |> take 5 |> toList
force: 1
force: 2
force: 3
force: 4
force: 5
[1,2,3,4,5]
    : List Int

… but it still forces the first cell of the input before any element of the result is actually needed:

> infinite_ 1 |> take 5
force: 1
Lazy <function>
    : LazyList.LazyList Int

Lazier:

take k xs =
  if k <= 0 then lazy (\_ -> Nil)
  else
    lazy <| \_ ->
      case force xs of
        Nil        -> Nil
        Cons x xs_ -> Cons x (take (k-1) xs_)

That’s better:

> infinite_ 1 |> take 5
Lazy <function> : LazyList.LazyList Int
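The same pattern, wrapping the whole case in a single thunk, carries over to other incremental functions. A sketch of a hypothetical `takeWhile` (not in the notes), assuming the hand-rolled, non-memoizing `Lazy` stand-in: building the result forces nothing, and forcing it advances through the input one cell at a time, stopping at the first element that fails the predicate.

```elm
-- Stand-ins for the Lazy library (assumption: hand-rolled, no memoization).
type Lazy a = Lazy (() -> a)

force : Lazy a -> a
force (Lazy f) = f ()

lazy : (() -> a) -> Lazy a
lazy = Lazy

type alias LazyList a = Lazy (LazyListCell a)

type LazyListCell a
  = Nil
  | Cons a (LazyList a)

infinite : Int -> LazyList Int
infinite i = lazy (\_ -> Cons i (infinite (i+1)))

-- Hypothetical incremental function in the style of the lazier take.
takeWhile : (a -> Bool) -> LazyList a -> LazyList a
takeWhile p xs =
  lazy <| \_ ->
    case force xs of
      Nil -> Nil
      Cons x xs_ ->
        if p x then Cons x (takeWhile p xs_) else Nil

toList : LazyList a -> List a
toList xs =
  let
    go acc ys =
      case force ys of
        Nil -> List.reverse acc
        Cons y rest -> go (y :: acc) rest
  in
  go [] xs
```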

drop

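The drop function also delays all of its work, but it is monolithic rather than incremental: once its result is forced, it forces the first k cells of the input (and the cell after them) in one go.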

drop : Int -> LazyList a -> LazyList a
drop k xs =
  if k <= 0 then xs
  else
    lazy <| \_ ->
      case force xs of
        Nil        -> Nil
        Cons _ xs_ -> force (drop (k-1) xs_)

For example:

> infinite 1 |> drop 10 |> take 10 |> toList
[11,12,13,14,15,16,17,18,19,20] : List Int
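A function that skips a prefix by predicate rather than by count is sketched below as a hypothetical `dropWhile` (not in the notes), assuming the hand-rolled, non-memoizing `Lazy` stand-in. Like drop, it delays everything until its result is forced and then skips the whole prefix at once; here the skipping loop is a separate tail-recursive helper, `skipWhile`, so that skipping a long prefix does not grow the stack.

```elm
-- Stand-ins for the Lazy library (assumption: hand-rolled, no memoization).
type Lazy a = Lazy (() -> a)

force : Lazy a -> a
force (Lazy f) = f ()

lazy : (() -> a) -> Lazy a
lazy = Lazy

type alias LazyList a = Lazy (LazyListCell a)

type LazyListCell a
  = Nil
  | Cons a (LazyList a)

infinite : Int -> LazyList Int
infinite i = lazy (\_ -> Cons i (infinite (i+1)))

take : Int -> LazyList a -> LazyList a
take k xs =
  if k <= 0 then lazy (\_ -> Nil)
  else
    lazy <| \_ ->
      case force xs of
        Nil -> Nil
        Cons x xs_ -> Cons x (take (k-1) xs_)

-- Hypothetical: nothing happens until the result is forced.
dropWhile : (a -> Bool) -> LazyList a -> LazyList a
dropWhile p xs =
  lazy (\_ -> skipWhile p xs)

-- Tail-recursive helper that forces and discards the leading cells.
skipWhile : (a -> Bool) -> LazyList a -> LazyListCell a
skipWhile p xs =
  case force xs of
    Nil -> Nil
    Cons x xs_ ->
      if p x then skipWhile p xs_ else Cons x xs_

toList : LazyList a -> List a
toList xs =
  let
    go acc ys =
      case force ys of
        Nil -> List.reverse acc
        Cons y rest -> go (y :: acc) rest
  in
  go [] xs
```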

append

Combining two streams using append is incremental.

append : LazyList a -> LazyList a -> LazyList a
append xs ys =
  lazy <| \_ ->
    case force xs of
      Nil        -> force ys
      Cons x xs_ -> Cons x (append xs_ ys)
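Because append forces its first argument only one cell at a time, either argument may be infinite as long as we consume only a finite prefix. A sketch, assuming the hand-rolled, non-memoizing `Lazy` stand-in; the function is renamed `lazyAppend` here only to stay clear of `append`, which Elm's `Basics` module exposes by default.

```elm
-- Stand-ins for the Lazy library (assumption: hand-rolled, no memoization).
type Lazy a = Lazy (() -> a)

force : Lazy a -> a
force (Lazy f) = f ()

lazy : (() -> a) -> Lazy a
lazy = Lazy

type alias LazyList a = Lazy (LazyListCell a)

type LazyListCell a
  = Nil
  | Cons a (LazyList a)

range : Int -> Int -> LazyList Int
range i j =
  if i > j then lazy (\_ -> Nil)
  else lazy (\_ -> Cons i (range (i+1) j))

infinite : Int -> LazyList Int
infinite i = lazy (\_ -> Cons i (infinite (i+1)))

take : Int -> LazyList a -> LazyList a
take k xs =
  if k <= 0 then lazy (\_ -> Nil)
  else
    lazy <| \_ ->
      case force xs of
        Nil -> Nil
        Cons x xs_ -> Cons x (take (k-1) xs_)

-- append from the notes, renamed; each force produces one cell.
lazyAppend : LazyList a -> LazyList a -> LazyList a
lazyAppend xs ys =
  lazy <| \_ ->
    case force xs of
      Nil -> force ys
      Cons x xs_ -> Cons x (lazyAppend xs_ ys)

toList : LazyList a -> List a
toList xs =
  let
    go acc ys =
      case force ys of
        Nil -> List.reverse acc
        Cons y rest -> go (y :: acc) rest
  in
  go [] xs
```

With an infinite first argument, the second list is simply never reached.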

reverse

Reversing a stream delays forcing the input list…

reverse : LazyList a -> LazyList a
reverse xs =
  lazy <| \_ ->
    case force xs of
      Nil        -> Nil
      Cons x xs_ -> force (append (reverse xs_) (singleton x))

nil : LazyList a
nil = lazy (\_ -> Nil)

singleton : a -> LazyList a
singleton x = lazy (\_ -> Cons x nil)

… but once it is forced, the recursion is monolithic:

> reverse (range_ 1 5) |> toList
force: 1
force: 2
force: 3
force: 4
force: 5
[5,4,3,2,1]
    : List Int

> eq (range 1 1) (range 1 10000)
False : Bool

> eq (range 1 1) (reverse (range 1 10000))
RangeError: Maximum call stack size exceeded

So, we should make it tail-recursive: (NOTE 5/16: Updated the Cons case below.)

reverse : LazyList a -> LazyList a
reverse xs =
  let
    foo acc ys =
      case force ys of
        Nil        -> acc
        Cons x ys_ -> lazy (\_ -> force (foo (lazy (\_ -> Cons x acc)) ys_))
        -- Cons x ys_ -> foo (lazy (\_ -> Cons x acc)) ys_
  in
  lazy (\_ -> force (foo nil xs))

Notice that lazy (\_ -> Cons x acc) above is another example of a trivial thunk. The values x and acc have already been evaluated, so building the Cons value does not force any additional computations.

<Loose End> (Added 5/16)

Hmm, even though this version does not make the recursive call to the helper function foo right away, it still busts the stack…

> eq (range 1 1) (reverse (range 1 10000))
RangeError: Maximum call stack size exceeded

What if we write a tail-recursive function that does not attempt to delay any of the (non-trivial) computation?

reverse2 : LazyList a -> LazyList a
reverse2 xs =
  let
    foo acc ys =
      case force ys of
        Nil        -> acc
        Cons x ys_ -> foo (lazy (\_ -> Cons x acc)) ys_
  in
  foo nil xs

This works okay here…

> eq (range 1 1) (reverse2 (range 1 10000))
False : Bool

… but there are new issues:

> range 1 5 |> reverse2 |> toList
FATAL ERROR: JS Allocation failed - process out of memory

> range 1 5 |> reverse2 |> take 2 |> toList
[<internal structure>,<internal structure>] : List Int

Out of memory for such a small list? And “internal structure” values? If we swap out the use of Lazy with hand-rolled thunks instead…

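-- import Lazy exposing (Lazy, lazy, force)

-- Hand-rolled thunks (unlike the Lazy library, these do not memoize)
type Lazy a = Lazy (() -> a)

force : Lazy a -> a
force (Lazy f) = f ()

lazy : (() -> a) -> Lazy a
lazy = Lazy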

… we get the same last two behaviors above. So, the issue does not seem to stem from the Lazy library.

I’m not sure… let’s live with the version above that busts the stack.

eq

Our final monolithic example function checks for equality, forcing only as many elements as needed when the lists are not equal.

eq : LazyList a -> LazyList a -> Bool
eq xs ys =
  case (force xs, force ys) of
    (Nil, Nil)               -> True
    (Cons x xs_, Cons y ys_) -> x == y && eq xs_ ys_
    _                        -> False

This version can break out early on unequal lists, but it busts the stack on long equal ones:

> eq (range 0 1000) (range 0 1000)
True : Bool

> eq (range 0 1000) (range 0 10000)
False : Bool

> eq (range 0 10000) (range 0 10000)
RangeError: Maximum call stack size exceeded

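Even though (&&) has short-circuiting semantics, the recursive call eq xs_ ys_ is an operand of (&&) rather than the whole result of the branch, so the compiler does not treat it as a tail call and does not eliminate it. So let’s use a conditional instead, which puts the recursive call in tail position: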

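eq : LazyList a -> LazyList a -> Bool
eq xs ys =
  case (force xs, force ys) of
    (Nil, Nil)               -> True
    (Cons x xs_, Cons y ys_) -> if x /= y then False else eq xs_ ys_
    _                        -> False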

That’s better:

> eq (range 0 10000) (range 0 10000)
True : Bool

> eq (range 1 10) (range 1 10000000)
False : Bool

> eq (range 1 10) (range 1 1000000000000000)
False : Bool

> eq (range 1 10) (infinite 1)
False : Bool


Reading

Required

  • Okasaki, Chapter 4.2