Folds and recursion¶
I recently read a paper by Graham Hutton on folds and recursion. If you don't care much about functional programming, feel free to disengage now. 😆
It is an interesting paper, and I encourage you to give it a read. This article is more squirrel chasing and less about the central thesis of Graham Hutton's work.
Spoiler alert: Haskell doesn't have loops. That leaves tools like `map`, recursion, and `fold` in our toolbox.
Recursion can be very expensive, so the paper shows us how to abstract it away with `fold`. One of the examples rewrites `dropWhile` using `fold` instead of recursion.
```haskell
dropWhile' p = foldr f v
  where
    f x (ys, xs) = (if p x then ys else x : xs, x : xs)
    v = ([], [])

dropWhile p = fst . dropWhile' p
```
It took me a while to unpack what that chunk of code is doing.
- Iterate over the list, in reverse
- Store every visited value in `xs`
- Whenever the predicate stops firing, flip `ys` (which has been empty up to this point) to the current element prepended to `xs`
- The last line returns a function that takes a list of values and returns another (filtered) list
Clearly, the list must be finite for this to work.
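To make those steps concrete, here is a quick trace of the accumulator pair at each step (my own Python sketch, not from the paper; `reduce` over the reversed list plays the role of `foldr`):

```python
from functools import reduce

def trace_drop_while(pred, items):
    """Print the (ys, xs) pair after each step of the right fold."""
    def step(acc, x):
        ys, xs = acc
        # same rule as the Haskell `f`: keep ys while pred holds, else reset it
        acc = (ys, [x] + xs) if pred(x) else ([x] + xs, [x] + xs)
        print(f"x={x} -> ys={acc[0]} xs={acc[1]}")
        return acc
    # reducing over the reversed list mimics Haskell's foldr
    return reduce(step, reversed(items), ([], []))[0]

trace_drop_while(lambda x: x < 3, [1, 2, 3, 4])  # returns [3, 4]
```

Watching `ys` stay frozen while the predicate holds (reading bottom-up, i.e. left-to-right in the original list) is what finally made the Haskell version click for me.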
dropWhile in Python¶
Let's mess around with a translation and a couple variants. These are quick-and-dirty examples, missing guardrails.
```python
from functools import reduce

def drop_while(pred, items):
    def _drop_while(tup, x):
        ys, xs = tup
        if pred(x):
            return (ys, [x] + xs)
        return ([x] + xs, [x] + xs)
    # take the first element of the pair, like `fst` in the Haskell version
    return reduce(_drop_while, reversed(items), ([], []))[0]

%timeit drop_while(lambda x: x < 42, range(1000))
```
5.21 ms ± 37.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```python
def drop_it_hot(pred, items):
    for i, v in enumerate(items):
        if pred(v):
            continue
        return items[i:]
    return items

%timeit drop_it_hot(lambda x: x < 42, range(1000))
```
6.43 µs ± 9.13 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
```python
from itertools import chain

def drop_it_hotter(pred, items):
    it = iter(items)
    v = next(it)
    while pred(v):
        v = next(it)
    # put the first failing value back in front of the rest of the iterator
    return chain([v], it)

%timeit drop_it_hotter(lambda x: x < 42, range(1000))
```
7.37 µs ± 33.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Not bad! If you had asked me ahead of time how those would perform, I would have guessed the exact opposite of the order that actually happened. The functional version has to traverse the entire list, while the other two short-circuit as soon as the predicate stops firing. If you know why this is the case, please leave a comment! (Buttons at the bottom of the page.)
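For comparison (not part of the timings above), Python's standard library already ships a lazy, short-circuiting version of this exact function, `itertools.dropwhile`:

```python
from itertools import dropwhile

# dropwhile yields items starting from the first one that fails the
# predicate; once the predicate fails, remaining items pass through untested
result = list(dropwhile(lambda x: x < 42, range(1000)))
print(result[:3], len(result))  # [42, 43, 44] 958
```

It behaves like `drop_it_hotter`: it returns an iterator, so nothing is materialized until you ask for it.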
All this made me think about the Fibonacci function (everyone's pet recursion function).
Let's start with the typical recursive implementation, then see if we can rewrite it using a fold.
```python
def fibr(n):
    if n in [1, 2]:
        return 1
    return fibr(n - 1) + fibr(n - 2)

%timeit fibr(10)
```
15.4 µs ± 95.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
```python
from functools import reduce

def fib(n):
    # `range` is simply a generator here, and we ignore its value
    return reduce(lambda x, _: (x[1], x[0] + x[1]), range(n - 1), (1, 1))[0]

%timeit fib(100)
```
18.6 µs ± 141 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
```python
def fibfor(n):
    # Boring, imperative version
    a, b = 1, 1
    if n in [1, 2]:
        return 1
    for i in range(2, n + 1):  # Am I the only one that finds this cringey?!
        a, b = b, a + b
    return a

%timeit fibfor(100)
```
5.82 µs ± 84.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Recursion is expensive¶
That recursive implementation is orders of magnitude worse than the other two: `fibr(10)` takes about as long as the other versions computing `fib(100)`.
I was again surprised that the imperative version kicked the functional version's butt.
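As an aside (this goes beyond the comparison above), most of the recursive version's cost comes from recomputing the same subproblems over and over. A sketch with `functools.lru_cache` shows memoization rescuing it:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fibr_cached(n):
    # same recursion as fibr, but each subproblem is computed only once
    if n in (1, 2):
        return 1
    return fibr_cached(n - 1) + fibr_cached(n - 2)

print(fibr_cached(100))  # plain fibr(100) would run for an absurdly long time
```

With the cache, the call tree collapses from exponential to linear in `n`, so even `fibr_cached(100)` returns immediately.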
TL;DR: when performance matters, measure!
I have more mathy work in progress; I wanted to slip this out between bigger projects. More to come soon!
Want to play with the Haskell code in the intro, but in the comfort of a Jupyter notebook? Check out IHaskell!