A Fibonacci A Day: Using Streams and a Fast Memoization Method

Well, I didn’t think I’d write another Fibonacci A Day post — but there’s a great fibonacci function that’s defined right in the Scala documents, in their discussion of Streams.

stream is a standard Scala data structure which is very list like. Recall that a Scala list has two parts, a head and a tail; the head is a value, and the tail is either the special empty list (Nil), or another list. For example, the list created by List(“cat”, “mouse”, “cheese”) is a list whose head is “cat”, and whose tail is a list whose head is “mouse” and whose tail is a list whose head is “cheese” and whose tail is Nil (the cheese does not quite stand alone).

Elements in a stream (the values in their heads), on the other hand, are only evaluated when they are needed. For example, getting the nth value is a stream causes the steam to evaluate its first element, its second element, etc, up to n. Once evaluated, they are remembered (that is, memoized) so they the next time the element is requested, the value has already been computed (the lookup is still O(n), but no computation is done).

Here’s a definition of a fibonacci stream, pretty much as given in the documents

lazy val _fibs: Stream[BigInt] = 
  0 #:: 1 #:: (_fibs zip _fibs.tail map { case (x, y) ⇒ x + y })

This defines an “infinite” list of fibonacci numbers — infinite in the sense that the list can be as long as we have memory for. Note that Scala allows us to have recursive data structures as well as recursive functions. Here the tail of the _fibs stream (after the 2nd element, anyway) is defined in turns of itself — we zip the stream with the tail of the stream, and then each element is the sum of the two previous elements — just as we had defined.

We can access this list in a similar way to our previous functions to get positive and negative fibonacci numbers:

def stream_fibonacci(n: Int) = n match {
    case _ if (n >= 0)     ⇒ _fibs(n) // natural numbers 
    case _ if (n % 2 == 0) ⇒ -_fibs(-n) // even, negative numbers*/
    case _                 ⇒ _fibs(-n) // odd, negative numbers 
  }

How does compare in speed with our previous functions? Calling stream_fibonacci(1000) the first time results in zipping though the stream and calculating the sums. Somewhat surprisingly, this is not more expensive than our fastest version. Afterwards, though, we just “look up” the 1001st element, which requires traversing the stream’s first 1001 elements. But this is very fast, and is about 100x times faster on my machine (see the updated benchmarks).

Of course, this comes at the expense of memory storage for the 1001 fibonacci numbers in the stream. Depending on the application, this could be just fine. Of course, fibonacci numbers get big fast, so maintaining a large stream of them might not be good (the other versions only require two or so BigInts to be held in memory during the computation).

A Fibonacci A Day: Fibonacci(n) in O(lg(n)) steps

I am taking Martin Odensky’s Functional Programming in Scala class, and discovered (or rediscovered) that the Fibonacci function is discussed in SICP (Structure and Interpretation of Computer Programs, the book that Odensky seems to be basing much of his Functional Programming in Scala course on).

Even more interesting is that a Fibonacci function which runs in O(lg(n)) steps is presented — that is, it takes about lg(n) steps to compute the nth Fibonacci number, instead of n steps. It’s given as an exercise (Exercise 1.19). Bill the Lizard has been working through the SICP exercises, and his explanation of Exercise 1.19 is the clearest I’ve found. The code below might not make much sense without looking at the SICP chapter or his explanation, but here is the code in Scala, and put in the form that I’ve been using — this computes fibonacci(n) for any integer (including negative integers). Running the benchmark indicates that this version a lot faster than the straight tail recursive version.

  def fibonacci(n: Int): BigInt = {
    val limit = if (n < 0) -n else n
    /* the fibonacci function per se — on natural numbers */
    def fib(m: Int): BigInt = {
      @tailrec def recurse(a: BigInt, b: BigInt, p: BigInt, q: BigInt, count: Int): BigInt =
        if (count == 0) b
        else if (count % 2 == 0) {
          val qq = q * q
          recurse(a, b, (p * p + qq), (2 * p * q) + qq, count / 2)
        } else {
          val aq = a * q
          recurse(b * q + aq + a * p, b * p + aq, p, q, count - 1)
        }
      (m: @switch) match {
        case 0 ⇒ BigInt(0)
        case 1 ⇒ BigInt(1)
        case _ ⇒ recurse(1, 0, 0, 1, m)
      }
    }
    n match {
      case _ if (n >= 0)     ⇒ fib(n) // natural numbers 
      case _ if (n % 2 == 0) ⇒ -fib(-n) // even, negative numbers*/
      case _                 ⇒ fib(-n) // odd, negative numbers
    }
  }

 

A Fibonacci A Day

The goal of this series was to come up with a general, efficient fibonacci function that goes beyond the typical computer science introduction. The version we developed is fast, efficient with respect to memory, and general. I’d be glad to hear comments.

3200758169_e2701d1e17

The Series:

  1. A Fibonacci A Day — Introduction
  2. A Fibonacci A Day — Recursive Version
  3. A Fibonacci A Day — Special Pi Day Post
  4. A Fibonacci A Day — Iterative Version
  5. A Fibonacci A Day — Out of Bounds
  6. A Fibonacci A Day — Speedy Recursion
  7. A Fibonacci A Day — Golden Ration Version
  8. A Fibonacci A Day — Memoization
  9. A Fibonacci A Day — Benchmarking
  10. A Fibonacci A Day — Extending to Negative Numbers
  11. A Fibonacci A Day — Fibonacci in O(lg(n)) Steps
  12. A Fibonacci A Day — Using Streams and a Fast Memoization Method

 

A Fibonacci A Day — Extending to Negative Numbers

6040055534_e26371bc09We are now confident that the fast recursive version is the best of the algorithms we have explored — it is fast, and can handle large values of the Fibonacci function.

I haven’t been happy, though, with returning a negative -1 when the function is called with a negative number — this feels like a kind of silent error we were trying to avoid in going to BigInts. Fortunately, it turns out that there is a natural extension of the Fibonacci sequence to negative numbers. If the nth Fibonacci number is defined as F(n) = F(n-2) + F(n-1), then we can rearrange this to F(-n) = (-1)nF(n).

So, this is the final version of the Fibonacci function, that supports any integer, up to the the memory size of the computer, computed quite quickly. Much better than our simple recursive or iterative version, which were ripe for error and inefficiency.

 def fibonacci(n: Int): BigInt = {
    val limit = if (n < 0) -n else n
    /* the fibonacci function per se — on natural numbers */
    def fib(m: Int): BigInt = {
      @tailrec def recurse(i: Int, u: BigInt, v: BigInt): BigInt =
        if (i == limit) v
        else recurse(i + 1, v, u + v)
      (m: @switch) match {
        case 0 ⇒ BigInt(0)
        case 1 ⇒ BigInt(1)
        case _ ⇒ recurse(1, 0, 1)
      }
    }
    n match {
      case _ if (n >= 0)     ⇒ fib(n) // natural numbers 
      case _ if (n % 2 == 0) ⇒ -fib(-n) // even, negative numbers*/
      case _                 ⇒ fib(-n) // odd, negative numbers
    }
  }

I hope you’ve enjoyed these series.

 

A Fibonacci A Day — Benchmarking

Race
The following table describes the limits of the various Fibonacci algorithms I’ve created during this series. I ran them through a very simple benchmarking routine, and empirically tested them for the maximum size they can handle. Each benchmark was run 1000 times, except the first algorithm, which was run just once — we already know it runs too slow. These were run on a MacBook Air (Late 2010).

Log Recursive (BigInt)0.0082Unlimited (except for memory)

Algorithm Milliseconds per call for F(46) Max size
Simple Recursive 21,360.00 40 or so
Classic Iterative (Int) 0.0040 46
Classic Iterative (Long) 0.0040 164
Classic Iterative (Double) 0.0030 76
Classic Iterative (BigInt) 0.0080 unlimited (except memory)
Tail Recursive (BigInt) 0.0060 unlimited (except memory)
Using Phi (BigInt) 0.0030 70 or so
Memoization (BigInt) 0.0020 unpredictable
Memoization Monad (BigInt) 0.0940 unpredictable

Here is a table comparing the versions which allow really big fibonacci numbers, calculating F(20000) 100 times each.

Algorithm Milliseconds per call for F(20000) Max size
Classic Iterative (BigInt) 24.23 unlimited (except memory)
Tail Recursive (BigInt) 23.50 unlimited (except memory)
Log Recursive (BigInt) 1.00 unlimited (except memory)
Stream (BigInt), first time 3.19 unlimited (except memory)
Stream (BigInt), subsequent calls 0.22 unlimited (except memory)

F(20000) is a really big integer, and a lot of this time is creating and destoying the memory associated with creating the integers. The thing to notice is that the iterative and the tail recursive versions take approximately the same amount of time — the recursive version actually runs a little more quickly than the iterative version. And the Log recursive version of the same is two orders of magnitude faster — — the stream version is yet another order faster, but it may use too much memory. Note that even the first use is faster than the iterative and tail recursive versions!

Still, there’s really no reason not to use the BigInt, log recursive version of this function — it’s both fast and accurate.

(April 2, 2013: Updated to include log recursive version.

(April 13, 2013: Updated to include < a href="http://willwhim.wpengine.com/2013/04/13/a-fibonacci-a-day-using-streams-and-a-fast-memoization-method/">stream version; rerun benchmarks for F(20000); and update conclusions.

A Fibonacci A Day — Memoization

SONY DSC

Although the iterative recursive version is stateless, it looks different from the standard mathematical definition. The recursive version is a very close mapping, but it’s extremely inefficient. One way to overcome the inefficiencies is to dynamically record (or memoize) previous calculations of F(n). The idea is simple — if we already know the values of F(n – 1) and F(n – 2), we can simply add them without having to do the difficult steps in recreating them.

Here is a version that does just that —

  val memo = scala.collection.mutable.HashMap[Int, BigInt]()
  def fibonacci_8(n: Int): BigInt = {
    def fib(i: Int): BigInt =
      memo.getOrElseUpdate(i, fibonacci_8(i))
    n match {
      case 0 ⇒ BigInt(0)
      case 1 ⇒ BigInt(1)
      case m ⇒ fib(m - 1) + fib(m - 2)
    }
  }

Note that a memoization table is defined outside the scope of the function, and an internal function, fib looks for a memoized value in the table. If it finds it, it simply returns it; if it doesn’t, it makes a recursive call to the outer fibonacci function. Consider F(2). It looks up F(1), which is not yet in the table, so it calls the outer fibonacci function with 1. That returns BigInt(1), which is stored in the table and returned. It does a similar thing with F(0). It now knows enough to calculate the results, which it returns.

The main body of this function looks very similar to the mathematical definition. But there are two disadvantages to this version. One is that we still have mutable state (the memoization table). A clever, if somewhat convoluted way, to overcome this is to use a State monad —Tony Morris describes how to do this in his article,  Memoization WIth State Using Scala. Another problem is this is somewhat less predictable — calling F(2000) out of the gate is likely to fail (with too many stack calls) unless most of the intermediate values are already calculated. That is, fibonacci_8(2000) will probably fail, but for (i ← 1 to 2000) fibonacci_8(n) might not. I have more problems with the second issue than the second —

Side note: English already had a word for remembering things by rote, memorization, and a technical term, cacheing as well. Why did memoization seem necessary?

A Fibonacci A Day — Golden Ratio version

6202116053_165d2c6414

The Fibonacci sequence has a very strong connection to the Golden Ratio (φ, or Phi). φ is defined as (√5 + 1)/2, or approximately 1.618, and has lots of special properties. Ron Knott has a web page with over 300 Fibonacci and φ formulas. One of these is this direct formula: F(n) = round(φn/√5), where the rounding function does rounding to the nearest whole number.

This gives us a quite an efficient function:

def fibonacci_7(n: Int) : Long = {
    val sqrt5 = math.sqrt(5.0)
    val φ = (sqrt5 + 1) / 2.0
    math.round(math.pow(φ, n) / sqrt5)
  }

Because the math.round function produces a Long value, we can’t use BigInts as before. In fact, due to rounding error (in the calculations of φ,√5, and φn, this version is only exact to F(70); after that, there are low-end digits wrong (and, as we saw before, can only handle values up to F(176) even with inaccuracies.

 

A Fibonacci A Day — Speedy Recursion

8282670491_d77b46ea99

We have seen a very inefficient recursive version of the Fibonacci function, and an efficient iterative version. Is there an efficient recursive version?

Yes, in fact.  Recall that the definition of the Fibonacci function, F(n) is:

F(0) = 0
F(1) = 1
F(n) = F(n-1) + F(n-2)

What’s important to note is that the recursive step of the Fibonacci formula has two values that need to be accumulated. In the iterative version, the two temporary variables u and v are used to accumulate these values. Consider this recursive version instead:

def fibonacci_6(n: Int): BigInt = {
    def recurse(i: Int, u: BigInt, v: BigInt): BigInt =
      if (i == n) v
      else recurse(i + 1, v, u + v)
    n match {
      case m if m < 0 ⇒ BigInt(-1)
      case 0          ⇒ BigInt(0)
      case 1          ⇒ BigInt(1)
      case m          ⇒ recurse(1, 0, 1)
    }
  }

As before, we handle the negative, 0, and 1 cases. But the the other cases are handled by the recurse subfunction. We pass a counter (i), and the two accumulators (u and v). This function should be recognized as a tail recursive function, and thus it should be just as efficient as the iterative version. And, note, whereas before we have three mutable variables; now we had none.

A Fibonacci A Day — Out of Bounds

Yesterday’s iterative version of the Fibonacci function is pretty speedy, especially compared to the recursive version. It calculates F(1000) in almost no time at all —

scala> val t0 = System.currentTimeMillis; val f = fibonacci_2(1000); val dur = System.currentTimeMillis - t0
t0: Long = 1363357913796
f: Int = 1556111435
dur: Long = 1

But. The result for F(1000) looks suspiciously low — and odd things happen if we use a different ‘high’ number, such as 100:

scala> fibonacci_2(100)
res22: Int = -980107325

As it turns out, Scala’s Int and Long types are always signed, and do silent overflow. Once the number gets to a certain size, adding more will turn it ‘negative,’ and adding even more will eventually turn it positive, but (of course) the number is not correct. Nothing in the Scala type system prevents this. The Int version of the fibonacci function is good up to 46, the Long version is good up to about 164, and a Double version up to about 1476.

Fortunately, Scala provides a BigInt that will grow as big as memory available, so we can simply rewrite the iterative version to use BigInts.

The other, minor problem, is that we don’t handle the case of passing negative numbers to our fibonacci function — this is not defined (at least as we described it informally). We can either ‘throw an error,’ or return a negative number to indicate the number is out of domain.

Here’s the revised, BigInt version.

def fibonacci_5(n: Int): BigInt = n match {
    case m if m < 0 ⇒ BigInt(-1)
    case 0          ⇒ BigInt(0)
    case 1          ⇒ BigInt(1)
    case m ⇒ {
      var u = BigInt(0)
      var v = BigInt(1)
      var t = BigInt(1)
      for (i ← 1 to m - 1) {
        t = u + v
        u = v
        v = t
      }
      t
    }
  }

And here is F(1000): 43,466,557,686,937,456,435,688,527,675,040,625,802,564,660,517,371,780,402,481,729,089,536,555,417,949,051,890,403,879,840,079,255,169,295,922,593,080,322,634,775,209,689,623,239,873,322,471,161,642,996,440,906,533,187,938,298,969,649,928,516,003,704,476,137,795,166,849,228,875

A Fibonacci A Day — Iterative version

Yesterday’s Fibonacci function was recursive, but we saw that it was extremely inefficient. Recursive functions typically break down a problem into smaller pieces, solve the smaller pieces, then combine the results into a final result, where “breaking down” a problem is done by the recursive function calling itself on a smaller version of itself. But, as we saw, it’s possible that the number of times this may happen has to be controlled, or gross inefficiencies may occur; this sometimes gives recursion a bad name.

The classic approach to writing the Fibonacci function is to write it iteratively, that is, to manually manage the building up of a solution. We note that if we keep two counters running, one for n-2 and one for n-1, we can collect the total easily enough.

def fibonacci_2(n: Int): Int = n match {
    case 0 ⇒ 0
    case 1 ⇒ 1
    case m ⇒ {
      var u = 0
      var v = 1
      var t = 1
      for (i ← 1 to m - 1) {
        t = u + v
        u = v
        v = t
      }
      t
    }
  }

As in the recursive version, we handle the base cases (0 and 1) directly, but from then on, we go from 2 to n (technically, from 1 to n-1) collecting a total in t, and using two accumulators, u and v for prior results.

The running time of the iterative version is O(n), that is, it will calculate the nth Fibonacci number in about n  steps. This is clear from direct inspection of the for loop.

This algorithm still has problems, as a piece of software, which we’ll discuss in the next installment. What happens, for example, when it calculates the Fibonacci number of 1000?